Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: intel-xe@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	"Somalapuram Amaranath" <Amaranath.Somalapuram@amd.com>,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v5 05/12] drm/ttm: Provide a generic LRU walker helper
Date: Wed, 19 Jun 2024 15:09:52 +0000	[thread overview]
Message-ID: <ZnL0wKcnTLXw5FsY@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <ea9a30622e4d2e773eede7afe9d60891eea0c7c1.camel@linux.intel.com>

On Wed, Jun 19, 2024 at 09:31:33AM +0200, Thomas Hellström wrote:
> Hi, Matthew. 
> 
> Thanks for reviewing.
> 
> On Tue, 2024-06-18 at 22:11 +0000, Matthew Brost wrote:
> > On Tue, Jun 18, 2024 at 09:18:13AM +0200, Thomas Hellström wrote:
> > 
> > Replying to the correct version...
> > 
> > > Provide a generic LRU walker in TTM, in the spirit of
> > > drm_gem_lru_scan()
> > > but building on the restartable TTM LRU functionality.
> > > 
> > > The LRU walker optionally supports locking objects as part of
> > > a ww mutex locking transaction, to mimic to some extent the
> > > current functionality in ttm. However any -EDEADLK return
> > > is converted to -ENOMEM, so that the driver will need to back
> > > off and possibly retry without being able to keep the
> > > ticket.
> > > 
> > 
> > Wouldn't the backoff be unlock everything but keep the ticket?
> 
> We can't do that (yet) since we don't have the drm_exec or similar
> functionality. The missing part is that if we keep the ticket, it's in
> a contended state, which means we need to slow-lock the contending
> lock as the first lock. And the caller doesn't know which lock is the
> contending one....
> 
> That is all addressed in the RFC part of the series that I left out for
> now. This is only trying to mimic current functionality.
> 

Got it.

> > 
> > > v3:
> > > - Move the helper to core ttm.
> > > - Remove the drm_exec usage from it for now, it will be
> > >   reintroduced later in the series.
> > > v4:
> > > - Handle the -EALREADY case if ticketlocking.
> > > 
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: <dri-devel@lists.freedesktop.org>
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/ttm/ttm_bo_util.c | 145
> > > ++++++++++++++++++++++++++++++
> > >  include/drm/ttm/ttm_bo.h          |  32 +++++++
> > >  2 files changed, 177 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > index 0b3f4267130c..45fcaf6f8644 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > @@ -768,3 +768,148 @@ int ttm_bo_pipeline_gutting(struct
> > > ttm_buffer_object *bo)
> > >  	ttm_tt_destroy(bo->bdev, ttm);
> > >  	return ret;
> > >  }
> > > +
> > > +static bool ttm_lru_walk_trylock(struct ttm_lru_walk *walk,
> > > +				 struct ttm_buffer_object *bo,
> > > +				 bool *needs_unlock)
> > > +{
> > > +	struct ttm_operation_ctx *ctx = walk->ctx;
> > > +
> > > +	*needs_unlock = false;
> > > +
> > > +	if (dma_resv_trylock(bo->base.resv)) {
> > > +		*needs_unlock = true;
> > > +		return true;
> > > +	}
> > > +
> > > +	if (bo->base.resv == ctx->resv && ctx->allow_res_evict) {
> > > +		dma_resv_assert_held(bo->base.resv);
> > > +		return true;
> > > +	}
> > > +
> > 
> > Any reason this is done after the trylock? It's a bit odd, since if
> > this condition is true, the dma_resv_trylock will always fail.
> 
> It should work either way. I guess I had viewed it as "trylock first,
> if that fails, attempt any exception". I guess if we want to optimize
> performance for shared lock implementations, moving it first might
> avoid the atomic trylock operation, but I wouldn't expect a noticeable
> difference.
>

Agree it works either way.
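For illustration only, here is a user-space sketch of the "exception first" ordering discussed above (all names are hypothetical; an atomic flag stands in for the dma_resv trylock):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the resv lock, operation ctx and bo. */
struct fake_resv { atomic_bool locked; };
struct fake_ctx  { struct fake_resv *resv; bool allow_res_evict; };
struct fake_bo   { struct fake_resv *resv; };

static bool fake_resv_trylock(struct fake_resv *resv)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&resv->locked, &expected, true);
}

/*
 * Variant with the shared-resv exception checked *before* the trylock:
 * when the caller already holds the lock (the ctx->allow_res_evict
 * case), the atomic trylock operation is skipped entirely.
 */
static bool fake_walk_trylock(struct fake_ctx *ctx, struct fake_bo *bo,
			      bool *needs_unlock)
{
	*needs_unlock = false;

	if (bo->resv == ctx->resv && ctx->allow_res_evict)
		return true; /* already locked by the caller, don't unlock */

	if (fake_resv_trylock(bo->resv)) {
		*needs_unlock = true;
		return true;
	}
	return false;
}
```

As noted in the thread, both orderings are correct; this one merely saves the atomic operation for shared-resv objects.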
 
> > 
> > > +	return false;
> > > +}
> > > +
> > > +static int ttm_lru_walk_ticketlock(struct ttm_lru_walk *walk,
> > > +				   struct ttm_buffer_object *bo,
> > > +				   bool *needs_unlock)
> > > +{
> > > +	struct dma_resv *resv = bo->base.resv;
> > > +	int ret;
> > > +
> > 
> > I suppose we don't have asserts here like in Xe, but if we did,
> > assert(walk->ticket)?
> 
> I agree. I think we'd really want a TTM assert or warning that could be
> compiled away. In any case, I only expect a single caller of this
> function.
> 
> > 
> > > +	if (walk->ctx->interruptible)
> > > +		ret = dma_resv_lock_interruptible(resv, walk-
> > > >ticket);
> > > +	else
> > > +		ret = dma_resv_lock(resv, walk->ticket);
> > > +
> > > +	if (!ret) {
> > > +		*needs_unlock = true;
> > > +		/* Only a single ticketlock per loop. */
> > > +		walk->ticket = NULL;
> > 
> > Can you explain this a bit more? I see that once the walk->ticket is
> > set
> > to NULL this function will not be called again (i.e. only try locking
> > will be used). I want to understand the reasoning for this.
> > 
> > It might be helpful to have a lengthier explanation in the comments
> > of the code too.
> 
> I can add a more thorough explanation. Again, this is trying to mimic
> the current code, which does a walk of trylocking, then a single
> ticketlock as a sort of "wait a while to be able to make progress",
> and then resorts to trylocking again.
> 
> The real reason to avoid multiple ticketlocks here is that each has a
> chance of failing with -EDEADLK (in particular with -EDEADLK
> injection enabled), which would translate to an -ENOMEM in the caller.
> 

Makes sense.
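The "only a single ticketlock per walk" policy can be sketched in isolation like so (hypothetical user-space skeleton, not the kernel helper; the sleeping lock is pretended to always succeed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical skeleton mirroring the one-ticketlock-per-walk policy. */
struct fake_ticket { int unused; };
struct fake_walk   { struct fake_ticket *ticket; };

/* Pretend sleeping, ww-ticketed lock that always succeeds here. */
static int fake_ticketlock(struct fake_walk *walk)
{
	/*
	 * Only a single ticketlock per walk: each additional one would
	 * add another chance of hitting -EDEADLK, which the caller only
	 * sees as -ENOMEM and must treat as allocation failure.
	 */
	walk->ticket = NULL;
	return 0;
}

/* Per-object locking step: trylock first, ticketlock at most once. */
static bool fake_lock_obj(struct fake_walk *walk, bool trylock_ok)
{
	if (trylock_ok)
		return true;
	if (!walk->ticket)
		return false; /* ticket spent: skip this object */
	return fake_ticketlock(walk) == 0;
}
```

After the first sleeping lock, every later contended object is simply skipped, matching the existing "wait once, then trylock" behavior.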

> > 
> > > +	} else if (ret == -EDEADLK) {
> > > +		/* Caller needs to exit the ww transaction. */
> > > +		ret = -ENOSPC;
> > 
> > The commit message says -ENOMEM.
> 
> The -ENOSPC is converted to -ENOMEM for the driver in
> ttm_bo_validate(). I could explain a bit more in the commit message.
>

Forgot about that.

> > 
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void ttm_lru_walk_unlock(struct ttm_buffer_object *bo, bool
> > > locked)
> > > +{
> > > +	if (locked)
> > > +		dma_resv_unlock(bo->base.resv);
> > > +}
> > > +
> > > +/**
> > > + * ttm_lru_walk_for_evict() - Perform a LRU list walk, with
> > > actions taken on
> > > + * valid items.
> > > + * @walk: describe the walks and actions taken
> > > + * @bdev: The TTM device.
> > > + * @man: The struct ttm_resource manager whose LRU lists we're
> > > walking.
> > > + * @target: The end condition for the walk.
> > > + *
> > > + * The LRU lists of @man are walked, and for each struct
> > > ttm_resource encountered,
> > > + * the corresponding ttm_buffer_object is locked and a
> > > reference is taken, and
> > > + * the LRU lock is dropped. The LRU lock may be dropped before
> > > locking and, in
> > > + * that case, it's verified that the item actually remains on the
> > > LRU list after
> > > + * the lock, and that the buffer object didn't switch resource in
> > > between.
> > > + *
> > > + * With a locked object, the actions indicated by @walk-
> > > >process_bo are
> > > + * performed, and after that, the bo is unlocked, the refcount
> > > dropped and the
> > > + * next struct ttm_resource is processed. Here, the walker relies
> > > on
> > > + * TTM's restartable LRU list implementation.
> > > + *
> > > + * Typically @walk->process_bo() would return the number of pages
> > > evicted,
> > > + * swapped or shrunken, so that when the total exceeds @target, or
> > > when the
> > > + * LRU list has been walked in full, iteration is terminated. It's
> > > also terminated
> > > + * on error. Note that the definition of @target is done by the
> > > caller, it
> > > + * could have a different meaning than the number of pages.
> > > + *
> > > + * Note that the way dma_resv individualization is done, locking
> > > needs to be done
> > > + * either with the LRU lock held (trylocking only) or with a
> > > reference on the
> > > + * object.
> > > + *
> > > + * Return: The progress made towards target or negative error code
> > > on error.
> > > + */
> > > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > > ttm_device *bdev,
> > > +			    struct ttm_resource_manager *man, long
> > > target)
> > > +{
> > > +	struct ttm_resource_cursor cursor;
> > > +	struct ttm_resource *res;
> > > +	long sofar = 0;
> > 
> > s/sofar/evicted?
> 
> That's not always the case. When used for eviction, it is actually 0
> if the new allocation failed and 1 on success. It should be
> interpreted as "progress towards target", so perhaps progress?
> 

progress sounds good.
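As a toy illustration of the "progress towards target" semantics (hypothetical user-space code mirroring the walker's accumulation and termination rule, not the kernel helper itself):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Each process_bo() return value is added to the running progress;
 * a negative return, or reaching @target, ends the walk early.
 */
static long fake_walk_progress(const long *lrets, size_t n, long target)
{
	long progress = 0;

	for (size_t i = 0; i < n; i++) {
		long lret = lrets[i];

		progress = (lret < 0) ? lret : progress + lret;
		if (progress < 0 || progress >= target)
			break;
	}
	return progress;
}
```

Note that @target's meaning is up to the caller: pages for shrinking, but simply 0/1 allocation success for eviction.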

> > 
> > > +	long lret;
> > > +
> > > +	spin_lock(&bdev->lru_lock);
> > > +	ttm_resource_manager_for_each_res(man, &cursor, res) {
> > > +		struct ttm_buffer_object *bo = res->bo;
> > > +		bool bo_needs_unlock = false;
> > > +		bool bo_locked = false;
> > > +		int mem_type;
> > > +
> > > +		if (!bo || bo->resource != res)
> > > +			continue;
> > > +
> > > +		if (ttm_lru_walk_trylock(walk, bo,
> > > &bo_needs_unlock))
> > > +			bo_locked = true;
> > > +		else if ((!walk->ticket) || walk->ctx->no_wait_gpu
> > > ||
> > 
> > Nit - (!walk->ticket) could just be !walk->ticket.
> 
> Will fix.
> 
> > 
> > > +			 walk->trylock_only)
> > > +			continue;
> > > +
> > > +		if (!ttm_bo_get_unless_zero(bo)) {
> > > +			ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > > +			continue;
> > > +		}
> > > +
> > 
> > This is a kinda goofy pattern too; typically a get_unless_zero is
> > done before trying to lock the object, not after. Even odder here,
> > the object could or could not be locked depending on the outcome of
> > ttm_lru_walk_trylock. Is this covering the individualization case?
> > Would it make more sense to move ttm_bo_get_unless_zero before the
> > trylock, or is that to avoid a put on trylock failure + continue?
> 
> I guess this is still a remnant from the old code: ttm_bos can't be
> put in atomic context, so we had to unlock the lru_lock to put (which
> we still do). However, without the restartable lists that we have
> with this series, list traversal would have had to be reset.
> 
> If we were to change the order of trylock and get_unless_zero now, we
> could do that, but that would mean unnecessary refcounting and dropping
> the lru_lock in the case of trylock failure.
> 

Missed that putting in atomic context is not allowed. Seems fine as is.

> 
> > 
> > > +		mem_type = res->mem_type;
> > > +		spin_unlock(&bdev->lru_lock);
> > > +
> > > +		lret = 0;
> > > +		if (!bo_locked && walk->ticket)
> > 
> > As above could you explain the ticket usage a bit more?
> > 
> > Also, you shouldn't need to check the ticket here; the
> > !walk->ticket check above already triggers a continue.
> 
> I hope I explained sufficiently above. Please get back otherwise, I'll
> remove the walk->ticket check.
> 

You did explain sufficiently.

> > 
> > > +			lret = ttm_lru_walk_ticketlock(walk, bo,
> > > &bo_needs_unlock);
> > > +
> > > +		/*
> > > +		 * Note that in between the release of the lru
> > > lock and the
> > > +		 * ticketlock, the bo may have switched resource,
> > > +		 * and also memory type, since the resource may
> > > have been
> > > +		 * freed and allocated again with a different
> > > memory type.
> > > +		 * In that case, just skip it.
> > > +		 */
> > > +		if (!lret && bo->resource == res && res->mem_type
> > > == mem_type)
> > > +			lret = walk->ops->process_bo(walk, bo);
> > > +
> > > +		ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > > +		ttm_bo_put(bo);
> > > +		if (lret == -EBUSY || lret == -EALREADY)
> > > +			lret = 0;
> > 
> > What is the usage of these error codes?
> > 
> > -EALREADY means the resv is locked with the current ticket, right?
> > Wouldn't we want to call process_bo in this case too?
> 
> No, then we might evict our own working set. Processing of shared bos
> is handled as before with ctx->allow_res_evict.
>

Ah, yes. Makes sense.
 
> > 
> > -EBUSY I need some help figuring out.
> 
> -EBUSY is the return of ttm_bo_wait_ctx() in case we're in no-wait-gpu
> context. Like direct reclaim. If the bo is then busy, we simply skip to
> the next bo. If process_bo() waits for gpu idle using the above
> function, this is to catch that error code.
> Should probably add that to the process_bo() docs.
>
> > 
> > > +		sofar = (lret < 0) ? lret : sofar + lret;
> > > +		if (sofar < 0 || sofar >= target)
> > > +			goto out;
> > > +
> > 
> > Here we have dropped the BO lock. What prevents the BO from being
> > moved back to the resource we just evicted it from, resulting in
> > sofar not being accurate?
> 
> For eviction, the byte-count is not used, but rather the success of
> allocating the new resource.
> 
> For shrinking, any process could allocate what we just shrunk, and
> even so, shrinking does not always mean that the memory is
> immediately available, so the byte-count will always be a rough
> estimate.
> 
> A related question is "what prevents anyone else from stealing memory
> that we just evicted", and the answer to that is "Nothing... yet". The
> exhaustive eviction part handles that with drm_exec keeping the bo
> locks of evicted objects until we make progress. Eventually we'd hold
> enough locks to block all other competitors.
> 
> Unrelated to this patch, Christian and I had a discussion on whether
> to unlock those when we have made progress in the sense of:
> 
> a) A validation succeeded,
> b) An exec submission succeeded.
> 

Thanks for all of the explanations; this makes more sense now.

Patch LGTM aside from a couple of nits, with that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> /Thomas
> 
> > 
> > Matt
> > 
> > > +		cond_resched();
> > > +		spin_lock(&bdev->lru_lock);
> > > +	}
> > > +	spin_unlock(&bdev->lru_lock);
> > > +out:
> > > +	ttm_resource_cursor_fini(&cursor);
> > > +	return sofar;
> > > +}
> > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > index 6ccf96c91f3a..8b032298d66e 100644
> > > --- a/include/drm/ttm/ttm_bo.h
> > > +++ b/include/drm/ttm/ttm_bo.h
> > > @@ -190,6 +190,38 @@ struct ttm_operation_ctx {
> > >  	uint64_t bytes_moved;
> > >  };
> > >  
> > > +struct ttm_lru_walk;
> > > +
> > > +/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
> > > +struct ttm_lru_walk_ops {
> > > +	/**
> > > +	 * process_bo - Process this bo.
> > > +	 * @walk: struct ttm_lru_walk describing the walk.
> > > +	 * @bo: A locked and referenced buffer object.
> > > +	 *
> > > +	 * Return: Negative error code on error, Number of
> > > processed pages on
> > > +	 * success. 0 also indicates success.
> > > +	 */
> > > +	long (*process_bo)(struct ttm_lru_walk *walk, struct
> > > ttm_buffer_object *bo);
> > > +};
> > > +
> > > +/**
> > > + * struct ttm_lru_walk - Structure describing a LRU walk.
> > > + */
> > > +struct ttm_lru_walk {
> > > +	/** @ops: Pointer to the ops structure. */
> > > +	const struct ttm_lru_walk_ops *ops;
> > > +	/** @ctx: Pointer to the struct ttm_operation_ctx. */
> > > +	struct ttm_operation_ctx *ctx;
> > > +	/** @ticket: The struct ww_acquire_ctx if any. */
> > > +	struct ww_acquire_ctx *ticket;
> > > +	/** @trylock_only: Only use trylock for locking. */
> > > +	bool trylock_only;
> > > +};
> > > +
> > > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > > ttm_device *bdev,
> > > +			    struct ttm_resource_manager *man, long
> > > target);
> > > +
> > >  /**
> > >   * ttm_bo_get - reference a struct ttm_buffer_object
> > >   *
> > > -- 
> > > 2.44.0
> > > 
> 


Thread overview: 36+ messages
2024-06-18  7:18 [PATCH v5 00/12] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 01/12] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 02/12] drm/ttm: Slightly clean up LRU list iteration Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 03/12] drm/ttm: Use LRU hitches Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 04/12] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves Thomas Hellström
2024-06-19  3:37   ` Matthew Brost
2024-06-19  8:24     ` Thomas Hellström
2024-06-19 14:44       ` Matthew Brost
2024-06-18  7:18 ` [PATCH v5 05/12] drm/ttm: Provide a generic LRU walker helper Thomas Hellström
2024-06-18 22:11   ` Matthew Brost
2024-06-19  7:31     ` Thomas Hellström
2024-06-19 15:09       ` Matthew Brost [this message]
2024-06-18  7:18 ` [PATCH v5 06/12] drm/ttm: Use the LRU walker helper for swapping Thomas Hellström
2024-06-19  4:23   ` Matthew Brost
2024-06-19  8:36     ` Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 07/12] drm/ttm: Use the LRU walker for eviction Thomas Hellström
2024-06-19 22:52   ` Matthew Brost
2024-06-24  9:06     ` Thomas Hellström
2024-06-19 23:33   ` Matthew Brost
2024-06-24  9:16     ` Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 08/12] drm/ttm: Add a virtual base class for graphics memory backup Thomas Hellström
2024-06-20 15:17   ` Matthew Brost
2024-06-24  9:26     ` Thomas Hellström
2024-06-24 15:47       ` Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 09/12] drm/ttm/pool: Provide a helper to shrink pages Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 10/12] drm/ttm: Use fault-injection to test error paths Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 11/12] drm/ttm, drm/xe: Add a shrinker for xe bos Thomas Hellström
2024-06-18  7:18 ` [PATCH v5 12/12] drm/xe: Increase the XE_PL_TT watermark Thomas Hellström
2024-06-18  7:24 ` ✓ CI.Patch_applied: success for TTM shrinker helpers and xe buffer object shrinker (rev5) Patchwork
2024-06-18  7:24 ` ✗ CI.checkpatch: warning " Patchwork
2024-06-18  7:25 ` ✓ CI.KUnit: success " Patchwork
2024-06-18  7:37 ` ✓ CI.Build: " Patchwork
2024-06-18  7:39 ` ✗ CI.Hooks: failure " Patchwork
2024-06-18  7:41 ` ✗ CI.checksparse: warning " Patchwork
2024-06-18  8:03 ` ✓ CI.BAT: success " Patchwork
2024-06-18 19:08 ` ✗ CI.FULL: failure " Patchwork
