From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Christian König" <christian.koenig@amd.com>,
"Somalapuram Amaranath" <Amaranath.Somalapuram@amd.com>,
"Matthew Brost" <matthew.brost@intel.com>,
dri-devel@lists.freedesktop.org
Subject: [PATCH v6 05/12] drm/ttm: Provide a generic LRU walker helper
Date: Wed, 3 Jul 2024 17:38:06 +0200
Message-ID: <20240703153813.182001-6-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20240703153813.182001-1-thomas.hellstrom@linux.intel.com>

Provide a generic LRU walker in TTM, in the spirit of drm_gem_lru_scan()
but building on the restartable TTM LRU functionality.

The LRU walker optionally supports locking objects as part of
a ww mutex locking transaction, to mimic to some extent the
current functionality in TTM. However, any -EDEADLK return
is converted to -ENOSPC and then to -ENOMEM before reaching
the driver, so that the driver will need to back off and possibly
retry without being able to keep the ticket.

v3:
- Move the helper to core ttm.
- Remove the drm_exec usage from it for now; it will be
  reintroduced later in the series.
v4:
- Handle the -EALREADY case if ticketlocking.
v6:
- Some cleanup and added code comments (Matthew Brost)
- Clarified the ticketlock in the commit message (Matthew Brost)

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
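Note (not part of the patch): the following is a minimal userspace sketch of
how the walker loop in ttm_lru_walk_for_evict() folds process_bo() return
values into its progress count. The function model_walk() and the array of
canned return values are hypothetical stand-ins; the real walker iterates the
LRU list under bdev->lru_lock and calls walk->ops->process_bo() on locked,
referenced buffer objects.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code: rets[] stands in for the
 * values that successive process_bo() calls would return.
 */
static long model_walk(const long *rets, size_t n, long target)
{
	long progress = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		long lret = rets[i];

		/* -EBUSY and -EALREADY are treated as "no progress", not errors. */
		if (lret == -EBUSY || lret == -EALREADY)
			lret = 0;

		/* Any other negative value replaces the accumulated progress. */
		progress = (lret < 0) ? lret : progress + lret;

		/* Stop on error or once the caller-defined target is reached. */
		if (progress < 0 || progress >= target)
			break;
	}

	return progress;
}
```

In this model a walk over the return values 3, -EBUSY, 4 with a target of 5
stops after the third object, since the skipped object contributes nothing.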
drivers/gpu/drm/ttm/ttm_bo_util.c | 156 ++++++++++++++++++++++++++++++
include/drm/ttm/ttm_bo.h | 35 +++++++
2 files changed, 191 insertions(+)
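Note (not part of the patch): the per-object locking decision made by the
walker can be summarized as a small pure function. This is a sketch under
stated assumptions: the enum and model_lock_action() are hypothetical names,
and trylock_ok stands for ttm_lru_walk_trylock() returning true (either the
trylock succeeded or the reservation object is already held by the caller
with allow_res_evict set).

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum model_lock_action {
	MODEL_LOCK_PROCEED,	/* object is locked, continue with it */
	MODEL_LOCK_SKIP,	/* move on to the next LRU item */
	MODEL_LOCK_TICKETLOCK,	/* drop the LRU lock and sleep on the ww mutex */
};

static enum model_lock_action
model_lock_action(bool trylock_ok, bool have_ticket, bool no_wait_gpu,
		  bool trylock_only)
{
	if (trylock_ok)
		return MODEL_LOCK_PROCEED;

	/* Without a ticket, or when waiting is not allowed, skip the object. */
	if (!have_ticket || no_wait_gpu || trylock_only)
		return MODEL_LOCK_SKIP;

	return MODEL_LOCK_TICKETLOCK;
}
```

Since the walker clears walk->ticket after one successful ticketlock, at most
one object per walk takes the MODEL_LOCK_TICKETLOCK path to completion.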
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 0b3f4267130c..c4f678f30fc2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -768,3 +768,159 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
 	ttm_tt_destroy(bo->bdev, ttm);
 	return ret;
 }
+
+static bool ttm_lru_walk_trylock(struct ttm_lru_walk *walk,
+				 struct ttm_buffer_object *bo,
+				 bool *needs_unlock)
+{
+	struct ttm_operation_ctx *ctx = walk->ctx;
+
+	*needs_unlock = false;
+
+	if (dma_resv_trylock(bo->base.resv)) {
+		*needs_unlock = true;
+		return true;
+	}
+
+	if (bo->base.resv == ctx->resv && ctx->allow_res_evict) {
+		dma_resv_assert_held(bo->base.resv);
+		return true;
+	}
+
+	return false;
+}
+
+static int ttm_lru_walk_ticketlock(struct ttm_lru_walk *walk,
+				   struct ttm_buffer_object *bo,
+				   bool *needs_unlock)
+{
+	struct dma_resv *resv = bo->base.resv;
+	int ret;
+
+	if (walk->ctx->interruptible)
+		ret = dma_resv_lock_interruptible(resv, walk->ticket);
+	else
+		ret = dma_resv_lock(resv, walk->ticket);
+
+	if (!ret) {
+		*needs_unlock = true;
+		/*
+		 * Only a single ticketlock per loop. Ticketlocks are prone
+		 * to return -EDEADLK causing the eviction to fail, so
+		 * after waiting for the ticketlock, revert back to
+		 * trylocking for this walk.
+		 */
+		walk->ticket = NULL;
+	} else if (ret == -EDEADLK) {
+		/* Caller needs to exit the ww transaction. */
+		ret = -ENOSPC;
+	}
+
+	return ret;
+}
+
+static void ttm_lru_walk_unlock(struct ttm_buffer_object *bo, bool locked)
+{
+	if (locked)
+		dma_resv_unlock(bo->base.resv);
+}
+
+/**
+ * ttm_lru_walk_for_evict() - Perform a LRU list walk, with actions taken on
+ * valid items.
+ * @walk: Describes the walk and the actions taken.
+ * @bdev: The TTM device.
+ * @man: The struct ttm_resource manager whose LRU lists we're walking.
+ * @target: The end condition for the walk.
+ *
+ * The LRU lists of @man are walked, and for each struct ttm_resource
+ * encountered, the corresponding ttm_buffer_object is locked and referenced,
+ * and the LRU lock is dropped. The LRU lock may be dropped before locking and,
+ * in that case, it's verified that the item actually remains on the LRU list
+ * after the lock, and that the buffer object didn't switch resource in between.
+ *
+ * With a locked object, the actions indicated by @walk->process_bo are
+ * performed, and after that, the bo is unlocked, the refcount dropped and the
+ * next struct ttm_resource is processed. Here, the walker relies on
+ * TTM's restartable LRU list implementation.
+ *
+ * Typically @walk->process_bo() would return the number of pages evicted,
+ * swapped or shrunken, so that when the total exceeds @target, or when the
+ * LRU list has been walked in full, iteration is terminated. It's also terminated
+ * on error. Note that @target is defined by the caller; it could have a
+ * different meaning than the number of pages.
+ *
+ * Note that, because of the way dma_resv individualization is done, locking
+ * needs to be done either with the LRU lock held (trylocking only) or with a
+ * reference on the object.
+ *
+ * Return: The progress made towards target or negative error code on error.
+ */
+long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
+			    struct ttm_resource_manager *man, long target)
+{
+	struct ttm_resource_cursor cursor;
+	struct ttm_resource *res;
+	long progress = 0;
+	long lret;
+
+	spin_lock(&bdev->lru_lock);
+	ttm_resource_manager_for_each_res(man, &cursor, res) {
+		struct ttm_buffer_object *bo = res->bo;
+		bool bo_needs_unlock = false;
+		bool bo_locked = false;
+		int mem_type;
+
+		if (!bo || bo->resource != res)
+			continue;
+
+		/*
+		 * Attempt a trylock before taking a reference on the bo,
+		 * since if we do it the other way around, and the trylock fails,
+		 * we need to drop the lru lock to put the bo.
+		 */
+
+		if (ttm_lru_walk_trylock(walk, bo, &bo_needs_unlock))
+			bo_locked = true;
+		else if (!walk->ticket || walk->ctx->no_wait_gpu ||
+			 walk->trylock_only)
+			continue;
+
+		if (!ttm_bo_get_unless_zero(bo)) {
+			ttm_lru_walk_unlock(bo, bo_needs_unlock);
+			continue;
+		}
+
+		mem_type = res->mem_type;
+		spin_unlock(&bdev->lru_lock);
+
+		lret = 0;
+		if (!bo_locked)
+			lret = ttm_lru_walk_ticketlock(walk, bo, &bo_needs_unlock);
+
+		/*
+		 * Note that in between the release of the lru lock and the
+		 * ticketlock, the bo may have switched resource,
+		 * and also memory type, since the resource may have been
+		 * freed and allocated again with a different memory type.
+		 * In that case, just skip it.
+		 */
+		if (!lret && bo->resource == res && res->mem_type == mem_type)
+			lret = walk->ops->process_bo(walk, bo);
+
+		ttm_lru_walk_unlock(bo, bo_needs_unlock);
+		ttm_bo_put(bo);
+		if (lret == -EBUSY || lret == -EALREADY)
+			lret = 0;
+		progress = (lret < 0) ? lret : progress + lret;
+
+		cond_resched();
+		spin_lock(&bdev->lru_lock);
+		if (progress < 0 || progress >= target)
+			break;
+	}
+	ttm_resource_cursor_fini_locked(&cursor);
+	spin_unlock(&bdev->lru_lock);
+
+	return progress;
+}
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index ef0f52f56ebc..10bff3aecd5c 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -194,6 +194,41 @@ struct ttm_operation_ctx {
 	uint64_t bytes_moved;
 };
 
+struct ttm_lru_walk;
+
+/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
+struct ttm_lru_walk_ops {
+	/**
+	 * process_bo - Process this bo.
+	 * @walk: struct ttm_lru_walk describing the walk.
+	 * @bo: A locked and referenced buffer object.
+	 *
+	 * Return: Negative error code on error, user-defined positive value
+	 * (typically, but not always, the number of processed pages) on success.
+	 * On success, the returned values are summed by the walk and the
+	 * walk exits when its target is met.
+	 * 0 also indicates success, -EBUSY means this bo was skipped.
+	 */
+	long (*process_bo)(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo);
+};
+
+/**
+ * struct ttm_lru_walk - Structure describing a LRU walk.
+ */
+struct ttm_lru_walk {
+	/** @ops: Pointer to the ops structure. */
+	const struct ttm_lru_walk_ops *ops;
+	/** @ctx: Pointer to the struct ttm_operation_ctx. */
+	struct ttm_operation_ctx *ctx;
+	/** @ticket: The struct ww_acquire_ctx if any. */
+	struct ww_acquire_ctx *ticket;
+	/** @trylock_only: Only use trylock for locking. */
+	bool trylock_only;
+};
+
+long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
+			    struct ttm_resource_manager *man, long target);
+
 /**
  * ttm_bo_get - reference a struct ttm_buffer_object
  *
--
2.44.0