From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Natalie Vock" <natalie.vock@gmx.de>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
cgroups@vger.kernel.org, "Huang Rui" <ray.huang@amd.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Matthew Auld" <matthew.auld@intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Simona Vetter" <simona@ffwll.ch>,
"David Airlie" <airlied@gmail.com>,
"Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: [PATCH v4 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller
Date: Tue, 12 May 2026 10:24:04 +0200 [thread overview]
Message-ID: <20260512082406.44470-4-thomas.hellstrom@linux.intel.com> (raw)
In-Reply-To: <20260512082406.44470-1-thomas.hellstrom@linux.intel.com>
Add ttm_bo_evict_cgroup() to evict buffer objects charged to a specific
dmem cgroup pool from a resource manager's LRU until a byte target is
met. Add ttm_resource_manager_set_dmem_region() to register the TTM
eviction path as the reclaim callback for a dmem cgroup region.
The eviction context is interruptible; signals abort the operation and
propagate back through the write() syscall.
Introduce a new mode for the bo LRU walker so that sleeping locks
can be taken. This can be used when the caller doesn't hold any
previous dma_resv locks, and where it intends to hold at most
one lock at a time.
Like the rest of the TTM eviction this should sooner than later
be converted to full WW transactions.
v3:
- Fix ttm_resource_manager_set_dmem_region() storing an error pointer
in man->cg unconditionally. (Sashiko-bot)
- Fix kernel-doc function name format for ttm_bo_evict_cgroup() and
ttm_resource_manager_set_dmem_region().
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/ttm/ttm_bo.c | 95 +++++++++++++++++++++++++++++-
drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +-
drivers/gpu/drm/ttm/ttm_resource.c | 37 ++++++++++++
include/drm/ttm/ttm_bo.h | 10 ++++
include/drm/ttm/ttm_resource.h | 4 ++
5 files changed, 145 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 293401705542..3d0872f2f14d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -515,12 +515,20 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
{
struct ttm_bo_evict_walk *evict_walk =
container_of(walk, typeof(*evict_walk), walk);
+ /* Capture size before eviction in case res is cleared. */
+ s64 bo_size = bo->base.size;
s64 lret;
if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
evict_walk->try_low, &evict_walk->hit_low))
return 0;
+ /*
+ * evict_walk->place is NULL in cgroup drain mode. Drivers'
+ * eviction_valuable() callbacks must handle a NULL place, treating it
+ * as "any placement": the TTM base implementation already does so via
+ * ttm_resource_intersects().
+ */
if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
return 0;
@@ -536,11 +544,15 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
goto out;
evict_walk->evicted++;
- if (evict_walk->res)
+ if (evict_walk->res) {
lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
evict_walk->res, NULL);
- if (lret == 0)
- return 1;
+ if (lret == 0)
+ return 1;
+ } else {
+ /* Cgroup drain: return bytes freed for byte-denominated progress. */
+ return bo_size;
+ }
out:
/* Errors that should terminate the walk. */
if (lret == -ENOSPC)
@@ -614,6 +626,83 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
return 0;
}
+/**
+ * ttm_bo_evict_cgroup() - Evict buffer objects charged to a specific cgroup.
+ * @bdev: The TTM device.
+ * @man: The resource manager whose LRU to walk.
+ * @limit_pool: The cgroup pool state whose members should be evicted.
+ * @target_bytes: Number of bytes to free.
+ * @ctx: The TTM operation context.
+ *
+ * Walk the LRU of @man and evict buffer objects that are charged to the
+ * cgroup identified by @limit_pool, until at least @target_bytes have been
+ * freed. Mirrors the two-pass (trylock -> sleeping-lock, low-watermark)
+ * strategy used by ttm_bo_evict_alloc().
+ *
+ * Return: >= @target_bytes on full success, 0..target_bytes-1 if partial,
+ * negative error code on fatal error.
+ */
+s64 ttm_bo_evict_cgroup(struct ttm_device *bdev,
+ struct ttm_resource_manager *man,
+ struct dmem_cgroup_pool_state *limit_pool,
+ s64 target_bytes,
+ struct ttm_operation_ctx *ctx)
+{
+ struct ttm_bo_evict_walk evict_walk = {
+ .walk = {
+ .ops = &ttm_evict_walk_ops,
+ .arg = { .ctx = ctx },
+ },
+ .limit_pool = limit_pool,
+ /* place, evictor, res left NULL: selects cgroup drain mode */
+ };
+ s64 lret, pass;
+
+ evict_walk.walk.arg.trylock_only = true;
+ lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, target_bytes);
+ if (lret < 0 || lret >= target_bytes)
+ return lret;
+
+ /* Second pass: also evict BOs at the low watermark. */
+ if (evict_walk.hit_low) {
+ evict_walk.try_low = true;
+ pass = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man,
+ target_bytes - lret);
+ if (pass < 0)
+ return pass;
+ lret += pass;
+ if (lret >= target_bytes)
+ return lret;
+ }
+
+ /* Full sleeping-lock pass for remaining target. */
+ evict_walk.try_low = evict_walk.hit_low = false;
+ evict_walk.walk.arg.trylock_only = false;
+
+retry:
+ evict_walk.walk.arg.sleeping_lock = true;
+ do {
+ evict_walk.evicted = 0;
+ pass = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man,
+ target_bytes - lret);
+ if (pass < 0) {
+ lret = pass;
+ goto out;
+ }
+ lret += pass;
+ } while (lret < target_bytes && evict_walk.evicted);
+
+ /* One more attempt if we hit the low limit during sleeping-lock pass. */
+ if (lret < target_bytes && evict_walk.hit_low && !evict_walk.try_low) {
+ evict_walk.try_low = true;
+ goto retry;
+ }
+
+out:
+ return lret;
+}
+EXPORT_SYMBOL(ttm_bo_evict_cgroup);
+
/**
* ttm_bo_pin - Pin the buffer object.
* @bo: The buffer object to pin
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index f83b7d5ec6c6..81c6a674c462 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -999,7 +999,8 @@ __ttm_bo_lru_cursor_next(struct ttm_bo_lru_cursor *curs)
bo = res->bo;
if (ttm_lru_walk_trylock(curs, bo))
bo_locked = true;
- else if (!arg->ticket || arg->ctx->no_wait_gpu || arg->trylock_only)
+ else if ((!arg->ticket && !arg->sleeping_lock) || arg->ctx->no_wait_gpu ||
+ arg->trylock_only)
continue;
if (!ttm_bo_get_unless_zero(bo)) {
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 0e5f1582f13d..6d24f3320892 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -950,3 +950,40 @@ void ttm_resource_manager_create_debugfs(struct ttm_resource_manager *man,
#endif
}
EXPORT_SYMBOL(ttm_resource_manager_create_debugfs);
+
+static int ttm_resource_manager_dmem_reclaim(struct dmem_cgroup_pool_state *pool,
+ u64 target_bytes, void *priv)
+{
+ struct ttm_resource_manager *man = priv;
+ struct ttm_operation_ctx ctx = { .interruptible = true };
+ s64 freed;
+
+ freed = ttm_bo_evict_cgroup(man->bdev, man, pool, target_bytes, &ctx);
+ if (freed < 0)
+ return freed;
+
+ return freed >= (s64)target_bytes ? 0 : -ENOSPC;
+}
+
+/**
+ * ttm_resource_manager_set_dmem_region() - Associate a dmem cgroup region with a
+ * resource manager and register a reclaim
+ * callback.
+ * @man: The resource manager.
+ * @region: The dmem cgroup region to associate, may be NULL or IS_ERR().
+ *
+ * Sets @man->cg and registers ttm_resource_manager_dmem_reclaim() so that
+ * writing to dmem.max below current usage triggers TTM eviction rather than
+ * returning -EBUSY to userspace.
+ */
+void ttm_resource_manager_set_dmem_region(struct ttm_resource_manager *man,
+ struct dmem_cgroup_region *region)
+{
+ if (!IS_ERR_OR_NULL(region)) {
+ man->cg = region;
+ dmem_cgroup_region_set_reclaim(region,
+ ttm_resource_manager_dmem_reclaim,
+ man);
+ }
+}
+EXPORT_SYMBOL(ttm_resource_manager_set_dmem_region);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8310bc3d55f9..32791c4db2a9 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -226,6 +226,11 @@ struct ttm_lru_walk_arg {
struct ww_acquire_ctx *ticket;
/** @trylock_only: Only use trylock for locking. */
bool trylock_only;
+ /**
+ * @sleeping_lock: Use sleeping locks even with %NULL @ticket.
+ * @trylock_only has precedence over this field.
+ */
+ bool sleeping_lock;
};
/**
@@ -431,6 +436,11 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
int ttm_bo_evict_first(struct ttm_device *bdev,
struct ttm_resource_manager *man,
struct ttm_operation_ctx *ctx);
+s64 ttm_bo_evict_cgroup(struct ttm_device *bdev,
+ struct ttm_resource_manager *man,
+ struct dmem_cgroup_pool_state *limit_pool,
+ s64 target_bytes,
+ struct ttm_operation_ctx *ctx);
int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
void *buf, int len, int write);
vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index a5d386583fb6..1ba3ae39763c 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -39,6 +39,7 @@
struct dentry;
struct dmem_cgroup_device;
+struct dmem_cgroup_region;
struct drm_printer;
struct ttm_device;
struct ttm_resource_manager;
@@ -477,6 +478,9 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
struct ttm_device *bdev,
uint64_t size);
+void ttm_resource_manager_set_dmem_region(struct ttm_resource_manager *man,
+ struct dmem_cgroup_region *region);
+
int ttm_resource_manager_evict_all(struct ttm_device *bdev,
struct ttm_resource_manager *man);
--
2.54.0
next prev parent reply other threads:[~2026-05-12 8:24 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-12 8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
2026-05-12 8:24 ` [PATCH v4 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
2026-05-12 8:24 ` [PATCH v4 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
2026-05-12 8:24 ` Thomas Hellström [this message]
2026-05-12 8:24 ` [PATCH v4 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
2026-05-12 8:24 ` [PATCH v4 5/5] drm/amdgpu: " Thomas Hellström
2026-05-12 15:51 ` ✗ CI.checkpatch: warning for Add reclaim to the dmem cgroup controller (rev4) Patchwork
2026-05-12 15:53 ` ✓ CI.KUnit: success " Patchwork
2026-05-12 16:48 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-13 4:27 ` ✗ Xe.CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512082406.44470-4-thomas.hellstrom@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=cgroups@vger.kernel.org \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hannes@cmpxchg.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.auld@intel.com \
--cc=matthew.brost@intel.com \
--cc=mkoutny@suse.com \
--cc=mripard@kernel.org \
--cc=natalie.vock@gmx.de \
--cc=ray.huang@amd.com \
--cc=rodrigo.vivi@intel.com \
--cc=simona@ffwll.ch \
--cc=tj@kernel.org \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.