* [PATCH v3 1/5] cgroup/dmem: Add queries for protection values
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
@ 2025-11-10 12:37 ` Natalie Vock
2025-11-10 12:37 ` [PATCH v3 2/5] cgroup/dmem: Add dmem_cgroup_common_ancestor helper Natalie Vock
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Natalie Vock @ 2025-11-10 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: cgroups, dri-devel
Callers can use this feedback to be more aggressive in making space for
allocations of a cgroup if they know it is protected.
These are counterparts to memcg's mem_cgroup_below_{min,low}.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
include/linux/cgroup_dmem.h | 16 ++++++++++++
kernel/cgroup/dmem.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)
diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index dd4869f1d736e..1a88cd0c9eb00 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
struct dmem_cgroup_pool_state *test_pool,
bool ignore_low, bool *ret_hit_low);
+bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test);
+bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test);
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
#else
@@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
return true;
}
+static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ return false;
+}
+
+static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ return false;
+}
+
static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
{ }
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 10b63433f0573..314c37c06f81e 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -641,6 +641,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
}
EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
+/**
+ * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
+ *
+ * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
+ * @test: The pool to test the usage/min limit of.
+ *
+ * Return: true if usage is below min and the cgroup is protected, false otherwise.
+ */
+bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ if (root == test || !pool_parent(test))
+ return false;
+
+ if (!root) {
+ for (root = test; pool_parent(root); root = pool_parent(root))
+ {}
+ }
+
+ /*
+ * In mem_cgroup_below_min(), the memcg counterpart, this call is missing.
+ * mem_cgroup_below_min() gets called during traversal of the cgroup tree, where
+ * protection is already calculated as part of the traversal. dmem cgroup eviction
+ * does not traverse the cgroup tree, so we need to recalculate effective protection
+ * here.
+ */
+ dmem_cgroup_calculate_protection(root, test);
+ return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
+
+/**
+ * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
+ *
+ * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
+ * @test: The pool to test the usage/low limit of.
+ *
+ * Return: true if usage is below low and the cgroup is protected, false otherwise.
+ */
+bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ if (root == test || !pool_parent(test))
+ return false;
+
+ if (!root) {
+ for (root = test; pool_parent(root); root = pool_parent(root))
+ {}
+ }
+
+ /*
+ * In mem_cgroup_below_low(), the memcg counterpart, this call is missing.
+ * mem_cgroup_below_low() gets called during traversal of the cgroup tree, where
+ * protection is already calculated as part of the traversal. dmem cgroup eviction
+ * does not traverse the cgroup tree, so we need to recalculate effective protection
+ * here.
+ */
+ dmem_cgroup_calculate_protection(root, test);
+ return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
+
static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
{
struct dmem_cgroup_region *region;
--
2.51.2
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH v3 2/5] cgroup/dmem: Add dmem_cgroup_common_ancestor helper
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2025-11-10 12:37 ` [PATCH v3 1/5] cgroup/dmem: Add queries for protection values Natalie Vock
@ 2025-11-10 12:37 ` Natalie Vock
2025-11-10 12:37 ` [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place Natalie Vock
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Natalie Vock @ 2025-11-10 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: cgroups, dri-devel
This helps to find a common subtree of two resources, which is important
when determining whether it's helpful to evict one resource in favor of
another.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
include/linux/cgroup_dmem.h | 9 +++++++++
kernel/cgroup/dmem.c | 25 +++++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index 1a88cd0c9eb00..444b84f4c253a 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -28,6 +28,8 @@ bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
struct dmem_cgroup_pool_state *test);
bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
struct dmem_cgroup_pool_state *test);
+struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
+ struct dmem_cgroup_pool_state *b);
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
#else
@@ -75,6 +77,13 @@ static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
return false;
}
+static inline
+struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
+ struct dmem_cgroup_pool_state *b)
+{
+ return NULL;
+}
+
static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
{ }
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 314c37c06f81e..fb7502d15cf8a 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -703,6 +703,31 @@ bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
}
EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
+/**
+ * dmem_cgroup_common_ancestor(): Find the first common ancestor of two pools.
+ * @a: First pool to find the common ancestor of.
+ * @b: Second pool to find the common ancestor of.
+ *
+ * Return: The closest common ancestor of @a and @b (possibly @a or @b itself), or NULL if no common ancestor exists (e.g. if @a or @b is NULL).
+ */
+struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
+ struct dmem_cgroup_pool_state *b)
+{
+ struct dmem_cgroup_pool_state *parent;
+
+ while (a) {
+ parent = b;
+ while (parent) {
+ if (a == parent)
+ return a;
+ parent = pool_parent(parent);
+ }
+ a = pool_parent(a);
+ }
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_common_ancestor);
+
static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
{
struct dmem_cgroup_region *region;
--
2.51.2
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2025-11-10 12:37 ` [PATCH v3 1/5] cgroup/dmem: Add queries for protection values Natalie Vock
2025-11-10 12:37 ` [PATCH v3 2/5] cgroup/dmem: Add dmem_cgroup_common_ancestor helper Natalie Vock
@ 2025-11-10 12:37 ` Natalie Vock
2026-02-24 16:09 ` Tvrtko Ursulin
2025-11-10 12:37 ` [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
2025-11-10 12:37 ` [PATCH v3 5/5] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock
4 siblings, 1 reply; 15+ messages in thread
From: Natalie Vock @ 2025-11-10 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: cgroups, dri-devel
ttm_bo_alloc_at_place is a new helper function to make an attempt at
allocating a bo's resource in a specific place. It also makes decisions
on whether eviction should be attempted: If -ENOSPC is returned,
allocation should not be retried.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
drivers/gpu/drm/ttm/ttm_bo.c | 98 +++++++++++++++++++++++++++++++++-----------
1 file changed, 73 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f4d9e68b21e70..829d994798835 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -489,6 +489,11 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
return ret;
}
+struct ttm_bo_alloc_state {
+ /** @limit_pool: Which pool limit we should test against */
+ struct dmem_cgroup_pool_state *limit_pool;
+};
+
/**
* struct ttm_bo_evict_walk - Parameters for the evict walk.
*/
@@ -504,12 +509,13 @@ struct ttm_bo_evict_walk {
/** @evicted: Number of successful evictions. */
unsigned long evicted;
- /** @limit_pool: Which pool limit we should test against */
- struct dmem_cgroup_pool_state *limit_pool;
/** @try_low: Whether we should attempt to evict BO's with low watermark threshold */
bool try_low;
/** @hit_low: If we cannot evict a bo when @try_low is false (first pass) */
bool hit_low;
+
+ /** @alloc_state: */
+ struct ttm_bo_alloc_state *alloc_state;
};
static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
@@ -518,8 +524,9 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
container_of(walk, typeof(*evict_walk), walk);
s64 lret;
- if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
- evict_walk->try_low, &evict_walk->hit_low))
+ if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
+ bo->resource->css, evict_walk->try_low,
+ &evict_walk->hit_low))
return 0;
if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
@@ -561,7 +568,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
struct ttm_operation_ctx *ctx,
struct ww_acquire_ctx *ticket,
struct ttm_resource **res,
- struct dmem_cgroup_pool_state *limit_pool)
+ struct ttm_bo_alloc_state *state)
{
struct ttm_bo_evict_walk evict_walk = {
.walk = {
@@ -574,7 +581,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
.place = place,
.evictor = evictor,
.res = res,
- .limit_pool = limit_pool,
+ .alloc_state = state,
};
s64 lret;
@@ -688,6 +695,47 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
return ret;
}
+
+/**
+ * ttm_bo_alloc_at_place - Attempt allocating a BO's backing store in a place
+ *
+ * @bo: The buffer to allocate the backing store of
+ * @place: The place to attempt allocation in
+ * @ctx: ttm_operation_ctx associated with this allocation
+ * @force_space: If we should evict buffers to force space
+ * @res: On allocation success, the resulting struct ttm_resource.
+ * @alloc_state: Object holding allocation state such as charged cgroups.
+ *
+ * Returns:
+ * -EBUSY: No space available, but allocation should be retried with eviction.
+ * -ENOSPC: No space available, allocation should not be retried.
+ * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
+ *
+ */
+static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
+ const struct ttm_place *place,
+ struct ttm_operation_ctx *ctx,
+ bool force_space,
+ struct ttm_resource **res,
+ struct ttm_bo_alloc_state *alloc_state)
+{
+ bool may_evict;
+ int ret;
+
+ may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
+
+ ret = ttm_resource_alloc(bo, place, res,
+ force_space ? &alloc_state->limit_pool : NULL);
+
+ if (ret) {
+ if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
+ ret = -EBUSY;
+ return ret;
+ }
+
+ return 0;
+}
+
/**
* ttm_bo_alloc_resource - Allocate backing store for a BO
*
@@ -713,7 +761,9 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
bool force_space,
struct ttm_resource **res)
{
+ struct ttm_bo_alloc_state alloc_state = {0};
struct ttm_device *bdev = bo->bdev;
+ struct ttm_resource_manager *man;
struct ww_acquire_ctx *ticket;
int i, ret;
@@ -724,9 +774,6 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
for (i = 0; i < placement->num_placement; ++i) {
const struct ttm_place *place = &placement->placement[i];
- struct dmem_cgroup_pool_state *limit_pool = NULL;
- struct ttm_resource_manager *man;
- bool may_evict;
man = ttm_manager_type(bdev, place->mem_type);
if (!man || !ttm_resource_manager_used(man))
@@ -736,25 +783,26 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
TTM_PL_FLAG_FALLBACK))
continue;
- may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
- ret = ttm_resource_alloc(bo, place, res, force_space ? &limit_pool : NULL);
- if (ret) {
- if (ret != -ENOSPC && ret != -EAGAIN) {
- dmem_cgroup_pool_state_put(limit_pool);
- return ret;
- }
- if (!may_evict) {
- dmem_cgroup_pool_state_put(limit_pool);
- continue;
- }
+ ret = ttm_bo_alloc_at_place(bo, place, ctx, force_space,
+ res, &alloc_state);
+ if (ret == -ENOSPC) {
+ dmem_cgroup_pool_state_put(alloc_state.limit_pool);
+ continue;
+ } else if (ret == -EBUSY) {
ret = ttm_bo_evict_alloc(bdev, man, place, bo, ctx,
- ticket, res, limit_pool);
- dmem_cgroup_pool_state_put(limit_pool);
- if (ret == -EBUSY)
+ ticket, res, &alloc_state);
+
+ dmem_cgroup_pool_state_put(alloc_state.limit_pool);
+
+ if (ret) {
+ if (ret != -ENOSPC && ret != -EBUSY)
+ return ret;
continue;
- if (ret)
- return ret;
+ }
+ } else if (ret) {
+ dmem_cgroup_pool_state_put(alloc_state.limit_pool);
+ return ret;
}
ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
--
2.51.2
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place
2025-11-10 12:37 ` [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place Natalie Vock
@ 2026-02-24 16:09 ` Tvrtko Ursulin
2026-02-25 9:36 ` Natalie Vock
0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2026-02-24 16:09 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 10/11/2025 12:37, Natalie Vock wrote:
> ttm_bo_alloc_place is a new helper function to make an attempt at
> allocating a bo's resource in a specific place. It also makes decisions
> on whether eviction should be attempted: If -ENOSPC is returned,
> allocation should not be retried.
At first I thought the new helper will get used from more than one call
site but it seems that is not the case? No objections to extract some
code to be clear, just that when I see a patch title of "making a
helper" I expect something different.
Is there any functional difference here or it is just prep to enable
easier extending in the following patch? If no functional difference it
is good to state that in the commit message. If functional difference
please explain what and why.
Also please explain that the patch is only adding a new struct parameter
as a preparation for it being extended.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 98 +++++++++++++++++++++++++++++++++-----------
> 1 file changed, 73 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index f4d9e68b21e70..829d994798835 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -489,6 +489,11 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
> return ret;
> }
>
> +struct ttm_bo_alloc_state {
> + /** @limit_pool: Which pool limit we should test against */
> + struct dmem_cgroup_pool_state *limit_pool;
> +};
> +
> /**
> * struct ttm_bo_evict_walk - Parameters for the evict walk.
> */
> @@ -504,12 +509,13 @@ struct ttm_bo_evict_walk {
> /** @evicted: Number of successful evictions. */
> unsigned long evicted;
>
> - /** @limit_pool: Which pool limit we should test against */
> - struct dmem_cgroup_pool_state *limit_pool;
> /** @try_low: Whether we should attempt to evict BO's with low watermark threshold */
> bool try_low;
> /** @hit_low: If we cannot evict a bo when @try_low is false (first pass) */
> bool hit_low;
> +
> + /** @alloc_state: */
> + struct ttm_bo_alloc_state *alloc_state;
> };
>
> static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
> @@ -518,8 +524,9 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
> container_of(walk, typeof(*evict_walk), walk);
> s64 lret;
>
> - if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
> - evict_walk->try_low, &evict_walk->hit_low))
> + if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
> + bo->resource->css, evict_walk->try_low,
> + &evict_walk->hit_low))
> return 0;
>
> if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
> @@ -561,7 +568,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> struct ttm_operation_ctx *ctx,
> struct ww_acquire_ctx *ticket,
> struct ttm_resource **res,
> - struct dmem_cgroup_pool_state *limit_pool)
> + struct ttm_bo_alloc_state *state)
> {
> struct ttm_bo_evict_walk evict_walk = {
> .walk = {
> @@ -574,7 +581,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> .place = place,
> .evictor = evictor,
> .res = res,
> - .limit_pool = limit_pool,
> + .alloc_state = state,
> };
> s64 lret;
>
> @@ -688,6 +695,47 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
> return ret;
> }
>
> +
> +/**
> + * ttm_bo_alloc_at_place - Attempt allocating a BO's backing store in a place
> + *
> + * @bo: The buffer to allocate the backing store of
> + * @place: The place to attempt allocation in
> + * @ctx: ttm_operation_ctx associated with this allocation
> + * @force_space: If we should evict buffers to force space
> + * @res: On allocation success, the resulting struct ttm_resource.
> + * @alloc_state: Object holding allocation state such as charged cgroups.
> + *
> + * Returns:
> + * -EBUSY: No space available, but allocation should be retried with eviction.
With or after eviction?
> + * -ENOSPC: No space available, allocation should not be retried.
> + * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
-EAGAIN cannot get out from ttm_resource_alloc()? In the current
codebase or only with this patch?
> + *
> + */
> +static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct ttm_operation_ctx *ctx,
> + bool force_space,
> + struct ttm_resource **res,
> + struct ttm_bo_alloc_state *alloc_state)
Maybe you did not write this but I am curious and thinking out loud
here. The documentation for struct ttm_operation_ctx among other things
says:
"""
* Context for TTM operations like changing buffer placement or general
memory
* allocation.
"""
Hence I am wondering if the new alloc_state couldn't simply go in there?
Which would make the function prototype identical to the existing
ttm_bo_alloc_resource and is also already passed through the relevant
call chains. Which raises another question - why did
ttm_bo_evict_alloc() need to have struct dmem_cgroup_pool_state as a
separate argument and why it couldn't be passed in the context?
> +{
> + bool may_evict;
> + int ret;
> +
> + may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> +
> + ret = ttm_resource_alloc(bo, place, res,
> + force_space ? &alloc_state->limit_pool : NULL);
> +
> + if (ret) {
> + if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
> + ret = -EBUSY;
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> /**
> * ttm_bo_alloc_resource - Allocate backing store for a BO
> *
> @@ -713,7 +761,9 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> bool force_space,
> struct ttm_resource **res)
> {
> + struct ttm_bo_alloc_state alloc_state = {0};
= {}; is usually enough.
Any particular reason to move this and the manager outside of the loop?
> struct ttm_device *bdev = bo->bdev;
> + struct ttm_resource_manager *man;
> struct ww_acquire_ctx *ticket;
> int i, ret;
>
> @@ -724,9 +774,6 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
>
> for (i = 0; i < placement->num_placement; ++i) {
> const struct ttm_place *place = &placement->placement[i];
> - struct dmem_cgroup_pool_state *limit_pool = NULL;
> - struct ttm_resource_manager *man;
> - bool may_evict;
>
> man = ttm_manager_type(bdev, place->mem_type);
> if (!man || !ttm_resource_manager_used(man))
> @@ -736,25 +783,26 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> TTM_PL_FLAG_FALLBACK))
> continue;
>
> - may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> - ret = ttm_resource_alloc(bo, place, res, force_space ? &limit_pool : NULL);
> - if (ret) {
> - if (ret != -ENOSPC && ret != -EAGAIN) {
> - dmem_cgroup_pool_state_put(limit_pool);
> - return ret;
> - }
> - if (!may_evict) {
> - dmem_cgroup_pool_state_put(limit_pool);
> - continue;
> - }
> + ret = ttm_bo_alloc_at_place(bo, place, ctx, force_space,
> + res, &alloc_state);
>
> + if (ret == -ENOSPC) {
> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> + continue;
I always disliked the TTM eviction logic and now I remember why. It
requires a brain size of a small planet to figure out the flow.. :) I'd
say this change makes it more readable.
> + } else if (ret == -EBUSY) {
> ret = ttm_bo_evict_alloc(bdev, man, place, bo, ctx,
> - ticket, res, limit_pool);
> - dmem_cgroup_pool_state_put(limit_pool);
> - if (ret == -EBUSY)
> + ticket, res, &alloc_state);
> +
> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> +
> + if (ret) {
> + if (ret != -ENOSPC && ret != -EBUSY)
> + return ret;
> continue;
Is this a functional change and why? Before only EBUSY went to the next
placement. Now ENOSPC does as well.
Regards,
Tvrtko
> - if (ret)
> - return ret;
> + }
> + } else if (ret) {
> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> + return ret;
> }
>
> ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place
2026-02-24 16:09 ` Tvrtko Ursulin
@ 2026-02-25 9:36 ` Natalie Vock
0 siblings, 0 replies; 15+ messages in thread
From: Natalie Vock @ 2026-02-25 9:36 UTC (permalink / raw)
To: Tvrtko Ursulin, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 2/24/26 17:09, Tvrtko Ursulin wrote:
>
> On 10/11/2025 12:37, Natalie Vock wrote:
>> ttm_bo_alloc_place is a new helper function to make an attempt at
>> allocating a bo's resource in a specific place. It also makes decisions
>> on whether eviction should be attempted: If -ENOSPC is returned,
>> allocation should not be retried.
>
> At first I thought the new helper will get used from more than one call
> site but it seems that is not the case? No objections to extract some
> code to be clear, just that when I see a patch title of "making a
> helper" I expect something different.
Yes, this is just extracting code to a separate function (which itself
is preparation for the next patch). I'll clarify.
>
> Is there any functional difference here or it is just prep to enable
> easier extending in the following patch? If no functional difference it
> is good to state that in the commit message. If functional difference
> please explain what and why.
There should be no functional difference.
>
> Also please explain that the patch is only adding a new struct parameter
> as a preparation for it being extended.
>
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo.c | 98 ++++++++++++++++++++++++++++++++
>> +-----------
>> 1 file changed, 73 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index f4d9e68b21e70..829d994798835 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -489,6 +489,11 @@ int ttm_bo_evict_first(struct ttm_device *bdev,
>> struct ttm_resource_manager *man
>> return ret;
>> }
>> +struct ttm_bo_alloc_state {
>> + /** @limit_pool: Which pool limit we should test against */
>> + struct dmem_cgroup_pool_state *limit_pool;
>> +};
>> +
>> /**
>> * struct ttm_bo_evict_walk - Parameters for the evict walk.
>> */
>> @@ -504,12 +509,13 @@ struct ttm_bo_evict_walk {
>> /** @evicted: Number of successful evictions. */
>> unsigned long evicted;
>> - /** @limit_pool: Which pool limit we should test against */
>> - struct dmem_cgroup_pool_state *limit_pool;
>> /** @try_low: Whether we should attempt to evict BO's with low
>> watermark threshold */
>> bool try_low;
>> /** @hit_low: If we cannot evict a bo when @try_low is false
>> (first pass) */
>> bool hit_low;
>> +
>> + /** @alloc_state: */
>> + struct ttm_bo_alloc_state *alloc_state;
>> };
>> static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct
>> ttm_buffer_object *bo)
>> @@ -518,8 +524,9 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk
>> *walk, struct ttm_buffer_object *
>> container_of(walk, typeof(*evict_walk), walk);
>> s64 lret;
>> - if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo-
>> >resource->css,
>> - evict_walk->try_low, &evict_walk->hit_low))
>> + if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state-
>> >limit_pool,
>> + bo->resource->css, evict_walk->try_low,
>> + &evict_walk->hit_low))
>> return 0;
>> if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo,
>> evict_walk->place))
>> @@ -561,7 +568,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>> *bdev,
>> struct ttm_operation_ctx *ctx,
>> struct ww_acquire_ctx *ticket,
>> struct ttm_resource **res,
>> - struct dmem_cgroup_pool_state *limit_pool)
>> + struct ttm_bo_alloc_state *state)
>> {
>> struct ttm_bo_evict_walk evict_walk = {
>> .walk = {
>> @@ -574,7 +581,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>> *bdev,
>> .place = place,
>> .evictor = evictor,
>> .res = res,
>> - .limit_pool = limit_pool,
>> + .alloc_state = state,
>> };
>> s64 lret;
>> @@ -688,6 +695,47 @@ static int ttm_bo_add_move_fence(struct
>> ttm_buffer_object *bo,
>> return ret;
>> }
>> +
>> +/**
>> + * ttm_bo_alloc_at_place - Attempt allocating a BO's backing store in
>> a place
>> + *
>> + * @bo: The buffer to allocate the backing store of
>> + * @place: The place to attempt allocation in
>> + * @ctx: ttm_operation_ctx associated with this allocation
>> + * @force_space: If we should evict buffers to force space
>> + * @res: On allocation success, the resulting struct ttm_resource.
>> + * @alloc_state: Object holding allocation state such as charged
>> cgroups.
>> + *
>> + * Returns:
>> + * -EBUSY: No space available, but allocation should be retried with
>> eviction.
>
> With or after eviction?
I suppose I should say "with ttm_bo_evict_alloc" to remove ambiguity.
ttm_bo_evict_alloc retries allocation in a loop while also evicting, so
in a sense it evicts and retries at the same time.
>
>> + * -ENOSPC: No space available, allocation should not be retried.
>> + * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
>
> -EAGAIN cannot get out from ttm_resource_alloc()? In the current
> codebase or only with this patch?
It's supposed to be in this patch, too, but you're right it's not right
now. Will add handling for it.
>
>> + *
>> + */
>> +static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
>> + const struct ttm_place *place,
>> + struct ttm_operation_ctx *ctx,
>> + bool force_space,
>> + struct ttm_resource **res,
>> + struct ttm_bo_alloc_state *alloc_state)
>
> Maybe you did not write this but I am curious and thinking out loud
> here. The documentation for struct ttm_operation_ctx among other things
> says:
>
> """
> * Context for TTM operations like changing buffer placement or general
> memory
> * allocation.
> """
>
> Hence I am wondering if the new alloc_state couldn't simply go in there?
> Which would make the function prototype identical to the existing
> ttm_bo_alloc_resource and is also already passed through the relevant
> call chains. Which raises another question - why did
> ttm_bo_evict_alloc() need to have struct dmem_cgroup_pool_state as a
> separate argument and why it couldn't be passed in the context?
I was operating under the assumption that ttm_operation_ctx is state
that is shared among all (or nearly all) TTM operations. Under that
assumption, it felt wrong to add structs to ttm_operation_ctx that end
up only ever being used for one specific operation (i.e. memory allocation).
>
>> +{
>> + bool may_evict;
>> + int ret;
>> +
>> + may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>> +
>> + ret = ttm_resource_alloc(bo, place, res,
>> + force_space ? &alloc_state->limit_pool : NULL);
>> +
>> + if (ret) {
>> + if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>> + ret = -EBUSY;
>> + return ret;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> /**
>> * ttm_bo_alloc_resource - Allocate backing store for a BO
>> *
>> @@ -713,7 +761,9 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> bool force_space,
>> struct ttm_resource **res)
>> {
>> + struct ttm_bo_alloc_state alloc_state = {0};
>
> = {}; is usually enough.
>
> Any particular reason to move this and the manager outside of the loop?
No, will fix.
>
>> struct ttm_device *bdev = bo->bdev;
>> + struct ttm_resource_manager *man;
>> struct ww_acquire_ctx *ticket;
>> int i, ret;
>> @@ -724,9 +774,6 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> for (i = 0; i < placement->num_placement; ++i) {
>> const struct ttm_place *place = &placement->placement[i];
>> - struct dmem_cgroup_pool_state *limit_pool = NULL;
>> - struct ttm_resource_manager *man;
>> - bool may_evict;
>> man = ttm_manager_type(bdev, place->mem_type);
>> if (!man || !ttm_resource_manager_used(man))
>> @@ -736,25 +783,26 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> TTM_PL_FLAG_FALLBACK))
>> continue;
>> - may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>> - ret = ttm_resource_alloc(bo, place, res, force_space ?
>> &limit_pool : NULL);
>> - if (ret) {
>> - if (ret != -ENOSPC && ret != -EAGAIN) {
>> - dmem_cgroup_pool_state_put(limit_pool);
>> - return ret;
>> - }
>> - if (!may_evict) {
>> - dmem_cgroup_pool_state_put(limit_pool);
>> - continue;
>> - }
>> + ret = ttm_bo_alloc_at_place(bo, place, ctx, force_space,
>> + res, &alloc_state);
>> + if (ret == -ENOSPC) {
>> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> + continue;
>
> I always disliked the TTM eviction logic and now I remember why. It
> requires a brain size of a small planet to figure out the flow.. :) I'd
> say this change makes it more readable.
>
>> + } else if (ret == -EBUSY) {
>> ret = ttm_bo_evict_alloc(bdev, man, place, bo, ctx,
>> - ticket, res, limit_pool);
>> - dmem_cgroup_pool_state_put(limit_pool);
>> - if (ret == -EBUSY)
>> + ticket, res, &alloc_state);
>> +
>> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> +
>> + if (ret) {
>> + if (ret != -ENOSPC && ret != -EBUSY)
>> + return ret;
>> continue;
>
> Is this a functional change and why? Before only EBUSY went to the next
> placement. Now ENOSPC does as well.
Right, I didn't look at what each part of the evict walk does with all
the errno values. I'm not sure it's really possible to get -ENOSPC out
of ttm_bo_evict_alloc since that is special-cased, but in any case, this
should just be ret != -EBUSY.
Thanks,
Natalie
>
> Regards,
>
> Tvrtko
>
>> - if (ret)
>> - return ret;
>> + }
>> + } else if (ret) {
>> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> + return ret;
>> }
>> ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
` (2 preceding siblings ...)
2025-11-10 12:37 ` [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place Natalie Vock
@ 2025-11-10 12:37 ` Natalie Vock
2026-02-24 16:40 ` Tvrtko Ursulin
2026-02-25 11:45 ` Tvrtko Ursulin
2025-11-10 12:37 ` [PATCH v3 5/5] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock
4 siblings, 2 replies; 15+ messages in thread
From: Natalie Vock @ 2025-11-10 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: cgroups, dri-devel
When the cgroup's memory usage is below the low/min limit and allocation
fails, try evicting some unprotected buffers to make space. Otherwise,
application buffers may be forced to go into GTT even though usage is
below the corresponding low/min limit, if other applications filled VRAM
with their allocations first.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++++++++----
drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
include/drm/ttm/ttm_resource.h | 6 ++-
3 files changed, 108 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 829d994798835..bd467c965e1bc 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
}
struct ttm_bo_alloc_state {
+ /** @charge_pool: The memory pool the resource is charged to */
+ struct dmem_cgroup_pool_state *charge_pool;
/** @limit_pool: Which pool limit we should test against */
struct dmem_cgroup_pool_state *limit_pool;
+ /** @only_evict_unprotected: If eviction should be restricted to unprotected BOs */
+ bool only_evict_unprotected;
};
/**
@@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
evict_walk->evicted++;
if (evict_walk->res)
lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
- evict_walk->res, NULL);
+ evict_walk->res, evict_walk->alloc_state->charge_pool);
if (lret == 0)
return 1;
out:
@@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
/* One more attempt if we hit low limit? */
- if (!lret && evict_walk.hit_low) {
+ if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
evict_walk.try_low = true;
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
}
@@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
} while (!lret && evict_walk.evicted);
/* We hit the low limit? Try once more */
- if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
+ if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
+ !state->only_evict_unprotected) {
evict_walk.try_low = true;
goto retry;
}
@@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
struct ttm_resource **res,
struct ttm_bo_alloc_state *alloc_state)
{
- bool may_evict;
+ bool may_evict, below_low = false;
int ret;
may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
+ ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
+ force_space ? &alloc_state->limit_pool : NULL);
+ if (ret) {
+ /*
+ * -EAGAIN means the charge failed, which we treat like an
+ * allocation failure. Therefore, return an error code indicating
+ * the allocation failed - either -EBUSY if the allocation should
+ * be retried with eviction, or -ENOSPC if there should be no second
+ * attempt.
+ */
+ if (ret == -EAGAIN)
+ ret = may_evict ? -EBUSY : -ENOSPC;
+ return ret;
+ }
- ret = ttm_resource_alloc(bo, place, res,
- force_space ? &alloc_state->limit_pool : NULL);
+ /*
+ * cgroup protection plays a special role in eviction.
+ * Conceptually, protection of memory via the dmem cgroup controller
+ * entitles the protected cgroup to use a certain amount of memory.
+ * There are two types of protection - the 'low' limit is a
+ * "best-effort" protection, whereas the 'min' limit provides a hard
+ * guarantee that memory within the cgroup's allowance will not be
+ * evicted under any circumstance.
+ *
+ * To faithfully model this concept in TTM, we also need to take cgroup
+ * protection into account when allocating. When allocation in one
+ * place fails, TTM will default to trying other places first before
+ * evicting.
+ * If the allocation is covered by dmem cgroup protection, however,
+ * this prevents the allocation from using the memory it is "entitled"
+ * to. To make sure unprotected allocations cannot push new protected
+ * allocations out of places they are "entitled" to use, we should
+ * evict buffers not covered by any cgroup protection, if this
+ * allocation is covered by cgroup protection.
+ *
+ * Buffers covered by 'min' protection are a special case - the 'min'
+ * limit is a stronger guarantee than 'low', and thus buffers protected
+ * by 'low' but not 'min' should also be considered for eviction.
+ * Buffers protected by 'min' will never be considered for eviction
+ * anyway, so the regular eviction path should be triggered here.
+ * Buffers protected by 'low' but not 'min' will take a special
+ * eviction path that only evicts buffers covered by neither 'low' nor
+ * 'min' protections.
+ */
+ may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
+ below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
+ alloc_state->only_evict_unprotected = !may_evict && below_low;
+
+ ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
if (ret) {
- if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
+ if ((ret == -ENOSPC || ret == -EAGAIN) &&
+ (may_evict || below_low))
ret = -EBUSY;
return ret;
}
+ /*
+ * Ownership of charge_pool has been transferred to the TTM resource,
+ * don't make the caller think we still hold a reference to it.
+ */
+ alloc_state->charge_pool = NULL;
return 0;
}
@@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
res, &alloc_state);
if (ret == -ENOSPC) {
+ dmem_cgroup_pool_state_put(alloc_state.charge_pool);
dmem_cgroup_pool_state_put(alloc_state.limit_pool);
continue;
} else if (ret == -EBUSY) {
@@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
dmem_cgroup_pool_state_put(alloc_state.limit_pool);
if (ret) {
+ dmem_cgroup_pool_state_put(
+ alloc_state.charge_pool);
if (ret != -ENOSPC && ret != -EBUSY)
return ret;
continue;
}
} else if (ret) {
+ dmem_cgroup_pool_state_put(alloc_state.charge_pool);
dmem_cgroup_pool_state_put(alloc_state.limit_pool);
return ret;
}
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index e2c82ad07eb44..fcfa8b51b0337 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -372,30 +372,52 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
}
EXPORT_SYMBOL(ttm_resource_fini);
+/**
+ * ttm_resource_try_charge - charge a resource manager's cgroup pool
+ * @bo: buffer for which an allocation should be charged
+ * @place: where the allocation is attempted to be placed
+ * @ret_pool: on charge success, the pool that was charged
+ * @ret_limit_pool: on charge failure, the pool responsible for the failure
+ *
+ * Should be used to charge cgroups before attempting resource allocation.
+ * When charging succeeds, the value of ret_pool should be passed to
+ * ttm_resource_alloc.
+ *
+ * Returns: 0 on charge success, negative errno on failure.
+ */
+int ttm_resource_try_charge(struct ttm_buffer_object *bo,
+ const struct ttm_place *place,
+ struct dmem_cgroup_pool_state **ret_pool,
+ struct dmem_cgroup_pool_state **ret_limit_pool)
+{
+ struct ttm_resource_manager *man =
+ ttm_manager_type(bo->bdev, place->mem_type);
+
+ if (!man->cg) {
+ *ret_pool = NULL;
+ if (ret_limit_pool)
+ *ret_limit_pool = NULL;
+ return 0;
+ }
+
+ return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
+ ret_limit_pool);
+}
+
int ttm_resource_alloc(struct ttm_buffer_object *bo,
const struct ttm_place *place,
struct ttm_resource **res_ptr,
- struct dmem_cgroup_pool_state **ret_limit_pool)
+ struct dmem_cgroup_pool_state *charge_pool)
{
struct ttm_resource_manager *man =
ttm_manager_type(bo->bdev, place->mem_type);
- struct dmem_cgroup_pool_state *pool = NULL;
int ret;
- if (man->cg) {
- ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, ret_limit_pool);
- if (ret)
- return ret;
- }
-
ret = man->func->alloc(man, bo, place, res_ptr);
- if (ret) {
- if (pool)
- dmem_cgroup_uncharge(pool, bo->base.size);
+ if (ret)
return ret;
- }
- (*res_ptr)->css = pool;
+ (*res_ptr)->css = charge_pool;
spin_lock(&bo->bdev->lru_lock);
ttm_resource_add_bulk_move(*res_ptr, bo);
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index e52bba15012f7..3aef7efdd7cfb 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
void ttm_resource_fini(struct ttm_resource_manager *man,
struct ttm_resource *res);
+int ttm_resource_try_charge(struct ttm_buffer_object *bo,
+ const struct ttm_place *place,
+ struct dmem_cgroup_pool_state **ret_pool,
+ struct dmem_cgroup_pool_state **ret_limit_pool);
int ttm_resource_alloc(struct ttm_buffer_object *bo,
const struct ttm_place *place,
struct ttm_resource **res,
- struct dmem_cgroup_pool_state **ret_limit_pool);
+ struct dmem_cgroup_pool_state *charge_pool);
void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res);
bool ttm_resource_intersects(struct ttm_device *bdev,
struct ttm_resource *res,
--
2.51.2
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2025-11-10 12:37 ` [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
@ 2026-02-24 16:40 ` Tvrtko Ursulin
2026-02-25 9:49 ` Natalie Vock
2026-02-25 11:45 ` Tvrtko Ursulin
1 sibling, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2026-02-24 16:40 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 10/11/2025 12:37, Natalie Vock wrote:
> When the cgroup's memory usage is below the low/min limit and allocation
> fails, try evicting some unprotected buffers to make space. Otherwise,
> application buffers may be forced to go into GTT even though usage is
> below the corresponding low/min limit, if other applications filled VRAM
> with their allocations first.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++++++++----
> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
> include/drm/ttm/ttm_resource.h | 6 ++-
> 3 files changed, 108 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 829d994798835..bd467c965e1bc 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
> }
>
> struct ttm_bo_alloc_state {
> + /** @charge_pool: The memory pool the resource is charged to */
> + struct dmem_cgroup_pool_state *charge_pool;
> /** @limit_pool: Which pool limit we should test against */
> struct dmem_cgroup_pool_state *limit_pool;
> + /** @only_evict_unprotected: If eviction should be restricted to unprotected BOs */
> + bool only_evict_unprotected;
> };
>
> /**
> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
> evict_walk->evicted++;
> if (evict_walk->res)
> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
> - evict_walk->res, NULL);
> + evict_walk->res, evict_walk->alloc_state->charge_pool);
> if (lret == 0)
> return 1;
> out:
> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>
> /* One more attempt if we hit low limit? */
> - if (!lret && evict_walk.hit_low) {
> + if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
What is unprotected synonymous with? No low watermark set? Should
dmem_cgroup_state_evict_valuable() even set *hit_low = true if low
is not set to begin with?
> evict_walk.try_low = true;
> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
> }
> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> } while (!lret && evict_walk.evicted);
>
> /* We hit the low limit? Try once more */
> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
> + !state->only_evict_unprotected) {
> evict_walk.try_low = true;
> goto retry;
> }
> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> struct ttm_resource **res,
> struct ttm_bo_alloc_state *alloc_state)
> {
> - bool may_evict;
> + bool may_evict, below_low = false;
> int ret;
>
> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
> + force_space ? &alloc_state->limit_pool : NULL);
> + if (ret) {
> + /*
> + * -EAGAIN means the charge failed, which we treat like an
> + * allocation failure. Therefore, return an error code indicating
> + * the allocation failed - either -EBUSY if the allocation should
> + * be retried with eviction, or -ENOSPC if there should be no second
> + * attempt.
> + */
> + if (ret == -EAGAIN)
> + ret = may_evict ? -EBUSY : -ENOSPC;
> + return ret;
> + }
>
> - ret = ttm_resource_alloc(bo, place, res,
> - force_space ? &alloc_state->limit_pool : NULL);
> + /*
> + * cgroup protection plays a special role in eviction.
> + * Conceptually, protection of memory via the dmem cgroup controller
> + * entitles the protected cgroup to use a certain amount of memory.
> + * There are two types of protection - the 'low' limit is a
> + * "best-effort" protection, whereas the 'min' limit provides a hard
> + * guarantee that memory within the cgroup's allowance will not be
> + * evicted under any circumstance.
> + *
> + * To faithfully model this concept in TTM, we also need to take cgroup
> + * protection into account when allocating. When allocation in one
> + * place fails, TTM will default to trying other places first before
> + * evicting.
> + * If the allocation is covered by dmem cgroup protection, however,
> + * this prevents the allocation from using the memory it is "entitled"
> + * to. To make sure unprotected allocations cannot push new protected
> + * allocations out of places they are "entitled" to use, we should
> + * evict buffers not covered by any cgroup protection, if this
> + * allocation is covered by cgroup protection.
> + *
> + * Buffers covered by 'min' protection are a special case - the 'min'
> + * limit is a stronger guarantee than 'low', and thus buffers protected
> + * by 'low' but not 'min' should also be considered for eviction.
> + * Buffers protected by 'min' will never be considered for eviction
> + * anyway, so the regular eviction path should be triggered here.
> + * Buffers protected by 'low' but not 'min' will take a special
> + * eviction path that only evicts buffers covered by neither 'low' nor
> + * 'min' protections.
> + */
> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
> + alloc_state->only_evict_unprotected = !may_evict && below_low;
> +
> + ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
>
> if (ret) {
> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
> + (may_evict || below_low))
> ret = -EBUSY;
> return ret;
> }
>
> + /*
> + * Ownership of charge_pool has been transferred to the TTM resource,
> + * don't make the caller think we still hold a reference to it.
> + */
> + alloc_state->charge_pool = NULL;
> return 0;
> }
>
> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> res, &alloc_state);
>
> if (ret == -ENOSPC) {
> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> continue;
> } else if (ret == -EBUSY) {
> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>
> if (ret) {
> + dmem_cgroup_pool_state_put(
> + alloc_state.charge_pool);
Funky line break.
> if (ret != -ENOSPC && ret != -EBUSY)
> return ret;
> continue;
> }
> } else if (ret) {
> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> return ret;
> }
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index e2c82ad07eb44..fcfa8b51b0337 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
> }
> EXPORT_SYMBOL(ttm_resource_fini);
>
> +/**
> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
> + * @bo: buffer for which an allocation should be charged
> + * @place: where the allocation is attempted to be placed
> + * @ret_pool: on charge success, the pool that was charged
> + * @ret_limit_pool: on charge failure, the pool responsible for the failure
> + *
> + * Should be used to charge cgroups before attempting resource allocation.
> + * When charging succeeds, the value of ret_pool should be passed to
> + * ttm_resource_alloc.
> + *
> + * Returns: 0 on charge success, negative errno on failure.
> + */
> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct dmem_cgroup_pool_state **ret_pool,
> + struct dmem_cgroup_pool_state **ret_limit_pool)
> +{
> + struct ttm_resource_manager *man =
> + ttm_manager_type(bo->bdev, place->mem_type);
> +
> + if (!man->cg) {
> + *ret_pool = NULL;
> + if (ret_limit_pool)
> + *ret_limit_pool = NULL;
> + return 0;
> + }
> +
> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
> + ret_limit_pool);
> +}
> +
> int ttm_resource_alloc(struct ttm_buffer_object *bo,
> const struct ttm_place *place,
> struct ttm_resource **res_ptr,
> - struct dmem_cgroup_pool_state **ret_limit_pool)
> + struct dmem_cgroup_pool_state *charge_pool)
> {
> struct ttm_resource_manager *man =
> ttm_manager_type(bo->bdev, place->mem_type);
> - struct dmem_cgroup_pool_state *pool = NULL;
> int ret;
>
> - if (man->cg) {
> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, ret_limit_pool);
> - if (ret)
> - return ret;
> - }
> -
> ret = man->func->alloc(man, bo, place, res_ptr);
> - if (ret) {
> - if (pool)
> - dmem_cgroup_uncharge(pool, bo->base.size);
> + if (ret)
> return ret;
> - }
>
> - (*res_ptr)->css = pool;
> + (*res_ptr)->css = charge_pool;
Is it possible to somehow split this patch into two? I mean first a
patch which changes the prototype of ttm_resource_alloc(), adjusting the
callers, set out new rules for owning the charge pool, etc, then the
patch which only adds the cgroup smarts to ttm_bo_alloc_at_place(). If
that could be made without creating any functional difference to the
eviction alone I think it could make it easier to review.
Regards,
Tvrtko
>
> spin_lock(&bo->bdev->lru_lock);
> ttm_resource_add_bulk_move(*res_ptr, bo);
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index e52bba15012f7..3aef7efdd7cfb 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
> void ttm_resource_fini(struct ttm_resource_manager *man,
> struct ttm_resource *res);
>
> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct dmem_cgroup_pool_state **ret_pool,
> + struct dmem_cgroup_pool_state **ret_limit_pool);
> int ttm_resource_alloc(struct ttm_buffer_object *bo,
> const struct ttm_place *place,
> struct ttm_resource **res,
> - struct dmem_cgroup_pool_state **ret_limit_pool);
> + struct dmem_cgroup_pool_state *charge_pool);
> void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res);
> bool ttm_resource_intersects(struct ttm_device *bdev,
> struct ttm_resource *res,
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2026-02-24 16:40 ` Tvrtko Ursulin
@ 2026-02-25 9:49 ` Natalie Vock
2026-02-25 10:12 ` Tvrtko Ursulin
0 siblings, 1 reply; 15+ messages in thread
From: Natalie Vock @ 2026-02-25 9:49 UTC (permalink / raw)
To: Tvrtko Ursulin, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 2/24/26 17:40, Tvrtko Ursulin wrote:
>
> On 10/11/2025 12:37, Natalie Vock wrote:
>> When the cgroup's memory usage is below the low/min limit and allocation
>> fails, try evicting some unprotected buffers to make space. Otherwise,
>> application buffers may be forced to go into GTT even though usage is
>> below the corresponding low/min limit, if other applications filled VRAM
>> with their allocations first.
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++
>> ++++++----
>> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>> include/drm/ttm/ttm_resource.h | 6 ++-
>> 3 files changed, 108 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 829d994798835..bd467c965e1bc 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev,
>> struct ttm_resource_manager *man
>> }
>> struct ttm_bo_alloc_state {
>> + /** @charge_pool: The memory pool the resource is charged to */
>> + struct dmem_cgroup_pool_state *charge_pool;
>> /** @limit_pool: Which pool limit we should test against */
>> struct dmem_cgroup_pool_state *limit_pool;
>> + /** @only_evict_unprotected: If eviction should be restricted to
>> unprotected BOs */
>> + bool only_evict_unprotected;
>> };
>> /**
>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk
>> *walk, struct ttm_buffer_object *
>> evict_walk->evicted++;
>> if (evict_walk->res)
>> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk-
>> >place,
>> - evict_walk->res, NULL);
>> + evict_walk->res, evict_walk->alloc_state-
>> >charge_pool);
>> if (lret == 0)
>> return 1;
>> out:
>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>> *bdev,
>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>> /* One more attempt if we hit low limit? */
>> - if (!lret && evict_walk.hit_low) {
>> + if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
>
> What is unprotected synonymous with? No low watermark set? Should
> dmem_cgroup_state_evict_valuable() even set *hit_low = true if low
> is not set to begin with?
In terms of cgroup usage, a cgroup (and by extension, its BOs) is
protected as long as its usage stays under the low watermark (if not
set, that watermark is zero and any BO is trivially unprotected). If the
usage exceeds the low watermark, the cgroup/its BOs become unprotected
and can be evicted (more easily), until the usage goes below the
watermark again.
With only_evict_unprotected, what we're trying to do is evict buffers
from any cgroup that currently exceeds its low (or min) watermark, but
leave alone cgroups that are within their limits. I've elaborated on the
rationale more in the cover letter, but essentially, this is supposed to
make TTM honor the low/min protection better for cgroups that are
allocating and currently within their low/min watermark, by allowing
them to push out BOs from cgroups that exceed their respective watermarks.
I'll add some comments to the only_evict_unprotected docs to explain
what "unprotected" means here.
>
>> evict_walk.try_low = true;
>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>> }
>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device
>> *bdev,
>> } while (!lret && evict_walk.evicted);
>> /* We hit the low limit? Try once more */
>> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>> + !state->only_evict_unprotected) {
>> evict_walk.try_low = true;
>> goto retry;
>> }
>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct
>> ttm_buffer_object *bo,
>> struct ttm_resource **res,
>> struct ttm_bo_alloc_state *alloc_state)
>> {
>> - bool may_evict;
>> + bool may_evict, below_low = false;
>> int ret;
>> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
>> + force_space ? &alloc_state->limit_pool : NULL);
>> + if (ret) {
>> + /*
>> + * -EAGAIN means the charge failed, which we treat like an
>> + * allocation failure. Therefore, return an error code
>> indicating
>> + * the allocation failed - either -EBUSY if the allocation
>> should
>> + * be retried with eviction, or -ENOSPC if there should be no
>> second
>> + * attempt.
>> + */
>> + if (ret == -EAGAIN)
>> + ret = may_evict ? -EBUSY : -ENOSPC;
>> + return ret;
>> + }
>> - ret = ttm_resource_alloc(bo, place, res,
>> - force_space ? &alloc_state->limit_pool : NULL);
>> + /*
>> + * cgroup protection plays a special role in eviction.
>> + * Conceptually, protection of memory via the dmem cgroup controller
>> + * entitles the protected cgroup to use a certain amount of memory.
>> + * There are two types of protection - the 'low' limit is a
>> + * "best-effort" protection, whereas the 'min' limit provides a hard
>> + * guarantee that memory within the cgroup's allowance will not be
>> + * evicted under any circumstance.
>> + *
>> + * To faithfully model this concept in TTM, we also need to take
>> cgroup
>> + * protection into account when allocating. When allocation in one
>> + * place fails, TTM will default to trying other places first before
>> + * evicting.
>> + * If the allocation is covered by dmem cgroup protection, however,
>> + * this prevents the allocation from using the memory it is
>> "entitled"
>> + * to. To make sure unprotected allocations cannot push new
>> protected
>> + * allocations out of places they are "entitled" to use, we should
>> + * evict buffers not covered by any cgroup protection, if this
>> + * allocation is covered by cgroup protection.
>> + *
>> + * Buffers covered by 'min' protection are a special case - the
>> 'min'
>> + * limit is a stronger guarantee than 'low', and thus buffers
>> protected
>> + * by 'low' but not 'min' should also be considered for eviction.
>> + * Buffers protected by 'min' will never be considered for eviction
>> + * anyway, so the regular eviction path should be triggered here.
>> + * Buffers protected by 'low' but not 'min' will take a special
>> + * eviction path that only evicts buffers covered by neither
>> 'low' nor
>> + * 'min' protections.
>> + */
>> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
>> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>> + alloc_state->only_evict_unprotected = !may_evict && below_low;
>> +
>> + ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
>> if (ret) {
>> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
>> + (may_evict || below_low))
>> ret = -EBUSY;
>> return ret;
>> }
>> + /*
>> + * Ownership of charge_pool has been transferred to the TTM
>> resource,
>> + * don't make the caller think we still hold a reference to it.
>> + */
>> + alloc_state->charge_pool = NULL;
>> return 0;
>> }
>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> res, &alloc_state);
>> if (ret == -ENOSPC) {
>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> continue;
>> } else if (ret == -EBUSY) {
>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> if (ret) {
>> + dmem_cgroup_pool_state_put(
>> + alloc_state.charge_pool);
>
> Funky line break.
Will fix.
>
>> if (ret != -ENOSPC && ret != -EBUSY)
>> return ret;
>> continue;
>> }
>> } else if (ret) {
>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> return ret;
>> }
>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/
>> ttm_resource.c
>> index e2c82ad07eb44..fcfa8b51b0337 100644
>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct
>> ttm_resource_manager *man,
>> }
>> EXPORT_SYMBOL(ttm_resource_fini);
>> +/**
>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>> + * @bo: buffer for which an allocation should be charged
>> + * @place: where the allocation is attempted to be placed
>> + * @ret_pool: on charge success, the pool that was charged
>> + * @ret_limit_pool: on charge failure, the pool responsible for the
>> failure
>> + *
>> + * Should be used to charge cgroups before attempting resource
>> allocation.
>> + * When charging succeeds, the value of ret_pool should be passed to
>> + * ttm_resource_alloc.
>> + *
>> + * Returns: 0 on charge success, negative errno on failure.
>> + */
>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>> + const struct ttm_place *place,
>> + struct dmem_cgroup_pool_state **ret_pool,
>> + struct dmem_cgroup_pool_state **ret_limit_pool)
>> +{
>> + struct ttm_resource_manager *man =
>> + ttm_manager_type(bo->bdev, place->mem_type);
>> +
>> + if (!man->cg) {
>> + *ret_pool = NULL;
>> + if (ret_limit_pool)
>> + *ret_limit_pool = NULL;
>> + return 0;
>> + }
>> +
>> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>> + ret_limit_pool);
>> +}
>> +
>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>> const struct ttm_place *place,
>> struct ttm_resource **res_ptr,
>> - struct dmem_cgroup_pool_state **ret_limit_pool)
>> + struct dmem_cgroup_pool_state *charge_pool)
>> {
>> struct ttm_resource_manager *man =
>> ttm_manager_type(bo->bdev, place->mem_type);
>> - struct dmem_cgroup_pool_state *pool = NULL;
>> int ret;
>> - if (man->cg) {
>> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool,
>> ret_limit_pool);
>> - if (ret)
>> - return ret;
>> - }
>> -
>> ret = man->func->alloc(man, bo, place, res_ptr);
>> - if (ret) {
>> - if (pool)
>> - dmem_cgroup_uncharge(pool, bo->base.size);
>> + if (ret)
>> return ret;
>> - }
>> - (*res_ptr)->css = pool;
>> + (*res_ptr)->css = charge_pool;
>
> Is it possible to somehow split this patch into two? I mean first a
> patch which changes the prototype of ttm_resource_alloc(), adjusting the
> callers, set out new rules for owning the charge pool, etc, then the
> patch which only adds the cgroup smarts to ttm_bo_alloc_at_place(). If
> that could be made without creating any functional difference to the
> eviction alone I think it could make it easier to review.
Will try.
Thanks,
Natalie
>
> Regards,
>
> Tvrtko
>
>> spin_lock(&bo->bdev->lru_lock);
>> ttm_resource_add_bulk_move(*res_ptr, bo);
>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/
>> ttm_resource.h
>> index e52bba15012f7..3aef7efdd7cfb 100644
>> --- a/include/drm/ttm/ttm_resource.h
>> +++ b/include/drm/ttm/ttm_resource.h
>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object
>> *bo,
>> void ttm_resource_fini(struct ttm_resource_manager *man,
>> struct ttm_resource *res);
>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>> + const struct ttm_place *place,
>> + struct dmem_cgroup_pool_state **ret_pool,
>> + struct dmem_cgroup_pool_state **ret_limit_pool);
>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>> const struct ttm_place *place,
>> struct ttm_resource **res,
>> - struct dmem_cgroup_pool_state **ret_limit_pool);
>> + struct dmem_cgroup_pool_state *charge_pool);
>> void ttm_resource_free(struct ttm_buffer_object *bo, struct
>> ttm_resource **res);
>> bool ttm_resource_intersects(struct ttm_device *bdev,
>> struct ttm_resource *res,
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2026-02-25 9:49 ` Natalie Vock
@ 2026-02-25 10:12 ` Tvrtko Ursulin
2026-02-25 10:19 ` Natalie Vock
0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2026-02-25 10:12 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 25/02/2026 09:49, Natalie Vock wrote:
> On 2/24/26 17:40, Tvrtko Ursulin wrote:
>>
>> On 10/11/2025 12:37, Natalie Vock wrote:
>>> When the cgroup's memory usage is below the low/min limit and allocation
>>> fails, try evicting some unprotected buffers to make space. Otherwise,
>>> application buffers may be forced to go into GTT even though usage is
>>> below the corresponding low/min limit, if other applications filled VRAM
>>> with their allocations first.
>>>
>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++++++++----
>>> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>>> include/drm/ttm/ttm_resource.h | 6 ++-
>>> 3 files changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 829d994798835..bd467c965e1bc 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev,
>>> struct ttm_resource_manager *man
>>> }
>>> struct ttm_bo_alloc_state {
>>> + /** @charge_pool: The memory pool the resource is charged to */
>>> + struct dmem_cgroup_pool_state *charge_pool;
>>> /** @limit_pool: Which pool limit we should test against */
>>> struct dmem_cgroup_pool_state *limit_pool;
>>> + /** @only_evict_unprotected: If eviction should be restricted to
>>> unprotected BOs */
>>> + bool only_evict_unprotected;
>>> };
>>> /**
>>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk
>>> *walk, struct ttm_buffer_object *
>>> evict_walk->evicted++;
>>> if (evict_walk->res)
>>> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk-
>>> >place,
>>> - evict_walk->res, NULL);
>>> + evict_walk->res, evict_walk->alloc_state-
>>> >charge_pool);
>>> if (lret == 0)
>>> return 1;
>>> out:
>>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>>> *bdev,
>>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>> /* One more attempt if we hit low limit? */
>>> - if (!lret && evict_walk.hit_low) {
>>> + if (!lret && evict_walk.hit_low && !state-
>>> >only_evict_unprotected) {
>>
>> What is unprotected synonymous with? No low watermark set? Should
>> dmem_cgroup_state_evict_valuable() even set *hit_low = true for if low
>> is not set to begin with?
>
> In terms of cgroup usage, a cgroup (and by extension, its BOs) is
> protected as long as its usage stays under the low watermark (if not
> set, that watermark is zero and any BO is trivially unprotected). If the
> usage exceeds the low watermark, the cgroup/its BOs become unprotected
> and can be evicted (more easily), until the usage goes below the
> watermark again.
Got it thanks, so either no low set, or usage above low. Makes sense.
> With only_evict_unprotected, what we're trying to do is evict buffers
> from any cgroup that currently exceeds its low (or min) watermark, but
> leave alone cgroups that are within their limits. I've elaborated on the
> rationale more in the cover letter, but essentially, this is supposed to
> make TTM honor the low/min protection better for cgroups that are
> allocating and currently within their low/min watermark, by allowing
> them to push out BOs from cgroups that exceed their respective watermarks.
Yep, I got that part. Just that I will need a second pass to fully grasp
the extended logic. Problem being more booleans and passes make things
more complex. That is why I made this side question on whether it even
makes sense for dmem_cgroup_state_evict_valuable() to set hit_low if the
low is not even set. Assuming I got it right it can happen:
if (!ignore_low) {
low = READ_ONCE(ctest->elow);
if (used > low)
return true;
*ret_hit_low = true;
return false;
}
So I was wondering what would be the effect of making that like this:
if (!ignore_low) {
low = READ_ONCE(ctest->elow);
if (used > low)
return true;
+ if (low)
+ *ret_hit_low = true;
return false;
}
Could that somehow simplify the logic, maybe allow for not having to add
the additional condition above. Possibly not, it seems more complex than
that. But I am just thinking out loud at this point. Again, I need to
make a second reading pass.
> I'll add some comments to the only_evict_unprotected docs to explain
> what "unprotected" means here.
>
>>
>>> evict_walk.try_low = true;
>>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>> }
>>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device
>>> *bdev,
>>> } while (!lret && evict_walk.evicted);
>>> /* We hit the low limit? Try once more */
>>> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>>> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>>> + !state->only_evict_unprotected) {
>>> evict_walk.try_low = true;
>>> goto retry;
>>> }
>>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct
>>> ttm_buffer_object *bo,
>>> struct ttm_resource **res,
>>> struct ttm_bo_alloc_state *alloc_state)
>>> {
>>> - bool may_evict;
>>> + bool may_evict, below_low = false;
>>> int ret;
>>> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>>> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
>>> + force_space ? &alloc_state->limit_pool : NULL);
>>> + if (ret) {
>>> + /*
>>> + * -EAGAIN means the charge failed, which we treat like an
>>> + * allocation failure. Therefore, return an error code
>>> indicating
>>> + * the allocation failed - either -EBUSY if the allocation
>>> should
>>> + * be retried with eviction, or -ENOSPC if there should be
>>> no second
>>> + * attempt.
>>> + */
>>> + if (ret == -EAGAIN)
>>> + ret = may_evict ? -EBUSY : -ENOSPC;
>>> + return ret;
>>> + }
>>> - ret = ttm_resource_alloc(bo, place, res,
>>> - force_space ? &alloc_state->limit_pool : NULL);
>>> + /*
>>> + * cgroup protection plays a special role in eviction.
>>> + * Conceptually, protection of memory via the dmem cgroup
>>> controller
>>> + * entitles the protected cgroup to use a certain amount of memory.
>>> + * There are two types of protection - the 'low' limit is a
>>> + * "best-effort" protection, whereas the 'min' limit provides a
>>> hard
>>> + * guarantee that memory within the cgroup's allowance will not be
>>> + * evicted under any circumstance.
>>> + *
>>> + * To faithfully model this concept in TTM, we also need to take
>>> cgroup
>>> + * protection into account when allocating. When allocation in one
>>> + * place fails, TTM will default to trying other places first
>>> before
>>> + * evicting.
>>> + * If the allocation is covered by dmem cgroup protection, however,
>>> + * this prevents the allocation from using the memory it is
>>> "entitled"
>>> + * to. To make sure unprotected allocations cannot push new
>>> protected
>>> + * allocations out of places they are "entitled" to use, we should
>>> + * evict buffers not covered by any cgroup protection, if this
>>> + * allocation is covered by cgroup protection.
>>> + *
>>> + * Buffers covered by 'min' protection are a special case - the
>>> 'min'
>>> + * limit is a stronger guarantee than 'low', and thus buffers
>>> protected
>>> + * by 'low' but not 'min' should also be considered for eviction.
>>> + * Buffers protected by 'min' will never be considered for eviction
>>> + * anyway, so the regular eviction path should be triggered here.
>>> + * Buffers protected by 'low' but not 'min' will take a special
>>> + * eviction path that only evicts buffers covered by neither
>>> 'low' or
>>> + * 'min' protections.
>>> + */
>>> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
>>> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>>> + alloc_state->only_evict_unprotected = !may_evict && below_low;
>>> +
>>> + ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
>>> if (ret) {
>>> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>>> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
>>> + (may_evict || below_low))
>>> ret = -EBUSY;
>>> return ret;
>>> }
>>> + /*
>>> + * Ownership of charge_pool has been transferred to the TTM
>>> resource,
>>> + * don't make the caller think we still hold a reference to it.
>>> + */
>>> + alloc_state->charge_pool = NULL;
>>> return 0;
>>> }
>>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct
>>> ttm_buffer_object *bo,
>>> res, &alloc_state);
>>> if (ret == -ENOSPC) {
>>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>> continue;
>>> } else if (ret == -EBUSY) {
>>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct
>>> ttm_buffer_object *bo,
>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>> if (ret) {
>>> + dmem_cgroup_pool_state_put(
>>> + alloc_state.charge_pool);
>>
>> Funky line break.
>
> Will fix.
>
>>
>>> if (ret != -ENOSPC && ret != -EBUSY)
>>> return ret;
>>> continue;
>>> }
>>> } else if (ret) {
>>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>> return ret;
>>> }
>>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/
>>> ttm/ ttm_resource.c
>>> index e2c82ad07eb44..fcfa8b51b0337 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct
>>> ttm_resource_manager *man,
>>> }
>>> EXPORT_SYMBOL(ttm_resource_fini);
>>> +/**
>>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>>> + * @bo: buffer for which an allocation should be charged
>>> + * @place: where the allocation is attempted to be placed
>>> + * @ret_pool: on charge success, the pool that was charged
>>> + * @ret_limit_pool: on charge failure, the pool responsible for the
>>> failure
>>> + *
>>> + * Should be used to charge cgroups before attempting resource
>>> allocation.
>>> + * When charging succeeds, the value of ret_pool should be passed to
>>> + * ttm_resource_alloc.
>>> + *
>>> + * Returns: 0 on charge success, negative errno on failure.
>>> + */
>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>> + const struct ttm_place *place,
>>> + struct dmem_cgroup_pool_state **ret_pool,
>>> + struct dmem_cgroup_pool_state **ret_limit_pool)
>>> +{
>>> + struct ttm_resource_manager *man =
>>> + ttm_manager_type(bo->bdev, place->mem_type);
>>> +
>>> + if (!man->cg) {
>>> + *ret_pool = NULL;
>>> + if (ret_limit_pool)
>>> + *ret_limit_pool = NULL;
>>> + return 0;
>>> + }
>>> +
>>> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>>> + ret_limit_pool);
>>> +}
>>> +
>>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>> const struct ttm_place *place,
>>> struct ttm_resource **res_ptr,
>>> - struct dmem_cgroup_pool_state **ret_limit_pool)
>>> + struct dmem_cgroup_pool_state *charge_pool)
>>> {
>>> struct ttm_resource_manager *man =
>>> ttm_manager_type(bo->bdev, place->mem_type);
>>> - struct dmem_cgroup_pool_state *pool = NULL;
>>> int ret;
>>> - if (man->cg) {
>>> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool,
>>> ret_limit_pool);
>>> - if (ret)
>>> - return ret;
>>> - }
>>> -
>>> ret = man->func->alloc(man, bo, place, res_ptr);
>>> - if (ret) {
>>> - if (pool)
>>> - dmem_cgroup_uncharge(pool, bo->base.size);
>>> + if (ret)
>>> return ret;
>>> - }
>>> - (*res_ptr)->css = pool;
>>> + (*res_ptr)->css = charge_pool;
>>
>> Is it possible to somehow split this patch into two? I mean first a
>> patch which changes the prototype of ttm_resource_alloc(), adjusting
>> the callers, set out new rules for owning the charge pool, etc, then
>> the patch which only adds the cgroup smarts to
>> ttm_bo_alloc_at_place(). If that could be made without creating any
>> functional difference to the eviction alone I think it could make it
>> easier to review.
>
> Will try.
Only if it sounds plausible that it can be sensibly done. Otherwise
don't spend too much time if you think it makes no sense. I'll wait for
the verdict, or for v4 to appear and then have another go of making
sense of the existing vs new eviction logic.
Regards,
Tvrtko
> Thanks,
> Natalie
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> spin_lock(&bo->bdev->lru_lock);
>>> ttm_resource_add_bulk_move(*res_ptr, bo);
>>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/
>>> ttm_resource.h
>>> index e52bba15012f7..3aef7efdd7cfb 100644
>>> --- a/include/drm/ttm/ttm_resource.h
>>> +++ b/include/drm/ttm/ttm_resource.h
>>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object
>>> *bo,
>>> void ttm_resource_fini(struct ttm_resource_manager *man,
>>> struct ttm_resource *res);
>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>> + const struct ttm_place *place,
>>> + struct dmem_cgroup_pool_state **ret_pool,
>>> + struct dmem_cgroup_pool_state **ret_limit_pool);
>>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>> const struct ttm_place *place,
>>> struct ttm_resource **res,
>>> - struct dmem_cgroup_pool_state **ret_limit_pool);
>>> + struct dmem_cgroup_pool_state *charge_pool);
>>> void ttm_resource_free(struct ttm_buffer_object *bo, struct
>>> ttm_resource **res);
>>> bool ttm_resource_intersects(struct ttm_device *bdev,
>>> struct ttm_resource *res,
>>>
>>
>
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2026-02-25 10:12 ` Tvrtko Ursulin
@ 2026-02-25 10:19 ` Natalie Vock
0 siblings, 0 replies; 15+ messages in thread
From: Natalie Vock @ 2026-02-25 10:19 UTC (permalink / raw)
To: Tvrtko Ursulin, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 2/25/26 11:12, Tvrtko Ursulin wrote:
>
> On 25/02/2026 09:49, Natalie Vock wrote:
>> On 2/24/26 17:40, Tvrtko Ursulin wrote:
>>>
>>> On 10/11/2025 12:37, Natalie Vock wrote:
>>>> When the cgroup's memory usage is below the low/min limit and
>>>> allocation
>>>> fails, try evicting some unprotected buffers to make space. Otherwise,
>>>> application buffers may be forced to go into GTT even though usage is
>>>> below the corresponding low/min limit, if other applications filled
>>>> VRAM
>>>> with their allocations first.
>>>>
>>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>>> ---
>>>> drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++++++++----
>>>> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>>>> include/drm/ttm/ttm_resource.h | 6 ++-
>>>> 3 files changed, 108 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/
>>>> ttm_bo.c
>>>> index 829d994798835..bd467c965e1bc 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev,
>>>> struct ttm_resource_manager *man
>>>> }
>>>> struct ttm_bo_alloc_state {
>>>> + /** @charge_pool: The memory pool the resource is charged to */
>>>> + struct dmem_cgroup_pool_state *charge_pool;
>>>> /** @limit_pool: Which pool limit we should test against */
>>>> struct dmem_cgroup_pool_state *limit_pool;
>>>> + /** @only_evict_unprotected: If eviction should be restricted
>>>> to unprotected BOs */
>>>> + bool only_evict_unprotected;
>>>> };
>>>> /**
>>>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk
>>>> *walk, struct ttm_buffer_object *
>>>> evict_walk->evicted++;
>>>> if (evict_walk->res)
>>>> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk-
>>>> >place,
>>>> - evict_walk->res, NULL);
>>>> + evict_walk->res, evict_walk->alloc_state-
>>>> >charge_pool);
>>>> if (lret == 0)
>>>> return 1;
>>>> out:
>>>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>>>> *bdev,
>>>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>>> /* One more attempt if we hit low limit? */
>>>> - if (!lret && evict_walk.hit_low) {
>>>> + if (!lret && evict_walk.hit_low && !state-
>>>> >only_evict_unprotected) {
>>>
>>> What is unprotected synonymous with? No low watermark set? Should
>>> dmem_cgroup_state_evict_valuable() even set *hit_low = true for if
>>> low is not set to begin with?
>>
>> In terms of cgroup usage, a cgroup (and by extension, its BOs) is
>> protected as long as its usage stays under the low watermark (if not
>> set, that watermark is zero and any BO is trivially unprotected). If
>> the usage exceeds the low watermark, the cgroup/its BOs become
>> unprotected and can be evicted (more easily), until the usage goes
>> below the watermark again.
>
> Got it thanks, so either no low set, or usage above low. Makes sense.
>
>> With only_evict_unprotected, what we're trying to do is evict buffers
>> from any cgroup that currently exceeds its low (or min) watermark, but
>> leave alone cgroups that are within their limits. I've elaborated on
>> the rationale more in the cover letter, but essentially, this is
>> supposed to make TTM honor the low/min protection better for cgroups
>> that are allocating and currently within their low/min watermark, by
>> allowing them to push out BOs from cgroups that exceed their
>> respective watermarks.
>
> Yep, I got that part. Just that I will need a second pass to fully grasp
> the extended logic. Problem being more booleans and passes make things
> more complex. That is why I made this side question on whether it even
> makes sense for dmem_cgroup_state_evict_valuable() to set hit_low if the
> low is not even set. Assuming I got it right it can happen:
>
> if (!ignore_low) {
> low = READ_ONCE(ctest->elow);
> if (used > low)
> return true;
>
> *ret_hit_low = true;
> return false;
> }
>
> So I was wondering what would be the effect of making that like this:
>
> if (!ignore_low) {
> low = READ_ONCE(ctest->elow);
> if (used > low)
> return true;
>
> + if (low)
> + *ret_hit_low = true;
>
> return false;
> }
>
>
> Could that somehow simplify the logic, maybe allow for not having to add
> the additional condition above. Possibly not, it seems more complex than
> that. But I am just thinking out loud at this point. Again, I need to
> make a second reading pass.
FWIW, hit_low can never become true if low is not set. If we even call
dmem_cgroup_state_evict_valuable in the first place, that means there has
to be a buffer object associated with the cgroup. Unless something went
super duper wrong, that also means the memory usage of that buffer is
charged to that cgroup, and therefore usage has to be > 0.
If low is unset, i.e. 0, we therefore always exit out of the
"if (used > low)" statement, and never set *ret_hit_low to true.
*ret_hit_low can only be set to true if there is a buffer that has a
(set) low limit associated with its cgroup, and that cgroup's usage does
not exceed the low limit.
Thanks,
Natalie
>
>> I'll add some comments to the only_evict_unprotected docs to explain
>> what "unprotected" means here.
>>
>>>
>>>> evict_walk.try_low = true;
>>>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man,
>>>> 1);
>>>> }
>>>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device
>>>> *bdev,
>>>> } while (!lret && evict_walk.evicted);
>>>> /* We hit the low limit? Try once more */
>>>> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>>>> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>>>> + !state->only_evict_unprotected) {
>>>> evict_walk.try_low = true;
>>>> goto retry;
>>>> }
>>>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct
>>>> ttm_buffer_object *bo,
>>>> struct ttm_resource **res,
>>>> struct ttm_bo_alloc_state *alloc_state)
>>>> {
>>>> - bool may_evict;
>>>> + bool may_evict, below_low = false;
>>>> int ret;
>>>> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>>>> + ret = ttm_resource_try_charge(bo, place, &alloc_state-
>>>> >charge_pool,
>>>> + force_space ? &alloc_state->limit_pool : NULL);
>>>> + if (ret) {
>>>> + /*
>>>> + * -EAGAIN means the charge failed, which we treat like an
>>>> + * allocation failure. Therefore, return an error code
>>>> indicating
>>>> + * the allocation failed - either -EBUSY if the allocation
>>>> should
>>>> + * be retried with eviction, or -ENOSPC if there should be
>>>> no second
>>>> + * attempt.
>>>> + */
>>>> + if (ret == -EAGAIN)
>>>> + ret = may_evict ? -EBUSY : -ENOSPC;
>>>> + return ret;
>>>> + }
>>>> - ret = ttm_resource_alloc(bo, place, res,
>>>> - force_space ? &alloc_state->limit_pool : NULL);
>>>> + /*
>>>> + * cgroup protection plays a special role in eviction.
>>>> + * Conceptually, protection of memory via the dmem cgroup
>>>> controller
>>>> + * entitles the protected cgroup to use a certain amount of
>>>> memory.
>>>> + * There are two types of protection - the 'low' limit is a
>>>> + * "best-effort" protection, whereas the 'min' limit provides a
>>>> hard
>>>> + * guarantee that memory within the cgroup's allowance will not be
>>>> + * evicted under any circumstance.
>>>> + *
>>>> + * To faithfully model this concept in TTM, we also need to
>>>> take cgroup
>>>> + * protection into account when allocating. When allocation in one
>>>> + * place fails, TTM will default to trying other places first
>>>> before
>>>> + * evicting.
>>>> + * If the allocation is covered by dmem cgroup protection,
>>>> however,
>>>> + * this prevents the allocation from using the memory it is
>>>> "entitled"
>>>> + * to. To make sure unprotected allocations cannot push new
>>>> protected
>>>> + * allocations out of places they are "entitled" to use, we should
>>>> + * evict buffers not covered by any cgroup protection, if this
>>>> + * allocation is covered by cgroup protection.
>>>> + *
>>>> + * Buffers covered by 'min' protection are a special case - the
>>>> 'min'
>>>> + * limit is a stronger guarantee than 'low', and thus buffers
>>>> protected
>>>> + * by 'low' but not 'min' should also be considered for eviction.
>>>> + * Buffers protected by 'min' will never be considered for
>>>> eviction
>>>> + * anyway, so the regular eviction path should be triggered here.
>>>> + * Buffers protected by 'low' but not 'min' will take a special
>>>> + * eviction path that only evicts buffers covered by neither
>>>> 'low' or
>>>> + * 'min' protections.
>>>> + */
>>>> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state-
>>>> >charge_pool);
>>>> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>>>> + alloc_state->only_evict_unprotected = !may_evict && below_low;
>>>> +
>>>> + ret = ttm_resource_alloc(bo, place, res, alloc_state-
>>>> >charge_pool);
>>>> if (ret) {
>>>> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>>>> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
>>>> + (may_evict || below_low))
>>>> ret = -EBUSY;
>>>> return ret;
>>>> }
>>>> + /*
>>>> + * Ownership of charge_pool has been transferred to the TTM
>>>> resource,
>>>> + * don't make the caller think we still hold a reference to it.
>>>> + */
>>>> + alloc_state->charge_pool = NULL;
>>>> return 0;
>>>> }
>>>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct
>>>> ttm_buffer_object *bo,
>>>> res, &alloc_state);
>>>> if (ret == -ENOSPC) {
>>>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>> continue;
>>>> } else if (ret == -EBUSY) {
>>>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct
>>>> ttm_buffer_object *bo,
>>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>> if (ret) {
>>>> + dmem_cgroup_pool_state_put(
>>>> + alloc_state.charge_pool);
>>>
>>> Funky line break.
>>
>> Will fix.
>>
>>>
>>>> if (ret != -ENOSPC && ret != -EBUSY)
>>>> return ret;
>>>> continue;
>>>> }
>>>> } else if (ret) {
>>>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>> return ret;
>>>> }
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/
>>>> ttm/ ttm_resource.c
>>>> index e2c82ad07eb44..fcfa8b51b0337 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>>>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct
>>>> ttm_resource_manager *man,
>>>> }
>>>> EXPORT_SYMBOL(ttm_resource_fini);
>>>> +/**
>>>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>>>> + * @bo: buffer for which an allocation should be charged
>>>> + * @place: where the allocation is attempted to be placed
>>>> + * @ret_pool: on charge success, the pool that was charged
>>>> + * @ret_limit_pool: on charge failure, the pool responsible for the
>>>> failure
>>>> + *
>>>> + * Should be used to charge cgroups before attempting resource
>>>> allocation.
>>>> + * When charging succeeds, the value of ret_pool should be passed to
>>>> + * ttm_resource_alloc.
>>>> + *
>>>> + * Returns: 0 on charge success, negative errno on failure.
>>>> + */
>>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>>> + const struct ttm_place *place,
>>>> + struct dmem_cgroup_pool_state **ret_pool,
>>>> + struct dmem_cgroup_pool_state **ret_limit_pool)
>>>> +{
>>>> + struct ttm_resource_manager *man =
>>>> + ttm_manager_type(bo->bdev, place->mem_type);
>>>> +
>>>> + if (!man->cg) {
>>>> + *ret_pool = NULL;
>>>> + if (ret_limit_pool)
>>>> + *ret_limit_pool = NULL;
>>>> + return 0;
>>>> + }
>>>> +
>>>> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>>>> + ret_limit_pool);
>>>> +}
>>>> +
>>>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>>> const struct ttm_place *place,
>>>> struct ttm_resource **res_ptr,
>>>> - struct dmem_cgroup_pool_state **ret_limit_pool)
>>>> + struct dmem_cgroup_pool_state *charge_pool)
>>>> {
>>>> struct ttm_resource_manager *man =
>>>> ttm_manager_type(bo->bdev, place->mem_type);
>>>> - struct dmem_cgroup_pool_state *pool = NULL;
>>>> int ret;
>>>> - if (man->cg) {
>>>> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool,
>>>> ret_limit_pool);
>>>> - if (ret)
>>>> - return ret;
>>>> - }
>>>> -
>>>> ret = man->func->alloc(man, bo, place, res_ptr);
>>>> - if (ret) {
>>>> - if (pool)
>>>> - dmem_cgroup_uncharge(pool, bo->base.size);
>>>> + if (ret)
>>>> return ret;
>>>> - }
>>>> - (*res_ptr)->css = pool;
>>>> + (*res_ptr)->css = charge_pool;
>>>
>>> Is it possible to somehow split this patch into two? I mean first a
>>> patch which changes the prototype of ttm_resource_alloc(), adjusting
>>> the callers, set out new rules for owning the charge pool, etc, then
>>> the patch which only adds the cgroup smarts to
>>> ttm_bo_alloc_at_place(). If that could be made without creating any
>>> functional difference to the eviction alone I think it could make it
>>> easier to review.
>>
>> Will try.
>
> Only if it sounds plausible that it can be sensibly done. Otherwise
> don't spend too much time if you think it makes no sense. I'll wait for
> the verdict, or for v4 to appear and then have another go of making
> sense of the existing vs new eviction logic.
>
> Regards,
>
> Tvrtko
>
>
>> Thanks,
>> Natalie
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> spin_lock(&bo->bdev->lru_lock);
>>>> ttm_resource_add_bulk_move(*res_ptr, bo);
>>>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/
>>>> ttm_resource.h
>>>> index e52bba15012f7..3aef7efdd7cfb 100644
>>>> --- a/include/drm/ttm/ttm_resource.h
>>>> +++ b/include/drm/ttm/ttm_resource.h
>>>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct
>>>> ttm_buffer_object *bo,
>>>> void ttm_resource_fini(struct ttm_resource_manager *man,
>>>> struct ttm_resource *res);
>>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>>> + const struct ttm_place *place,
>>>> + struct dmem_cgroup_pool_state **ret_pool,
>>>> + struct dmem_cgroup_pool_state **ret_limit_pool);
>>>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>>> const struct ttm_place *place,
>>>> struct ttm_resource **res,
>>>> - struct dmem_cgroup_pool_state **ret_limit_pool);
>>>> + struct dmem_cgroup_pool_state *charge_pool);
>>>> void ttm_resource_free(struct ttm_buffer_object *bo, struct
>>>> ttm_resource **res);
>>>> bool ttm_resource_intersects(struct ttm_device *bdev,
>>>> struct ttm_resource *res,
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2025-11-10 12:37 ` [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
2026-02-24 16:40 ` Tvrtko Ursulin
@ 2026-02-25 11:45 ` Tvrtko Ursulin
2026-02-25 13:26 ` Natalie Vock
1 sibling, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2026-02-25 11:45 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 10/11/2025 12:37, Natalie Vock wrote:
> When the cgroup's memory usage is below the low/min limit and allocation
> fails, try evicting some unprotected buffers to make space. Otherwise,
> application buffers may be forced to go into GTT even though usage is
> below the corresponding low/min limit, if other applications filled VRAM
> with their allocations first.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++++++++----
> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
> include/drm/ttm/ttm_resource.h | 6 ++-
> 3 files changed, 108 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 829d994798835..bd467c965e1bc 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
> }
>
> struct ttm_bo_alloc_state {
> + /** @charge_pool: The memory pool the resource is charged to */
> + struct dmem_cgroup_pool_state *charge_pool;
> /** @limit_pool: Which pool limit we should test against */
> struct dmem_cgroup_pool_state *limit_pool;
> + /** @only_evict_unprotected: If eviction should be restricted to unprotected BOs */
> + bool only_evict_unprotected;
> };
>
> /**
> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
> evict_walk->evicted++;
> if (evict_walk->res)
> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
> - evict_walk->res, NULL);
> + evict_walk->res, evict_walk->alloc_state->charge_pool);
> if (lret == 0)
> return 1;
> out:
> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>
> /* One more attempt if we hit low limit? */
> - if (!lret && evict_walk.hit_low) {
> + if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
> evict_walk.try_low = true;
> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
> }
> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> } while (!lret && evict_walk.evicted);
>
> /* We hit the low limit? Try once more */
> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
> + !state->only_evict_unprotected) {
> evict_walk.try_low = true;
> goto retry;
> }
> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> struct ttm_resource **res,
> struct ttm_bo_alloc_state *alloc_state)
> {
> - bool may_evict;
> + bool may_evict, below_low = false;
No need to init below_low.
> int ret;
>
> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
> + force_space ? &alloc_state->limit_pool : NULL);
> + if (ret) {
> + /*
> + * -EAGAIN means the charge failed, which we treat like an
> + * allocation failure. Therefore, return an error code indicating
> + * the allocation failed - either -EBUSY if the allocation should
> + * be retried with eviction, or -ENOSPC if there should be no second
> + * attempt.
> + */
> + if (ret == -EAGAIN)
> + ret = may_evict ? -EBUSY : -ENOSPC;
> + return ret;
> + }
>
> - ret = ttm_resource_alloc(bo, place, res,
> - force_space ? &alloc_state->limit_pool : NULL);
> + /*
> + * cgroup protection plays a special role in eviction.
> + * Conceptually, protection of memory via the dmem cgroup controller
> + * entitles the protected cgroup to use a certain amount of memory.
> + * There are two types of protection - the 'low' limit is a
> + * "best-effort" protection, whereas the 'min' limit provides a hard
> + * guarantee that memory within the cgroup's allowance will not be
> + * evicted under any circumstance.
> + *
> + * To faithfully model this concept in TTM, we also need to take cgroup
> + * protection into account when allocating. When allocation in one
> + * place fails, TTM will default to trying other places first before
> + * evicting.
> + * If the allocation is covered by dmem cgroup protection, however,
> + * this prevents the allocation from using the memory it is "entitled"
> + * to. To make sure unprotected allocations cannot push new protected
> + * allocations out of places they are "entitled" to use, we should
> + * evict buffers not covered by any cgroup protection, if this
> + * allocation is covered by cgroup protection.
> + *
> + * Buffers covered by 'min' protection are a special case - the 'min'
> + * limit is a stronger guarantee than 'low', and thus buffers protected
> + * by 'low' but not 'min' should also be considered for eviction.
> + * Buffers protected by 'min' will never be considered for eviction
> + * anyway, so the regular eviction path should be triggered here.
> + * Buffers protected by 'low' but not 'min' will take a special
> + * eviction path that only evicts buffers covered by neither 'low' or
> + * 'min' protections.
> + */
> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
Are these some magic macros? Couldn't grep for them.
> + alloc_state->only_evict_unprotected = !may_evict && below_low;
> +
> + ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
>
> if (ret) {
> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
> + (may_evict || below_low))
> ret = -EBUSY;
> return ret;
Where does the uncharge happen on the failure path?
Regards,
Tvrtko
> }
>
> + /*
> + * Ownership of charge_pool has been transferred to the TTM resource,
> + * don't make the caller think we still hold a reference to it.
> + */
> + alloc_state->charge_pool = NULL;
> return 0;
> }
>
> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> res, &alloc_state);
>
> if (ret == -ENOSPC) {
> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> continue;
> } else if (ret == -EBUSY) {
> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>
> if (ret) {
> + dmem_cgroup_pool_state_put(
> + alloc_state.charge_pool);
> if (ret != -ENOSPC && ret != -EBUSY)
> return ret;
> continue;
> }
> } else if (ret) {
> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> return ret;
> }
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index e2c82ad07eb44..fcfa8b51b0337 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
> }
> EXPORT_SYMBOL(ttm_resource_fini);
>
> +/**
> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
> + * @bo: buffer for which an allocation should be charged
> + * @place: where the allocation is attempted to be placed
> + * @ret_pool: on charge success, the pool that was charged
> + * @ret_limit_pool: on charge failure, the pool responsible for the failure
> + *
> + * Should be used to charge cgroups before attempting resource allocation.
> + * When charging succeeds, the value of ret_pool should be passed to
> + * ttm_resource_alloc.
> + *
> + * Returns: 0 on charge success, negative errno on failure.
> + */
> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct dmem_cgroup_pool_state **ret_pool,
> + struct dmem_cgroup_pool_state **ret_limit_pool)
> +{
> + struct ttm_resource_manager *man =
> + ttm_manager_type(bo->bdev, place->mem_type);
> +
> + if (!man->cg) {
> + *ret_pool = NULL;
> + if (ret_limit_pool)
> + *ret_limit_pool = NULL;
> + return 0;
> + }
> +
> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
> + ret_limit_pool);
> +}
> +
> int ttm_resource_alloc(struct ttm_buffer_object *bo,
> const struct ttm_place *place,
> struct ttm_resource **res_ptr,
> - struct dmem_cgroup_pool_state **ret_limit_pool)
> + struct dmem_cgroup_pool_state *charge_pool)
> {
> struct ttm_resource_manager *man =
> ttm_manager_type(bo->bdev, place->mem_type);
> - struct dmem_cgroup_pool_state *pool = NULL;
> int ret;
>
> - if (man->cg) {
> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, ret_limit_pool);
> - if (ret)
> - return ret;
> - }
> -
> ret = man->func->alloc(man, bo, place, res_ptr);
> - if (ret) {
> - if (pool)
> - dmem_cgroup_uncharge(pool, bo->base.size);
> + if (ret)
> return ret;
> - }
>
> - (*res_ptr)->css = pool;
> + (*res_ptr)->css = charge_pool;
>
> spin_lock(&bo->bdev->lru_lock);
> ttm_resource_add_bulk_move(*res_ptr, bo);
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index e52bba15012f7..3aef7efdd7cfb 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
> void ttm_resource_fini(struct ttm_resource_manager *man,
> struct ttm_resource *res);
>
> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct dmem_cgroup_pool_state **ret_pool,
> + struct dmem_cgroup_pool_state **ret_limit_pool);
> int ttm_resource_alloc(struct ttm_buffer_object *bo,
> const struct ttm_place *place,
> struct ttm_resource **res,
> - struct dmem_cgroup_pool_state **ret_limit_pool);
> + struct dmem_cgroup_pool_state *charge_pool);
> void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res);
> bool ttm_resource_intersects(struct ttm_device *bdev,
> struct ttm_resource *res,
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2026-02-25 11:45 ` Tvrtko Ursulin
@ 2026-02-25 13:26 ` Natalie Vock
2026-02-25 14:09 ` Tvrtko Ursulin
0 siblings, 1 reply; 15+ messages in thread
From: Natalie Vock @ 2026-02-25 13:26 UTC (permalink / raw)
To: Tvrtko Ursulin, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
Sorry, already sent out v4 before I saw this.
On 2/25/26 12:45, Tvrtko Ursulin wrote:
>
> On 10/11/2025 12:37, Natalie Vock wrote:
>> When the cgroup's memory usage is below the low/min limit and allocation
>> fails, try evicting some unprotected buffers to make space. Otherwise,
>> application buffers may be forced to go into GTT even though usage is
>> below the corresponding low/min limit, if other applications filled VRAM
>> with their allocations first.
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo.c | 75 ++++++++++++++++++++++++++++
>> ++++++----
>> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>> include/drm/ttm/ttm_resource.h | 6 ++-
>> 3 files changed, 108 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 829d994798835..bd467c965e1bc 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev,
>> struct ttm_resource_manager *man
>> }
>> struct ttm_bo_alloc_state {
>> + /** @charge_pool: The memory pool the resource is charged to */
>> + struct dmem_cgroup_pool_state *charge_pool;
>> /** @limit_pool: Which pool limit we should test against */
>> struct dmem_cgroup_pool_state *limit_pool;
>> + /** @only_evict_unprotected: If eviction should be restricted to
>> unprotected BOs */
>> + bool only_evict_unprotected;
>> };
>> /**
>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk
>> *walk, struct ttm_buffer_object *
>> evict_walk->evicted++;
>> if (evict_walk->res)
>> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk-
>> >place,
>> - evict_walk->res, NULL);
>> + evict_walk->res, evict_walk->alloc_state-
>> >charge_pool);
>> if (lret == 0)
>> return 1;
>> out:
>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>> *bdev,
>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>> /* One more attempt if we hit low limit? */
>> - if (!lret && evict_walk.hit_low) {
>> + if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
>> evict_walk.try_low = true;
>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>> }
>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device
>> *bdev,
>> } while (!lret && evict_walk.evicted);
>> /* We hit the low limit? Try once more */
>> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>> + !state->only_evict_unprotected) {
>> evict_walk.try_low = true;
>> goto retry;
>> }
>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct
>> ttm_buffer_object *bo,
>> struct ttm_resource **res,
>> struct ttm_bo_alloc_state *alloc_state)
>> {
>> - bool may_evict;
>> + bool may_evict, below_low = false;
>
> No need to init below_low.
Oops. Oh well, that one goes into v5 then.
>
>> int ret;
>> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
>> + force_space ? &alloc_state->limit_pool : NULL);
>> + if (ret) {
>> + /*
>> + * -EAGAIN means the charge failed, which we treat like an
>> + * allocation failure. Therefore, return an error code
>> indicating
>> + * the allocation failed - either -EBUSY if the allocation
>> should
>> + * be retried with eviction, or -ENOSPC if there should be no
>> second
>> + * attempt.
>> + */
>> + if (ret == -EAGAIN)
>> + ret = may_evict ? -EBUSY : -ENOSPC;
>> + return ret;
>> + }
>> - ret = ttm_resource_alloc(bo, place, res,
>> - force_space ? &alloc_state->limit_pool : NULL);
>> + /*
>> + * cgroup protection plays a special role in eviction.
>> + * Conceptually, protection of memory via the dmem cgroup controller
>> + * entitles the protected cgroup to use a certain amount of memory.
>> + * There are two types of protection - the 'low' limit is a
>> + * "best-effort" protection, whereas the 'min' limit provides a hard
>> + * guarantee that memory within the cgroup's allowance will not be
>> + * evicted under any circumstance.
>> + *
>> + * To faithfully model this concept in TTM, we also need to take
>> cgroup
>> + * protection into account when allocating. When allocation in one
>> + * place fails, TTM will default to trying other places first before
>> + * evicting.
>> + * If the allocation is covered by dmem cgroup protection, however,
>> + * this prevents the allocation from using the memory it is
>> "entitled"
>> + * to. To make sure unprotected allocations cannot push new
>> protected
>> + * allocations out of places they are "entitled" to use, we should
>> + * evict buffers not covered by any cgroup protection, if this
>> + * allocation is covered by cgroup protection.
>> + *
>> + * Buffers covered by 'min' protection are a special case - the
>> 'min'
>> + * limit is a stronger guarantee than 'low', and thus buffers
>> protected
>> + * by 'low' but not 'min' should also be considered for eviction.
>> + * Buffers protected by 'min' will never be considered for eviction
>> + * anyway, so the regular eviction path should be triggered here.
>> + * Buffers protected by 'low' but not 'min' will take a special
>> + * eviction path that only evicts buffers covered by neither
>> 'low' or
>> + * 'min' protections.
>> + */
>> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
>> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>
> Are these some magic macros? Couldn't grep for them.
They're functions added in patch 1.
>
>> + alloc_state->only_evict_unprotected = !may_evict && below_low;
> > +> + ret = ttm_resource_alloc(bo, place, res, alloc_state-
> >charge_pool);
>> if (ret) {
>> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
>> + (may_evict || below_low))
>> ret = -EBUSY;
>> return ret;
>
> Where does the uncharge happen on the failure path?
The charge is passed to the caller regardless of success or failure, so
that the caller can retry allocation (with eviction) using the same
charge. The caller is also responsible for uncharging.
Maybe this is clearer in the split patch in v4.
Regards,
Natalie
>
> Regards,
>
> Tvrtko
>
>> }
>> + /*
>> + * Ownership of charge_pool has been transferred to the TTM
>> resource,
>> + * don't make the caller think we still hold a reference to it.
>> + */
>> + alloc_state->charge_pool = NULL;
>> return 0;
>> }
>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> res, &alloc_state);
>> if (ret == -ENOSPC) {
>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> continue;
>> } else if (ret == -EBUSY) {
>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct
>> ttm_buffer_object *bo,
>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> if (ret) {
>> + dmem_cgroup_pool_state_put(
>> + alloc_state.charge_pool);
>> if (ret != -ENOSPC && ret != -EBUSY)
>> return ret;
>> continue;
>> }
>> } else if (ret) {
>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>> return ret;
>> }
>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/
>> ttm_resource.c
>> index e2c82ad07eb44..fcfa8b51b0337 100644
>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct
>> ttm_resource_manager *man,
>> }
>> EXPORT_SYMBOL(ttm_resource_fini);
>> +/**
>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>> + * @bo: buffer for which an allocation should be charged
>> + * @place: where the allocation is attempted to be placed
>> + * @ret_pool: on charge success, the pool that was charged
>> + * @ret_limit_pool: on charge failure, the pool responsible for the
>> failure
>> + *
>> + * Should be used to charge cgroups before attempting resource
>> allocation.
>> + * When charging succeeds, the value of ret_pool should be passed to
>> + * ttm_resource_alloc.
>> + *
>> + * Returns: 0 on charge success, negative errno on failure.
>> + */
>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>> + const struct ttm_place *place,
>> + struct dmem_cgroup_pool_state **ret_pool,
>> + struct dmem_cgroup_pool_state **ret_limit_pool)
>> +{
>> + struct ttm_resource_manager *man =
>> + ttm_manager_type(bo->bdev, place->mem_type);
>> +
>> + if (!man->cg) {
>> + *ret_pool = NULL;
>> + if (ret_limit_pool)
>> + *ret_limit_pool = NULL;
>> + return 0;
>> + }
>> +
>> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>> + ret_limit_pool);
>> +}
>> +
>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>> const struct ttm_place *place,
>> struct ttm_resource **res_ptr,
>> - struct dmem_cgroup_pool_state **ret_limit_pool)
>> + struct dmem_cgroup_pool_state *charge_pool)
>> {
>> struct ttm_resource_manager *man =
>> ttm_manager_type(bo->bdev, place->mem_type);
>> - struct dmem_cgroup_pool_state *pool = NULL;
>> int ret;
>> - if (man->cg) {
>> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool,
>> ret_limit_pool);
>> - if (ret)
>> - return ret;
>> - }
>> -
>> ret = man->func->alloc(man, bo, place, res_ptr);
>> - if (ret) {
>> - if (pool)
>> - dmem_cgroup_uncharge(pool, bo->base.size);
>> + if (ret)
>> return ret;
>> - }
>> - (*res_ptr)->css = pool;
>> + (*res_ptr)->css = charge_pool;
>> spin_lock(&bo->bdev->lru_lock);
>> ttm_resource_add_bulk_move(*res_ptr, bo);
>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/
>> ttm_resource.h
>> index e52bba15012f7..3aef7efdd7cfb 100644
>> --- a/include/drm/ttm/ttm_resource.h
>> +++ b/include/drm/ttm/ttm_resource.h
>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object
>> *bo,
>> void ttm_resource_fini(struct ttm_resource_manager *man,
>> struct ttm_resource *res);
>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>> + const struct ttm_place *place,
>> + struct dmem_cgroup_pool_state **ret_pool,
>> + struct dmem_cgroup_pool_state **ret_limit_pool);
>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>> const struct ttm_place *place,
>> struct ttm_resource **res,
>> - struct dmem_cgroup_pool_state **ret_limit_pool);
>> + struct dmem_cgroup_pool_state *charge_pool);
>> void ttm_resource_free(struct ttm_buffer_object *bo, struct
>> ttm_resource **res);
>> bool ttm_resource_intersects(struct ttm_device *bdev,
>> struct ttm_resource *res,
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
2026-02-25 13:26 ` Natalie Vock
@ 2026-02-25 14:09 ` Tvrtko Ursulin
0 siblings, 0 replies; 15+ messages in thread
From: Tvrtko Ursulin @ 2026-02-25 14:09 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 25/02/2026 13:26, Natalie Vock wrote:
> Sorry, already sent out v4 before I saw this.
>
> On 2/25/26 12:45, Tvrtko Ursulin wrote:
>>
>> On 10/11/2025 12:37, Natalie Vock wrote:
>>> When the cgroup's memory usage is below the low/min limit and allocation
>>> fails, try evicting some unprotected buffers to make space. Otherwise,
>>> application buffers may be forced to go into GTT even though usage is
>>> below the corresponding low/min limit, if other applications filled VRAM
>>> with their allocations first.
>>>
>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 75 +++++++++++++++++++++++++++
>>> + ++++++----
>>> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>>> include/drm/ttm/ttm_resource.h | 6 ++-
>>> 3 files changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 829d994798835..bd467c965e1bc 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev,
>>> struct ttm_resource_manager *man
>>> }
>>> struct ttm_bo_alloc_state {
>>> + /** @charge_pool: The memory pool the resource is charged to */
>>> + struct dmem_cgroup_pool_state *charge_pool;
>>> /** @limit_pool: Which pool limit we should test against */
>>> struct dmem_cgroup_pool_state *limit_pool;
>>> + /** @only_evict_unprotected: If eviction should be restricted to
>>> unprotected BOs */
>>> + bool only_evict_unprotected;
>>> };
>>> /**
>>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk
>>> *walk, struct ttm_buffer_object *
>>> evict_walk->evicted++;
>>> if (evict_walk->res)
>>> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk-
>>> >place,
>>> - evict_walk->res, NULL);
>>> + evict_walk->res, evict_walk->alloc_state-
>>> >charge_pool);
>>> if (lret == 0)
>>> return 1;
>>> out:
>>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device
>>> *bdev,
>>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>> /* One more attempt if we hit low limit? */
>>> - if (!lret && evict_walk.hit_low) {
>>> + if (!lret && evict_walk.hit_low && !state-
>>> >only_evict_unprotected) {
>>> evict_walk.try_low = true;
>>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>> }
>>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device
>>> *bdev,
>>> } while (!lret && evict_walk.evicted);
>>> /* We hit the low limit? Try once more */
>>> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>>> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>>> + !state->only_evict_unprotected) {
>>> evict_walk.try_low = true;
>>> goto retry;
>>> }
>>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct
>>> ttm_buffer_object *bo,
>>> struct ttm_resource **res,
>>> struct ttm_bo_alloc_state *alloc_state)
>>> {
>>> - bool may_evict;
>>> + bool may_evict, below_low = false;
>>
>> No need to init below_low.
>
> Oops. Oh well, that one goes into v5 then.
>
>>
>>> int ret;
>>> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>>> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
>>> + force_space ? &alloc_state->limit_pool : NULL);
>>> + if (ret) {
>>> + /*
>>> + * -EAGAIN means the charge failed, which we treat like an
>>> + * allocation failure. Therefore, return an error code
>>> indicating
>>> + * the allocation failed - either -EBUSY if the allocation
>>> should
>>> + * be retried with eviction, or -ENOSPC if there should be
>>> no second
>>> + * attempt.
>>> + */
>>> + if (ret == -EAGAIN)
>>> + ret = may_evict ? -EBUSY : -ENOSPC;
>>> + return ret;
>>> + }
>>> - ret = ttm_resource_alloc(bo, place, res,
>>> - force_space ? &alloc_state->limit_pool : NULL);
>>> + /*
>>> + * cgroup protection plays a special role in eviction.
>>> + * Conceptually, protection of memory via the dmem cgroup
>>> controller
>>> + * entitles the protected cgroup to use a certain amount of memory.
>>> + * There are two types of protection - the 'low' limit is a
>>> + * "best-effort" protection, whereas the 'min' limit provides a
>>> hard
>>> + * guarantee that memory within the cgroup's allowance will not be
>>> + * evicted under any circumstance.
>>> + *
>>> + * To faithfully model this concept in TTM, we also need to take
>>> cgroup
>>> + * protection into account when allocating. When allocation in one
>>> + * place fails, TTM will default to trying other places first
>>> before
>>> + * evicting.
>>> + * If the allocation is covered by dmem cgroup protection, however,
>>> + * this prevents the allocation from using the memory it is
>>> "entitled"
>>> + * to. To make sure unprotected allocations cannot push new
>>> protected
>>> + * allocations out of places they are "entitled" to use, we should
>>> + * evict buffers not covered by any cgroup protection, if this
>>> + * allocation is covered by cgroup protection.
>>> + *
>>> + * Buffers covered by 'min' protection are a special case - the
>>> 'min'
>>> + * limit is a stronger guarantee than 'low', and thus buffers
>>> protected
>>> + * by 'low' but not 'min' should also be considered for eviction.
>>> + * Buffers protected by 'min' will never be considered for eviction
>>> + * anyway, so the regular eviction path should be triggered here.
>>> + * Buffers protected by 'low' but not 'min' will take a special
>>> + * eviction path that only evicts buffers covered by neither
>>> 'low' or
>>> + * 'min' protections.
>>> + */
>>> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
>>> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>>
>> Are these some magic macros? Couldn't grep for them.
>
> They're functions added in patch 1.
Doh!
>
>>
>>> + alloc_state->only_evict_unprotected = !may_evict && below_low;
>> > +> + ret = ttm_resource_alloc(bo, place, res, alloc_state-
>> >charge_pool);
>>> if (ret) {
>>> - if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>>> + if ((ret == -ENOSPC || ret == -EAGAIN) &&
>>> + (may_evict || below_low))
>>> ret = -EBUSY;
>>> return ret;
>>
>> Where does the uncharge happen on the failure path?
>
> The charge is passed to the caller regardless of success or failure, so
> that the caller can retry allocation (with eviction) using the same
> charge. The caller is also responsible for uncharging.
>
> Maybe this is clearer in the split patch in v4.
I don't see it just yet. I see v4 also removes one pair of
dmem_cgroup_try_charge + dmem_cgroup_uncharge, while adding one
dmem_cgroup_try_charge. I don't see where the uncharge in the new flow is.
At least there are some returns directly out of ttm_bo_alloc_resource()
(before or after the eviction attempt) so couldn't those have the charge
already applied and not uncharged? Or the games with converting error
codes make that impossible?
Regards,
Tvrtko
>>> }
>>> + /*
>>> + * Ownership of charge_pool has been transferred to the TTM
>>> resource,
>>> + * don't make the caller think we still hold a reference to it.
>>> + */
>>> + alloc_state->charge_pool = NULL;
>>> return 0;
>>> }
>>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct
>>> ttm_buffer_object *bo,
>>> res, &alloc_state);
>>> if (ret == -ENOSPC) {
>>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>> continue;
>>> } else if (ret == -EBUSY) {
>>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct
>>> ttm_buffer_object *bo,
>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>> if (ret) {
>>> + dmem_cgroup_pool_state_put(
>>> + alloc_state.charge_pool);
>>> if (ret != -ENOSPC && ret != -EBUSY)
>>> return ret;
>>> continue;
>>> }
>>> } else if (ret) {
>>> + dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>> return ret;
>>> }
>>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/
>>> ttm/ ttm_resource.c
>>> index e2c82ad07eb44..fcfa8b51b0337 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct
>>> ttm_resource_manager *man,
>>> }
>>> EXPORT_SYMBOL(ttm_resource_fini);
>>> +/**
>>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>>> + * @bo: buffer for which an allocation should be charged
>>> + * @place: where the allocation is attempted to be placed
>>> + * @ret_pool: on charge success, the pool that was charged
>>> + * @ret_limit_pool: on charge failure, the pool responsible for the
>>> failure
>>> + *
>>> + * Should be used to charge cgroups before attempting resource
>>> allocation.
>>> + * When charging succeeds, the value of ret_pool should be passed to
>>> + * ttm_resource_alloc.
>>> + *
>>> + * Returns: 0 on charge success, negative errno on failure.
>>> + */
>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>> + const struct ttm_place *place,
>>> + struct dmem_cgroup_pool_state **ret_pool,
>>> + struct dmem_cgroup_pool_state **ret_limit_pool)
>>> +{
>>> + struct ttm_resource_manager *man =
>>> + ttm_manager_type(bo->bdev, place->mem_type);
>>> +
>>> + if (!man->cg) {
>>> + *ret_pool = NULL;
>>> + if (ret_limit_pool)
>>> + *ret_limit_pool = NULL;
>>> + return 0;
>>> + }
>>> +
>>> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>>> + ret_limit_pool);
>>> +}
>>> +
>>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>> const struct ttm_place *place,
>>> struct ttm_resource **res_ptr,
>>> - struct dmem_cgroup_pool_state **ret_limit_pool)
>>> + struct dmem_cgroup_pool_state *charge_pool)
>>> {
>>> struct ttm_resource_manager *man =
>>> ttm_manager_type(bo->bdev, place->mem_type);
>>> - struct dmem_cgroup_pool_state *pool = NULL;
>>> int ret;
>>> - if (man->cg) {
>>> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool,
>>> ret_limit_pool);
>>> - if (ret)
>>> - return ret;
>>> - }
>>> -
>>> ret = man->func->alloc(man, bo, place, res_ptr);
>>> - if (ret) {
>>> - if (pool)
>>> - dmem_cgroup_uncharge(pool, bo->base.size);
>>> + if (ret)
>>> return ret;
>>> - }
>>> - (*res_ptr)->css = pool;
>>> + (*res_ptr)->css = charge_pool;
>>> spin_lock(&bo->bdev->lru_lock);
>>> ttm_resource_add_bulk_move(*res_ptr, bo);
>>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/
>>> ttm_resource.h
>>> index e52bba15012f7..3aef7efdd7cfb 100644
>>> --- a/include/drm/ttm/ttm_resource.h
>>> +++ b/include/drm/ttm/ttm_resource.h
>>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object
>>> *bo,
>>> void ttm_resource_fini(struct ttm_resource_manager *man,
>>> struct ttm_resource *res);
>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>> + const struct ttm_place *place,
>>> + struct dmem_cgroup_pool_state **ret_pool,
>>> + struct dmem_cgroup_pool_state **ret_limit_pool);
>>> int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>> const struct ttm_place *place,
>>> struct ttm_resource **res,
>>> - struct dmem_cgroup_pool_state **ret_limit_pool);
>>> + struct dmem_cgroup_pool_state *charge_pool);
>>> void ttm_resource_free(struct ttm_buffer_object *bo, struct
>>> ttm_resource **res);
>>> bool ttm_resource_intersects(struct ttm_device *bdev,
>>> struct ttm_resource *res,
>>>
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 5/5] drm/ttm: Use common ancestor of evictor and evictee as limit pool
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
` (3 preceding siblings ...)
2025-11-10 12:37 ` [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
@ 2025-11-10 12:37 ` Natalie Vock
4 siblings, 0 replies; 15+ messages in thread
From: Natalie Vock @ 2025-11-10 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: cgroups, dri-devel
When checking whether to skip certain buffers because they're protected
by dmem.low, we're checking the effective protection of the evictee's
cgroup, but depending on how the evictor's cgroup relates to the
evictee's, the semantics of effective protection values change.
When testing against cgroups from different subtrees, page_counter's
recursive protection propagates memory protection afforded to a parent
down to the child cgroups, even if the children were not explicitly
protected. This prevents cgroups whose parents were afforded no
protection from stealing memory from cgroups whose parents were afforded
more protection, without requiring users to explicitly propagate that
protection down the hierarchy themselves.
However, if we always calculate protection from the root cgroup, this
breaks prioritization of sibling cgroups: If one cgroup was explicitly
protected and its siblings were not, the protected cgroup should get
higher priority, i.e., it should be able to steal memory from its
unprotected siblings. This only works if we restrict the protection
calculation to the subtree shared by evictor and evictee.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
drivers/gpu/drm/ttm/ttm_bo.c | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bd467c965e1bc..50ed4bffd3437 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -524,13 +524,42 @@ struct ttm_bo_evict_walk {
static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
{
+ struct dmem_cgroup_pool_state *limit_pool;
struct ttm_bo_evict_walk *evict_walk =
container_of(walk, typeof(*evict_walk), walk);
s64 lret;
- if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
- bo->resource->css, evict_walk->try_low,
- &evict_walk->hit_low))
+ /*
+ * If only_evict_unprotected is set, then we're trying to evict unprotected
+ * buffers in favor of a protected allocation for charge_pool. Explicitly skip
+ * buffers belonging to the same cgroup here - that cgroup is definitely protected,
+ * even though dmem_cgroup_state_evict_valuable would allow the eviction because a
+ * cgroup is always allowed to evict from itself even if it is protected.
+ */
+ if (evict_walk->alloc_state->only_evict_unprotected &&
+ bo->resource->css == evict_walk->alloc_state->charge_pool)
+ return 0;
+
+ limit_pool = evict_walk->alloc_state->limit_pool;
+ /*
+ * If there is no explicit limit pool, find the root of the shared subtree between
+ * evictor and evictee. This is important so that recursive protection rules can
+ * apply properly: Recursive protection distributes cgroup protection afforded
+ * to a parent cgroup but not used explicitly by a child cgroup between all child
+ * cgroups (see docs of effective_protection in mm/page_counter.c). However, when
+ * direct siblings compete for memory, siblings that were explicitly protected
+ * should get prioritized over siblings that weren't. This only happens correctly
+ * when the root of the shared subtree is passed to
+ * dmem_cgroup_state_evict_valuable. Otherwise, the effective-protection
+ * calculation cannot distinguish direct siblings from unrelated subtrees and the
+ * calculated protection ends up wrong.
+ */
+ if (!limit_pool)
+ limit_pool = dmem_cgroup_common_ancestor(bo->resource->css,
+ evict_walk->alloc_state->charge_pool);
+
+ if (!dmem_cgroup_state_evict_valuable(limit_pool, bo->resource->css,
+ evict_walk->try_low, &evict_walk->hit_low))
return 0;
if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
--
2.51.2
^ permalink raw reply related [flat|nested] 15+ messages in thread