* [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases
@ 2026-03-02 12:37 Natalie Vock
2026-03-02 12:37 ` [PATCH v5 1/6] cgroup/dmem: Add queries for protection values Natalie Vock
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock, Tvrtko Ursulin
Hi all,
I've been looking into some cases where dmem protection fails to prevent
allocations from ending up in GTT when VRAM gets scarce and apps start
competing hard.
In short, this is because other (unprotected) applications end up
filling VRAM before protected applications do. This causes TTM to back
off and try allocating in GTT before anything else, and that is where
the allocation is placed in the end. The existing eviction protection
cannot prevent this, because no attempt at evicting is ever made
(although you could consider the backing-off as an immediate eviction to
GTT).
This series tries to alleviate this by adding a special case when the
allocation is protected by cgroups: Instead of backing off immediately,
TTM will try evicting unprotected buffers from the domain to make space
for the protected one. This ensures that applications can actually use
all the memory protection awarded to them by the system, without being
prone to ping-ponging (only protected allocations can evict unprotected
ones, never the other way around).
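To make the asymmetry concrete, here is a minimal standalone sketch of the eviction rule (hypothetical helper name and simplified booleans; in the series the actual check involves dmem_cgroup_state_evict_valuable() and the TTM eviction walk):

```c
#include <stdbool.h>

/*
 * Decision sketch (hypothetical model, not the actual TTM code): should a
 * failed allocation trigger eviction of a given buffer in the same domain?
 * "Protected" means the owning cgroup is still below its min/low limit.
 * Only protected allocators evict, and only unprotected buffers are
 * eligible victims, so two buffers can never ping-pong by repeatedly
 * evicting each other.
 */
static bool may_evict_victim(bool allocator_protected, bool victim_protected)
{
	/* A protected allocation may displace an unprotected buffer... */
	if (allocator_protected && !victim_protected)
		return true;
	/* ...but never the other way around. */
	return false;
}
```

Since the relation is strictly one-directional, an evicted (unprotected) buffer can never turn around and evict the protected buffer that displaced it.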
The first two patches add a few small utilities to the dmem controller
that are needed to implement this. The other patches are the TTM implementation:
"drm/ttm: Be more aggressive..." decouples cgroup charging from resource
allocation to allow us to hold on to the charge even if allocation fails
on first try, and adds a path to call ttm_bo_evict_alloc when the
charged allocation falls within min/low protection limits.
"drm/ttm: Use common ancestor..." is a more general improvement in
correctly implementing cgroup protection semantics. With recursive
protection rules, unused memory protection afforded to a parent node is
transferred to children recursively, which helps protect entire
subtrees from stealing each other's memory without needing to protect
each cgroup individually. This doesn't apply when considering direct
siblings inside the same subtree, so in order to not break
prioritization between these siblings, we need to consider the
relationship of evictor and evictee when calculating protection.
In practice, this fixes cases where a protected cgroup could not reclaim
memory from unprotected siblings (which, in turn, led to eviction
failures and new allocations being placed in GTT).
Thanks,
Natalie
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
Changes in v5:
- Added cgroup_common_ancestor helper to use with
dmem_cgroup_common_ancestor (Tejun)
- Note: "drm/ttm: Use common ancestor..." needed minor changes since
dmem_cgroup_common_ancestor now grabs a reference to the ancestor
pool which needs to be dropped after use
- Removed extraneous whitespaces in "drm/ttm: Split cgroup charge..."
and unnecessary changes done in "drm/ttm: Extract code..." (Tvrtko)
- Applied a comment from v3 about below_low not needing to be
initialized in "drm/ttm: Be more aggressive..." (Tvrtko)
- Fixed uncharging the cgroup on allocation failure (Tvrtko)
- Fixed a typo in the message of "drm/ttm: Split cgroup charge..."
(Tvrtko)
- Added case in ttm_bo_evict_cb for when charging fails, since we need
to retry the charge (found myself)
- Link to v4: https://lore.kernel.org/r/20260225-dmemcg-aggressive-protect-v4-0-de847ab35184@gmx.de
Changes in v4:
- Split cgroup charge decoupling and eviction logic changes into
separate commits (Tvrtko)
- Fix two cases of errno handling in ttm_bo_alloc_place and its caller
(Tvrtko)
- Improve commit message/description of "drm/ttm: Make a helper..." (now
"drm/ttm: Extract code...") (Tvrtko)
- Documentation improvements for new TTM eviction logic (Tvrtko)
- Formatting fixes (Tvrtko)
- Link to v3: https://lore.kernel.org/r/20251110-dmemcg-aggressive-protect-v3-0-219ffcfc54e9@gmx.de
Changes in v3:
- Improved documentation around cgroup queries and TTM eviction helpers
(Maarten)
- Fixed up ttm_alloc_at_place charge failure logic to return either
-EBUSY or -ENOSPC, not -EAGAIN (found this myself)
- Link to v2: https://lore.kernel.org/r/20251015-dmemcg-aggressive-protect-v2-0-36644fb4e37f@gmx.de
Changes in v2:
- Factored out the ttm logic for charging/allocating/evicting into a
separate helper to keep things simpler
- Link to v1: https://lore.kernel.org/r/20250915-dmemcg-aggressive-protect-v1-0-2f3353bfcdac@gmx.de
---
Natalie Vock (6):
cgroup/dmem: Add queries for protection values
cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
drm/ttm: Extract code for attempting allocation in a place
drm/ttm: Split cgroup charge and resource allocation
drm/ttm: Be more aggressive when allocating below protection limit
drm/ttm: Use common ancestor of evictor and evictee as limit pool
drivers/gpu/drm/ttm/ttm_bo.c | 214 ++++++++++++++++++++++++++++++++-----
drivers/gpu/drm/ttm/ttm_resource.c | 48 ++++++---
include/drm/ttm/ttm_resource.h | 6 +-
include/linux/cgroup.h | 21 ++++
include/linux/cgroup_dmem.h | 25 +++++
kernel/cgroup/dmem.c | 105 +++++++++++++++++-
6 files changed, 374 insertions(+), 45 deletions(-)
---
base-commit: 61c0f69a2ff79c8f388a9e973abb4853be467127
change-id: 20250915-dmemcg-aggressive-protect-5cf37f717cdb
Best regards,
--
Natalie Vock <natalie.vock@gmx.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v5 1/6] cgroup/dmem: Add queries for protection values
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
@ 2026-03-02 12:37 ` Natalie Vock
2026-03-11 1:12 ` Chen Ridong
2026-03-02 12:37 ` [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper Natalie Vock
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock
Add queries reporting whether a cgroup's dmem usage is below its min/low
protection limits. Callers can use this feedback to be more aggressive in
making space for allocations of a cgroup if they know it is protected.
These are counterparts to memcg's mem_cgroup_below_{min,low}.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
include/linux/cgroup_dmem.h | 16 ++++++++++++
kernel/cgroup/dmem.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)
diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index dd4869f1d736e..1a88cd0c9eb00 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
struct dmem_cgroup_pool_state *test_pool,
bool ignore_low, bool *ret_hit_low);
+bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test);
+bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test);
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
#else
@@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
return true;
}
+static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ return false;
+}
+
+static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ return false;
+}
+
static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
{ }
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 9d95824dc6fa0..28227405f7cfe 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -694,6 +694,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
}
EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
+/**
+ * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
+ *
+ * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
+ * @test: The pool to test the usage/min limit of.
+ *
+ * Return: true if usage is below min and the cgroup is protected, false otherwise.
+ */
+bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ if (root == test || !pool_parent(test))
+ return false;
+
+ if (!root) {
+ for (root = test; pool_parent(root); root = pool_parent(root))
+ {}
+ }
+
+ /*
+ * mem_cgroup_below_min(), the memcg counterpart, does not need this call:
+ * it runs during a traversal of the cgroup tree, where effective protection
+ * is already calculated as part of the traversal. dmem cgroup eviction does
+ * not traverse the cgroup tree, so we need to recalculate effective
+ * protection here.
+ */
+ dmem_cgroup_calculate_protection(root, test);
+ return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
+
+/**
+ * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
+ *
+ * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
+ * @test: The pool to test the usage/low limit of.
+ *
+ * Return: true if usage is below low and the cgroup is protected, false otherwise.
+ */
+bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+ struct dmem_cgroup_pool_state *test)
+{
+ if (root == test || !pool_parent(test))
+ return false;
+
+ if (!root) {
+ for (root = test; pool_parent(root); root = pool_parent(root))
+ {}
+ }
+
+ /*
+ * mem_cgroup_below_low(), the memcg counterpart, does not need this call:
+ * it runs during a traversal of the cgroup tree, where effective protection
+ * is already calculated as part of the traversal. dmem cgroup eviction does
+ * not traverse the cgroup tree, so we need to recalculate effective
+ * protection here.
+ */
+ dmem_cgroup_calculate_protection(root, test);
+ return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
+
static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
{
struct dmem_cgroup_region *region;
--
2.53.0
* [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2026-03-02 12:37 ` [PATCH v5 1/6] cgroup/dmem: Add queries for protection values Natalie Vock
@ 2026-03-02 12:37 ` Natalie Vock
2026-03-02 14:38 ` Maarten Lankhorst
2026-03-02 12:37 ` [PATCH v5 3/6] drm/ttm: Extract code for attempting allocation in a place Natalie Vock
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock
This helps to find the common subtree containing two resources, which is
important when determining whether it's helpful to evict one resource in
favor of another.
To facilitate this, add a common helper to find the ancestor of two
cgroups using each cgroup's ancestor array.
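The ancestor-array lookup can be sketched standalone like this (a simplified model with a hypothetical struct node; the real helper works on struct cgroup, which stores its depth in ->level and its ancestor chain in ->ancestors[]):

```c
#include <stddef.h>

#define MAX_LEVEL 8

/* Simplified stand-in for struct cgroup: each node records its depth and
 * a pointer array where ancestors[i] is its ancestor at depth i, with
 * ancestors[level] being the node itself. */
struct node {
	int level;
	struct node *ancestors[MAX_LEVEL];
};

/* Start at the shallower of the two depths and walk toward the root; the
 * first depth at which both ancestor arrays agree is the nearest common
 * ancestor. Returns NULL if the nodes share no ancestor at all. */
static struct node *common_ancestor(struct node *a, struct node *b)
{
	int level = a->level < b->level ? a->level : b->level;

	for (; level >= 0; level--)
		if (a->ancestors[level] == b->ancestors[level])
			return a->ancestors[level];
	return NULL;
}
```

Because each node carries its full ancestor chain, the lookup needs no tree walk and no locking beyond keeping both nodes accessible, which is why it can run from the eviction path.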
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
include/linux/cgroup.h | 21 +++++++++++++++++++++
include/linux/cgroup_dmem.h | 9 +++++++++
kernel/cgroup/dmem.c | 43 ++++++++++++++++++++++++++++++++++++++++---
3 files changed, 70 insertions(+), 3 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index bc892e3b37eea..560ae995e3a54 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -561,6 +561,27 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp,
return cgrp->ancestors[ancestor_level];
}
+/**
+ * cgroup_common_ancestor - find common ancestor of two cgroups
+ * @a: first cgroup to find common ancestor of
+ * @b: second cgroup to find common ancestor of
+ *
+ * Find the first cgroup that is an ancestor of both @a and @b, if it exists
+ * and return a pointer to it. If such a cgroup doesn't exist, return NULL.
+ *
+ * This function is safe to call as long as both @a and @b are accessible.
+ */
+static inline struct cgroup *cgroup_common_ancestor(struct cgroup *a,
+ struct cgroup *b)
+{
+ int level;
+
+ for (level = min(a->level, b->level); level >= 0; level--)
+ if (a->ancestors[level] == b->ancestors[level])
+ return a->ancestors[level];
+ return NULL;
+}
+
/**
* task_under_cgroup_hierarchy - test task's membership of cgroup ancestry
* @task: the task to be tested
diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index 1a88cd0c9eb00..444b84f4c253a 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -28,6 +28,8 @@ bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
struct dmem_cgroup_pool_state *test);
bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
struct dmem_cgroup_pool_state *test);
+struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
+ struct dmem_cgroup_pool_state *b);
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
#else
@@ -75,6 +77,13 @@ static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
return false;
}
+static inline
+struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
+ struct dmem_cgroup_pool_state *b)
+{
+ return NULL;
+}
+
static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
{ }
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 28227405f7cfe..a3ba865f4c68f 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -569,11 +569,10 @@ void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
EXPORT_SYMBOL_GPL(dmem_cgroup_pool_state_put);
static struct dmem_cgroup_pool_state *
-get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
+find_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
{
- struct dmem_cgroup_pool_state *pool, *allocpool = NULL;
+ struct dmem_cgroup_pool_state *pool;
- /* fastpath lookup? */
rcu_read_lock();
pool = find_cg_pool_locked(cg, region);
if (pool && !READ_ONCE(pool->inited))
@@ -582,6 +581,17 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
pool = NULL;
rcu_read_unlock();
+ return pool;
+}
+
+static struct dmem_cgroup_pool_state *
+get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
+{
+ struct dmem_cgroup_pool_state *pool, *allocpool = NULL;
+
+ /* fastpath lookup? */
+ pool = find_cg_pool_unlocked(cg, region);
+
while (!pool) {
spin_lock(&dmemcg_lock);
if (!region->unregistered)
@@ -756,6 +766,33 @@ bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
}
EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
+/**
+ * dmem_cgroup_common_ancestor() - Find the first common ancestor of two pools.
+ * @a: First pool to find the common ancestor of.
+ * @b: Second pool to find the common ancestor of.
+ *
+ * Return: The first pool that is a parent of both @a and @b, or NULL if either @a or @b are NULL,
+ * or if such a pool does not exist.
+ */
+struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
+ struct dmem_cgroup_pool_state *b)
+{
+ struct cgroup *ancestor_cgroup;
+ struct cgroup_subsys_state *ancestor_css;
+
+ if (!a || !b)
+ return NULL;
+
+ ancestor_cgroup = cgroup_common_ancestor(a->cs->css.cgroup, b->cs->css.cgroup);
+ if (!ancestor_cgroup)
+ return NULL;
+
+ ancestor_css = cgroup_e_css(ancestor_cgroup, &dmem_cgrp_subsys);
+
+ return find_cg_pool_unlocked(css_to_dmemcs(ancestor_css), a->region);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_common_ancestor);
+
static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
{
struct dmem_cgroup_region *region;
--
2.53.0
* [PATCH v5 3/6] drm/ttm: Extract code for attempting allocation in a place
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2026-03-02 12:37 ` [PATCH v5 1/6] cgroup/dmem: Add queries for protection values Natalie Vock
2026-03-02 12:37 ` [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper Natalie Vock
@ 2026-03-02 12:37 ` Natalie Vock
2026-03-02 15:08 ` Tvrtko Ursulin
2026-03-02 12:37 ` [PATCH v5 4/6] drm/ttm: Split cgroup charge and resource allocation Natalie Vock
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock, Tvrtko Ursulin
Move all code for attempting allocation for a specific place to
ttm_bo_alloc_at_place. With subsequent patches, this logic is going to get
more complicated, so it helps readability to have this separate.
ttm_bo_alloc_at_place takes a pointer to a struct ttm_bo_alloc_state.
This struct holds various state produced by the allocation (e.g. cgroup
resource associated with the allocation) that the caller needs to keep
track of (and potentially dispose of). This is just the limiting cgroup
pool for now, but future patches will add more state needing to be tracked.
ttm_bo_alloc_at_place also communicates via return codes if eviction
using ttm_bo_evict_alloc should be attempted. This is preparation for
attempting eviction in more cases than just force_space being set.
No functional change intended.
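The return-code contract can be sketched with a toy model (hypothetical simplified signatures; the real functions take the BO, place, operation context, and allocation state):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the ttm_bo_alloc_at_place() return contract:
 * 0 on success, -EBUSY when allocation failed but eviction may help
 * (force_space was set and the place is evictable), -ENOSPC when the
 * caller should just move on to the next place. */
static int alloc_at_place_model(bool fits, bool may_evict)
{
	if (fits)
		return 0;
	return may_evict ? -EBUSY : -ENOSPC;
}

/* Caller pattern mirroring ttm_bo_alloc_resource(): -ENOSPC passes
 * through (try the next place in the placement list), while -EBUSY
 * triggers one eviction attempt before giving up on this place. */
static int alloc_with_eviction_model(bool fits, bool may_evict,
				     bool evict_frees_space)
{
	int ret = alloc_at_place_model(fits, may_evict);

	if (ret == -EBUSY)
		return evict_frees_space ? 0 : -EBUSY;
	return ret;
}
```

Keeping -EBUSY ("retry with eviction") distinct from -ENOSPC ("give up on this place") is what lets later patches add more retry cases without the caller growing extra flags.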
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
drivers/gpu/drm/ttm/ttm_bo.c | 104 +++++++++++++++++++++++++++++++++----------
1 file changed, 81 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index acb9197db8798..3e62cab51f870 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -489,6 +489,11 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
return ret;
}
+struct ttm_bo_alloc_state {
+ /** @limit_pool: Which pool limit we should test against */
+ struct dmem_cgroup_pool_state *limit_pool;
+};
+
/**
* struct ttm_bo_evict_walk - Parameters for the evict walk.
*/
@@ -504,12 +509,13 @@ struct ttm_bo_evict_walk {
/** @evicted: Number of successful evictions. */
unsigned long evicted;
- /** @limit_pool: Which pool limit we should test against */
- struct dmem_cgroup_pool_state *limit_pool;
/** @try_low: Whether we should attempt to evict BO's with low watermark threshold */
bool try_low;
/** @hit_low: If we cannot evict a bo when @try_low is false (first pass) */
bool hit_low;
+
+ /** @alloc_state: State associated with the allocation attempt. */
+ struct ttm_bo_alloc_state *alloc_state;
};
static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
@@ -518,8 +524,9 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
container_of(walk, typeof(*evict_walk), walk);
s64 lret;
- if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
- evict_walk->try_low, &evict_walk->hit_low))
+ if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
+ bo->resource->css, evict_walk->try_low,
+ &evict_walk->hit_low))
return 0;
if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
@@ -561,7 +568,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
struct ttm_operation_ctx *ctx,
struct ww_acquire_ctx *ticket,
struct ttm_resource **res,
- struct dmem_cgroup_pool_state *limit_pool)
+ struct ttm_bo_alloc_state *state)
{
struct ttm_bo_evict_walk evict_walk = {
.walk = {
@@ -574,7 +581,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
.place = place,
.evictor = evictor,
.res = res,
- .limit_pool = limit_pool,
+ .alloc_state = state,
};
s64 lret;
@@ -689,6 +696,58 @@ static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
return dma_resv_reserve_fences(bo->base.resv, 1);
}
+
+/**
+ * ttm_bo_alloc_at_place - Attempt allocating a BO's backing store in a place
+ *
+ * @bo: The buffer to allocate the backing store of
+ * @place: The place to attempt allocation in
+ * @ctx: ttm_operation_ctx associated with this allocation
+ * @force_space: If we should evict buffers to force space
+ * @res: On allocation success, the resulting struct ttm_resource.
+ * @alloc_state: Object holding allocation state such as charged cgroups.
+ *
+ * Returns:
+ * 0: Allocation succeeded, with the resulting resource stored in @res.
+ * -EBUSY: No space available, but allocation should be retried with ttm_bo_evict_alloc.
+ * -ENOSPC: No space available, allocation should not be retried.
+ * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
+ *
+ */
+static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
+ const struct ttm_place *place,
+ struct ttm_operation_ctx *ctx,
+ bool force_space,
+ struct ttm_resource **res,
+ struct ttm_bo_alloc_state *alloc_state)
+{
+ bool may_evict;
+ int ret;
+
+ may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
+
+ ret = ttm_resource_alloc(bo, place, res,
+ force_space ? &alloc_state->limit_pool : NULL);
+
+ if (ret) {
+ /*
+ * -EAGAIN means the charge failed, which we treat like an
+ * allocation failure. Therefore, return an error code indicating
+ * the allocation failed - either -EBUSY if the allocation should
+ * be retried with eviction, or -ENOSPC if there should be no second
+ * attempt.
+ */
+ if (ret == -EAGAIN)
+ return may_evict ? -EBUSY : -ENOSPC;
+
+ if (ret == -ENOSPC && may_evict)
+ return -EBUSY;
+
+ return ret;
+ }
+
+ return 0;
+}
+
/**
* ttm_bo_alloc_resource - Allocate backing store for a BO
*
@@ -725,9 +784,8 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
for (i = 0; i < placement->num_placement; ++i) {
const struct ttm_place *place = &placement->placement[i];
- struct dmem_cgroup_pool_state *limit_pool = NULL;
+ struct ttm_bo_alloc_state alloc_state = {};
struct ttm_resource_manager *man;
- bool may_evict;
man = ttm_manager_type(bdev, place->mem_type);
if (!man || !ttm_resource_manager_used(man))
@@ -737,25 +795,25 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
TTM_PL_FLAG_FALLBACK))
continue;
- may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
- ret = ttm_resource_alloc(bo, place, res, force_space ? &limit_pool : NULL);
- if (ret) {
- if (ret != -ENOSPC && ret != -EAGAIN) {
- dmem_cgroup_pool_state_put(limit_pool);
- return ret;
- }
- if (!may_evict) {
- dmem_cgroup_pool_state_put(limit_pool);
- continue;
- }
+ ret = ttm_bo_alloc_at_place(bo, place, ctx, force_space,
+ res, &alloc_state);
+ if (ret == -ENOSPC) {
+ dmem_cgroup_pool_state_put(alloc_state.limit_pool);
+ continue;
+ } else if (ret == -EBUSY) {
ret = ttm_bo_evict_alloc(bdev, man, place, bo, ctx,
- ticket, res, limit_pool);
- dmem_cgroup_pool_state_put(limit_pool);
+ ticket, res, &alloc_state);
+
+ dmem_cgroup_pool_state_put(alloc_state.limit_pool);
+
if (ret == -EBUSY)
continue;
- if (ret)
- return ret;
+ else if (ret)
+ return ret;
+ } else if (ret) {
+ dmem_cgroup_pool_state_put(alloc_state.limit_pool);
+ return ret;
}
ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
--
2.53.0
* [PATCH v5 4/6] drm/ttm: Split cgroup charge and resource allocation
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
` (2 preceding siblings ...)
2026-03-02 12:37 ` [PATCH v5 3/6] drm/ttm: Extract code for attempting allocation in a place Natalie Vock
@ 2026-03-02 12:37 ` Natalie Vock
2026-03-02 15:25 ` Tvrtko Ursulin
2026-03-02 12:37 ` [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
2026-03-02 12:37 ` [PATCH v5 6/6] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock
5 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock
Coupling resource allocation and cgroup charging is racy when charging
succeeds, but subsequent resource allocation fails. Certain eviction
decisions are made on the basis of whether the allocating cgroup is
protected, i.e. within its min/low limits, but with the charge being
tied to resource allocation (and uncharged when the resource allocation
fails), this check is done at a point where the allocation is not actually
charged to the cgroup.
This is subtly wrong if the allocation were to cause the cgroup to exceed
the min/low protection, but it's even more wrong if the same cgroup tries
allocating multiple buffers concurrently: In this case, the min/low
protection may pass for all allocation attempts when the real min/low
protection covers only some, or potentially none of the allocated
buffers.
Instead, charge the allocation to the cgroup once and keep the charge
for as long as we try to allocate a ttm_resource, and only undo the charge
if allocating the resource is ultimately unsuccessful and we move on to
a different ttm_place.
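The ordering this establishes can be modeled in a few lines (hypothetical names; the real code charges a dmem cgroup pool via ttm_resource_try_charge() and transfers charge ownership to the ttm_resource on success):

```c
#include <stdbool.h>

/* Toy pool standing in for a dmem cgroup pool's page counter. */
struct pool {
	long usage;
	long limit;
};

static bool try_charge(struct pool *p, long size)
{
	if (p->usage + size > p->limit)
		return false;
	p->usage += size;	/* immediately visible to concurrent allocators */
	return true;
}

static void uncharge(struct pool *p, long size)
{
	p->usage -= size;
}

/* Charge first, then attempt allocation (with an optional eviction retry).
 * The charge is kept across a failed first attempt and only undone once
 * this place is abandoned entirely; on success, its ownership conceptually
 * moves to the allocated resource. */
static bool alloc_charged(struct pool *p, long size,
			  bool first_try_ok, bool retry_ok)
{
	if (!try_charge(p, size))
		return false;		/* charge refused: nothing to undo */
	if (first_try_ok || retry_ok)
		return true;		/* charge now owned by the resource */
	uncharge(p, size);		/* give up on this place: undo charge */
	return false;
}
```

Because usage is bumped before any protection check runs, a second concurrent allocation from the same cgroup sees the first one's charge and cannot claim min/low protection the cgroup has already consumed.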
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
drivers/gpu/drm/ttm/ttm_bo.c | 45 +++++++++++++++++++++++++----------
drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++++++++++++-----------
include/drm/ttm/ttm_resource.h | 6 ++++-
3 files changed, 73 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3e62cab51f870..53c4de4bcc1e3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -490,6 +490,8 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
}
struct ttm_bo_alloc_state {
+ /** @charge_pool: The memory pool the resource is charged to */
+ struct dmem_cgroup_pool_state *charge_pool;
/** @limit_pool: Which pool limit we should test against */
struct dmem_cgroup_pool_state *limit_pool;
};
@@ -544,9 +546,17 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
goto out;
evict_walk->evicted++;
+ if (!evict_walk->alloc_state->charge_pool) {
+ lret = ttm_resource_try_charge(bo, evict_walk->place,
+ &evict_walk->alloc_state->charge_pool, NULL);
+ if (lret == -EAGAIN)
+ return -EBUSY;
+ else if (lret)
+ return lret;
+ }
if (evict_walk->res)
lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
- evict_walk->res, NULL);
+ evict_walk->res, evict_walk->alloc_state->charge_pool);
if (lret == 0)
return 1;
out:
@@ -724,10 +734,8 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
int ret;
may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
-
- ret = ttm_resource_alloc(bo, place, res,
- force_space ? &alloc_state->limit_pool : NULL);
-
+ ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
+ force_space ? &alloc_state->limit_pool : NULL);
if (ret) {
/*
* -EAGAIN means the charge failed, which we treat like an
@@ -737,14 +745,22 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
* attempt.
*/
if (ret == -EAGAIN)
- return may_evict ? -EBUSY : -ENOSPC;
+ ret = may_evict ? -EBUSY : -ENOSPC;
+ return ret;
+ }
+ ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
+ if (ret) {
if (ret == -ENOSPC && may_evict)
- return -EBUSY;
-
+ ret = -EBUSY;
return ret;
}
+ /*
+ * Ownership of charge_pool has been transferred to the TTM resource,
+ * don't make the caller think we still hold a reference to it.
+ */
+ alloc_state->charge_pool = NULL;
return 0;
}
@@ -799,6 +815,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
res, &alloc_state);
if (ret == -ENOSPC) {
+ dmem_cgroup_uncharge(alloc_state.charge_pool, bo->base.size);
dmem_cgroup_pool_state_put(alloc_state.limit_pool);
continue;
} else if (ret == -EBUSY) {
@@ -807,11 +824,15 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
dmem_cgroup_pool_state_put(alloc_state.limit_pool);
- if (ret == -EBUSY)
- continue;
- else if (ret)
- return;
+ if (ret) {
+ dmem_cgroup_uncharge(alloc_state.charge_pool,
+ bo->base.size);
+ if (ret == -EBUSY)
+ continue;
+ return ret;
+ }
} else if (ret) {
+ dmem_cgroup_uncharge(alloc_state.charge_pool, bo->base.size);
dmem_cgroup_pool_state_put(alloc_state.limit_pool);
return ret;
}
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 192fca24f37e4..a8a836f6e376a 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -373,30 +373,52 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
}
EXPORT_SYMBOL(ttm_resource_fini);
+/**
+ * ttm_resource_try_charge - charge a resource manager's cgroup pool
+ * @bo: buffer for which an allocation should be charged
+ * @place: where the allocation is attempted to be placed
+ * @ret_pool: on charge success, the pool that was charged
+ * @ret_limit_pool: on charge failure, the pool responsible for the failure
+ *
+ * Should be used to charge cgroups before attempting resource allocation.
+ * When charging succeeds, the value of ret_pool should be passed to
+ * ttm_resource_alloc.
+ *
+ * Returns: 0 on charge success, negative errno on failure.
+ */
+int ttm_resource_try_charge(struct ttm_buffer_object *bo,
+ const struct ttm_place *place,
+ struct dmem_cgroup_pool_state **ret_pool,
+ struct dmem_cgroup_pool_state **ret_limit_pool)
+{
+ struct ttm_resource_manager *man =
+ ttm_manager_type(bo->bdev, place->mem_type);
+
+ if (!man->cg) {
+ *ret_pool = NULL;
+ if (ret_limit_pool)
+ *ret_limit_pool = NULL;
+ return 0;
+ }
+
+ return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
+ ret_limit_pool);
+}
+
int ttm_resource_alloc(struct ttm_buffer_object *bo,
const struct ttm_place *place,
struct ttm_resource **res_ptr,
- struct dmem_cgroup_pool_state **ret_limit_pool)
+ struct dmem_cgroup_pool_state *charge_pool)
{
struct ttm_resource_manager *man =
ttm_manager_type(bo->bdev, place->mem_type);
- struct dmem_cgroup_pool_state *pool = NULL;
int ret;
- if (man->cg) {
- ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, ret_limit_pool);
- if (ret)
- return ret;
- }
-
ret = man->func->alloc(man, bo, place, res_ptr);
- if (ret) {
- if (pool)
- dmem_cgroup_uncharge(pool, bo->base.size);
+ if (ret)
return ret;
- }
- (*res_ptr)->css = pool;
+ (*res_ptr)->css = charge_pool;
spin_lock(&bo->bdev->lru_lock);
ttm_resource_add_bulk_move(*res_ptr, bo);
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 33e80f30b8b82..549b5b796884d 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -456,10 +456,14 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
void ttm_resource_fini(struct ttm_resource_manager *man,
struct ttm_resource *res);
+int ttm_resource_try_charge(struct ttm_buffer_object *bo,
+ const struct ttm_place *place,
+ struct dmem_cgroup_pool_state **ret_pool,
+ struct dmem_cgroup_pool_state **ret_limit_pool);
int ttm_resource_alloc(struct ttm_buffer_object *bo,
const struct ttm_place *place,
struct ttm_resource **res,
- struct dmem_cgroup_pool_state **ret_limit_pool);
+ struct dmem_cgroup_pool_state *charge_pool);
void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res);
bool ttm_resource_intersects(struct ttm_device *bdev,
struct ttm_resource *res,
--
2.53.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
` (3 preceding siblings ...)
2026-03-02 12:37 ` [PATCH v5 4/6] drm/ttm: Split cgroup charge and resource allocation Natalie Vock
@ 2026-03-02 12:37 ` Natalie Vock
2026-03-02 17:02 ` Tvrtko Ursulin
2026-03-02 12:37 ` [PATCH v5 6/6] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock
5 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock
When the cgroup's memory usage is below the low/min limit and allocation
fails, try evicting some unprotected buffers to make space. Otherwise,
application buffers may be forced to go into GTT even though usage is
below the corresponding low/min limit, if other applications filled VRAM
with their allocations first.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
drivers/gpu/drm/ttm/ttm_bo.c | 52 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 47 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 53c4de4bcc1e3..86f99237f6490 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -494,6 +494,10 @@ struct ttm_bo_alloc_state {
struct dmem_cgroup_pool_state *charge_pool;
/** @limit_pool: Which pool limit we should test against */
struct dmem_cgroup_pool_state *limit_pool;
+ /** @only_evict_unprotected: Whether only unprotected BOs, i.e. BOs whose
+ * cgroup exceeds its dmem low/min protection, should be considered for eviction
+ */
+ bool only_evict_unprotected;
};
/**
@@ -598,8 +602,12 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
evict_walk.walk.arg.trylock_only = true;
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
- /* One more attempt if we hit low limit? */
- if (!lret && evict_walk.hit_low) {
+ /* If we failed to find enough BOs to evict, but we skipped over
+ * some BOs because they were covered by dmem low protection, retry
+ * evicting these protected BOs too, unless we're told not to
+ * consider protected BOs at all.
+ */
+ if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
evict_walk.try_low = true;
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
}
@@ -620,7 +628,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
} while (!lret && evict_walk.evicted);
/* We hit the low limit? Try once more */
- if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
+ if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
+ !state->only_evict_unprotected) {
evict_walk.try_low = true;
goto retry;
}
@@ -730,7 +739,7 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
struct ttm_resource **res,
struct ttm_bo_alloc_state *alloc_state)
{
- bool may_evict;
+ bool may_evict, below_low;
int ret;
may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
@@ -749,9 +758,42 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
return ret;
}
+ /*
+ * cgroup protection plays a special role in eviction.
+ * Conceptually, protection of memory via the dmem cgroup controller
+ * entitles the protected cgroup to use a certain amount of memory.
+ * There are two types of protection - the 'low' limit is a
+ * "best-effort" protection, whereas the 'min' limit provides a hard
+ * guarantee that memory within the cgroup's allowance will not be
+ * evicted under any circumstance.
+ *
+ * To faithfully model this concept in TTM, we also need to take cgroup
+ * protection into account when allocating. When allocation in one
+ * place fails, TTM will default to trying other places first before
+ * evicting.
+ * If the allocation is covered by dmem cgroup protection, however,
+ * this prevents the allocation from using the memory it is "entitled"
+ * to. To make sure unprotected allocations cannot push new protected
+ * allocations out of places they are "entitled" to use, we should
+ * evict buffers not covered by any cgroup protection, if this
+ * allocation is covered by cgroup protection.
+ *
+ * Buffers covered by 'min' protection are a special case - the 'min'
+ * limit is a stronger guarantee than 'low', and thus buffers protected
+ * by 'low' but not 'min' should also be considered for eviction.
+ * Buffers protected by 'min' will never be considered for eviction
+ * anyway, so the regular eviction path should be triggered here.
+ * Buffers protected by 'low' but not 'min' will take a special
+ * eviction path that only evicts buffers covered by neither 'low' nor
+ * 'min' protection.
+ */
+ may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
+ below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
+ alloc_state->only_evict_unprotected = !may_evict && below_low;
+
ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
if (ret) {
- if (ret == -ENOSPC && may_evict)
+ if (ret == -ENOSPC && (may_evict || below_low))
ret = -EBUSY;
return ret;
}
--
2.53.0
* [PATCH v5 6/6] drm/ttm: Use common ancestor of evictor and evictee as limit pool
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
` (4 preceding siblings ...)
2026-03-02 12:37 ` [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
@ 2026-03-02 12:37 ` Natalie Vock
5 siblings, 0 replies; 18+ messages in thread
From: Natalie Vock @ 2026-03-02 12:37 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Tejun Heo, Johannes Weiner,
Michal Koutný, Christian Koenig, Huang Rui, Matthew Auld,
Matthew Brost, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel, Natalie Vock
When checking whether to skip certain buffers because they're protected
by dmem.low, we're checking the effective protection of the evictee's
cgroup, but depending on how the evictor's cgroup relates to the
evictee's, the semantics of effective protection values change.
When testing against cgroups from different subtrees, page_counter's
recursive protection propagates memory protection afforded to a parent
down to the child cgroups, even if the children were not explicitly
protected. This prevents cgroups whose parents were afforded no
protection from stealing memory from cgroups whose parents were afforded
more protection, without users having to explicitly propagate this
protection.
However, if we always calculate protection from the root cgroup, this
breaks prioritization of sibling cgroups: If one cgroup was explicitly
protected and its siblings were not, the protected cgroup should get
higher priority, i.e. the protected cgroup should be able to steal from
unprotected siblings. This only works if we restrict the protection
calculation to the subtree shared by evictor and evictee.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
drivers/gpu/drm/ttm/ttm_bo.c | 43 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 40 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 86f99237f6490..53b53a2791725 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -528,11 +528,48 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
{
struct ttm_bo_evict_walk *evict_walk =
container_of(walk, typeof(*evict_walk), walk);
+ struct dmem_cgroup_pool_state *limit_pool, *ancestor = NULL;
+ bool evict_valuable;
s64 lret;
- if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
- bo->resource->css, evict_walk->try_low,
- &evict_walk->hit_low))
+ /*
+ * If only_evict_unprotected is set, then we're trying to evict unprotected
+ * buffers in favor of a protected allocation for charge_pool. Explicitly skip
+ * buffers belonging to the same cgroup here - that cgroup is definitely protected,
+ * even though dmem_cgroup_state_evict_valuable would allow the eviction because a
+ * cgroup is always allowed to evict from itself even if it is protected.
+ */
+ if (evict_walk->alloc_state->only_evict_unprotected &&
+ bo->resource->css == evict_walk->alloc_state->charge_pool)
+ return 0;
+
+ limit_pool = evict_walk->alloc_state->limit_pool;
+ /*
+ * If there is no explicit limit pool, find the root of the shared subtree between
+ * evictor and evictee. This is important so that recursive protection rules can
+ * apply properly: Recursive protection distributes cgroup protection afforded
+ * to a parent cgroup but not used explicitly by a child cgroup between all child
+ * cgroups (see docs of effective_protection in mm/page_counter.c). However, when
+ * direct siblings compete for memory, siblings that were explicitly protected
+ * should get prioritized over siblings that weren't. This only happens correctly
+ * when the root of the shared subtree is passed to
+ * dmem_cgroup_state_evict_valuable. Otherwise, the effective-protection
+ * calculation cannot distinguish direct siblings from unrelated subtrees and the
+ * calculated protection ends up wrong.
+ */
+ if (!limit_pool) {
+ ancestor = dmem_cgroup_common_ancestor(bo->resource->css,
+ evict_walk->alloc_state->charge_pool);
+ limit_pool = ancestor;
+ }
+
+ evict_valuable = dmem_cgroup_state_evict_valuable(limit_pool, bo->resource->css,
+ evict_walk->try_low,
+ &evict_walk->hit_low);
+ if (ancestor)
+ dmem_cgroup_pool_state_put(ancestor);
+
+ if (!evict_valuable)
return 0;
if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
--
2.53.0
* Re: [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
2026-03-02 12:37 ` [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper Natalie Vock
@ 2026-03-02 14:38 ` Maarten Lankhorst
2026-03-05 20:02 ` Natalie Vock
0 siblings, 1 reply; 18+ messages in thread
From: Maarten Lankhorst @ 2026-03-02 14:38 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
Hey,
This should probably have a Co-developed-by: Tejun Heo <tj@kernel.org>
I need to take a closer look at patch 4 and 6, to add my r-b over the rest.
Den 2026-03-02 kl. 13:37, skrev Natalie Vock:
> This helps to find a common subtree of two resources, which is important
> when determining whether it's helpful to evict one resource in favor of
> another.
>
> To facilitate this, add a common helper to find the ancestor of two
> cgroups using each cgroup's ancestor array.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> include/linux/cgroup.h | 21 +++++++++++++++++++++
> include/linux/cgroup_dmem.h | 9 +++++++++
> kernel/cgroup/dmem.c | 43 ++++++++++++++++++++++++++++++++++++++++---
> 3 files changed, 70 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index bc892e3b37eea..560ae995e3a54 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -561,6 +561,27 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp,
> return cgrp->ancestors[ancestor_level];
> }
>
> +/**
> + * cgroup_common_ancestor - find common ancestor of two cgroups
> + * @a: first cgroup to find common ancestor of
> + * @b: second cgroup to find common ancestor of
> + *
> + * Find the first cgroup that is an ancestor of both @a and @b, if it exists,
> + * and return a pointer to it. If no such cgroup exists, return NULL.
> + *
> + * This function is safe to call as long as both @a and @b are accessible.
> + */
> +static inline struct cgroup *cgroup_common_ancestor(struct cgroup *a,
> + struct cgroup *b)
> +{
> + int level;
> +
> + for (level = min(a->level, b->level); level >= 0; level--)
> + if (a->ancestors[level] == b->ancestors[level])
> + return a->ancestors[level];
> + return NULL;
> +}
> +
> /**
> * task_under_cgroup_hierarchy - test task's membership of cgroup ancestry
> * @task: the task to be tested
> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
> index 1a88cd0c9eb00..444b84f4c253a 100644
> --- a/include/linux/cgroup_dmem.h
> +++ b/include/linux/cgroup_dmem.h
> @@ -28,6 +28,8 @@ bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
> struct dmem_cgroup_pool_state *test);
> bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
> struct dmem_cgroup_pool_state *test);
> +struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
> + struct dmem_cgroup_pool_state *b);
>
> void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
> #else
> @@ -75,6 +77,13 @@ static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
> return false;
> }
>
> +static inline
> +struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
> + struct dmem_cgroup_pool_state *b)
> +{
> + return NULL;
> +}
> +
> static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
> { }
>
> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 28227405f7cfe..a3ba865f4c68f 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -569,11 +569,10 @@ void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
> EXPORT_SYMBOL_GPL(dmem_cgroup_pool_state_put);
>
> static struct dmem_cgroup_pool_state *
> -get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
> +find_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
> {
> - struct dmem_cgroup_pool_state *pool, *allocpool = NULL;
> + struct dmem_cgroup_pool_state *pool;
>
> - /* fastpath lookup? */
> rcu_read_lock();
> pool = find_cg_pool_locked(cg, region);
> if (pool && !READ_ONCE(pool->inited))
> @@ -582,6 +581,17 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
> pool = NULL;
> rcu_read_unlock();
>
> + return pool;
> +}
> +
> +static struct dmem_cgroup_pool_state *
> +get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
> +{
> + struct dmem_cgroup_pool_state *pool, *allocpool = NULL;
> +
> + /* fastpath lookup? */
> + pool = find_cg_pool_unlocked(cg, region);
> +
> while (!pool) {
> spin_lock(&dmemcg_lock);
> if (!region->unregistered)
> @@ -756,6 +766,33 @@ bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
> }
> EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
>
> +/**
> + * dmem_cgroup_common_ancestor(): Find the first common ancestor of two pools.
> + * @a: First pool to find the common ancestor of.
> + * @b: Second pool to find the common ancestor of.
> + *
> + * Return: The first pool that is a parent of both @a and @b, or NULL if either @a or @b is NULL,
> + * or if such a pool does not exist.
> + */
> +struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
> + struct dmem_cgroup_pool_state *b)
> +{
> + struct cgroup *ancestor_cgroup;
> + struct cgroup_subsys_state *ancestor_css;
> +
> + if (!a || !b)
> + return NULL;
> +
> + ancestor_cgroup = cgroup_common_ancestor(a->cs->css.cgroup, b->cs->css.cgroup);
> + if (!ancestor_cgroup)
> + return NULL;
> +
> + ancestor_css = cgroup_e_css(ancestor_cgroup, &dmem_cgrp_subsys);
> +
> + return find_cg_pool_unlocked(css_to_dmemcs(ancestor_css), a->region);
> +}
> +EXPORT_SYMBOL_GPL(dmem_cgroup_common_ancestor);
From the naming, I would not expect a reference to be taken to the common ancestor, especially because the references through a and b would both be able to keep the ancestor alive. Otherwise it would not be an ancestor. Rename to dmem_cgroup_get_common_ancestor perhaps? Same for the find_, perhaps rename to lookup_, or use the unmodified get_cg_pool_unlocked version, because the common ancestor's pool_state definitely exists if either a or b does.
Kind regards,
~Maarten Lankhorst
* Re: [PATCH v5 3/6] drm/ttm: Extract code for attempting allocation in a place
2026-03-02 12:37 ` [PATCH v5 3/6] drm/ttm: Extract code for attempting allocation in a place Natalie Vock
@ 2026-03-02 15:08 ` Tvrtko Ursulin
0 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-02 15:08 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
On 02/03/2026 12:37, Natalie Vock wrote:
> Move all code for attempting allocation for a specific place to
> ttm_bo_alloc_at_place. With subsequent patches, this logic is going to get
> more complicated, so it helps readability to have this separate.
>
> ttm_bo_alloc_at_place takes a pointer to a struct ttm_bo_alloc_state.
> This struct holds various state produced by the allocation (e.g. cgroup
> resource associated with the allocation) that the caller needs to keep
> track of (and potentially dispose of). This is just the limiting cgroup
> pool for now, but future patches will add more state needing to be tracked.
>
> ttm_bo_alloc_at_place also communicates via return codes if eviction
> using ttm_bo_evict_alloc should be attempted. This is preparation for
> attempting eviction in more cases than just force_space being set.
>
> No functional change intended.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 104 +++++++++++++++++++++++++++++++++----------
> 1 file changed, 81 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index acb9197db8798..3e62cab51f870 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -489,6 +489,11 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
> return ret;
> }
>
> +struct ttm_bo_alloc_state {
> + /** @limit_pool: Which pool limit we should test against */
> + struct dmem_cgroup_pool_state *limit_pool;
> +};
> +
> /**
> * struct ttm_bo_evict_walk - Parameters for the evict walk.
> */
> @@ -504,12 +509,13 @@ struct ttm_bo_evict_walk {
> /** @evicted: Number of successful evictions. */
> unsigned long evicted;
>
> - /** @limit_pool: Which pool limit we should test against */
> - struct dmem_cgroup_pool_state *limit_pool;
> /** @try_low: Whether we should attempt to evict BO's with low watermark threshold */
> bool try_low;
> /** @hit_low: If we cannot evict a bo when @try_low is false (first pass) */
> bool hit_low;
> +
> + /** @alloc_state: State associated with the allocation attempt. */
> + struct ttm_bo_alloc_state *alloc_state;
> };
>
> static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
> @@ -518,8 +524,9 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
> container_of(walk, typeof(*evict_walk), walk);
> s64 lret;
>
> - if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
> - evict_walk->try_low, &evict_walk->hit_low))
> + if (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
> + bo->resource->css, evict_walk->try_low,
> + &evict_walk->hit_low))
> return 0;
>
> if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
> @@ -561,7 +568,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> struct ttm_operation_ctx *ctx,
> struct ww_acquire_ctx *ticket,
> struct ttm_resource **res,
> - struct dmem_cgroup_pool_state *limit_pool)
> + struct ttm_bo_alloc_state *state)
> {
> struct ttm_bo_evict_walk evict_walk = {
> .walk = {
> @@ -574,7 +581,7 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> .place = place,
> .evictor = evictor,
> .res = res,
> - .limit_pool = limit_pool,
> + .alloc_state = state,
> };
> s64 lret;
>
> @@ -689,6 +696,58 @@ static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
> return dma_resv_reserve_fences(bo->base.resv, 1);
> }
>
> +
> +/**
> + * ttm_bo_alloc_at_place - Attempt allocating a BO's backing store in a place
> + *
> + * @bo: The buffer to allocate the backing store of
> + * @place: The place to attempt allocation in
> + * @ctx: ttm_operation_ctx associated with this allocation
> + * @force_space: If we should evict buffers to force space
> + * @res: On allocation success, the resulting struct ttm_resource.
> + * @alloc_state: Object holding allocation state such as charged cgroups.
> + *
> + * Returns:
> + * -EBUSY: No space available, but allocation should be retried with ttm_bo_evict_alloc.
> + * -ENOSPC: No space available, allocation should not be retried.
> + * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
> + *
> + */
> +static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct ttm_operation_ctx *ctx,
> + bool force_space,
> + struct ttm_resource **res,
> + struct ttm_bo_alloc_state *alloc_state)
> +{
> + bool may_evict;
> + int ret;
> +
> + may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> +
> + ret = ttm_resource_alloc(bo, place, res,
> + force_space ? &alloc_state->limit_pool : NULL);
> +
> + if (ret) {
> + /*
> + * -EAGAIN means the charge failed, which we treat like an
> + * allocation failure. Therefore, return an error code indicating
> + * the allocation failed - either -EBUSY if the allocation should
> + * be retried with eviction, or -ENOSPC if there should be no second
> + * attempt.
> + */
> + if (ret == -EAGAIN)
> + return may_evict ? -EBUSY : -ENOSPC;
> +
> + if (ret == -ENOSPC && may_evict)
> + return -EBUSY;
> +
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> /**
> * ttm_bo_alloc_resource - Allocate backing store for a BO
> *
> @@ -725,9 +784,8 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
>
> for (i = 0; i < placement->num_placement; ++i) {
> const struct ttm_place *place = &placement->placement[i];
> - struct dmem_cgroup_pool_state *limit_pool = NULL;
> + struct ttm_bo_alloc_state alloc_state = {};
> struct ttm_resource_manager *man;
> - bool may_evict;
>
> man = ttm_manager_type(bdev, place->mem_type);
> if (!man || !ttm_resource_manager_used(man))
> @@ -737,25 +795,25 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> TTM_PL_FLAG_FALLBACK))
> continue;
>
> - may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> - ret = ttm_resource_alloc(bo, place, res, force_space ? &limit_pool : NULL);
> - if (ret) {
> - if (ret != -ENOSPC && ret != -EAGAIN) {
> - dmem_cgroup_pool_state_put(limit_pool);
> - return ret;
> - }
> - if (!may_evict) {
> - dmem_cgroup_pool_state_put(limit_pool);
> - continue;
> - }
> + ret = ttm_bo_alloc_at_place(bo, place, ctx, force_space,
> + res, &alloc_state);
>
> + if (ret == -ENOSPC) {
> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> + continue;
> + } else if (ret == -EBUSY) {
> ret = ttm_bo_evict_alloc(bdev, man, place, bo, ctx,
> - ticket, res, limit_pool);
> - dmem_cgroup_pool_state_put(limit_pool);
> + ticket, res, &alloc_state);
> +
> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> +
> if (ret == -EBUSY)
> continue;
> - if (ret)
> - return ret;
> + else if (ret)
> + return;
return ret;
Regards,
Tvrtko
> + } else if (ret) {
> + dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> + return ret;
> }
>
> ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
>
* Re: [PATCH v5 4/6] drm/ttm: Split cgroup charge and resource allocation
2026-03-02 12:37 ` [PATCH v5 4/6] drm/ttm: Split cgroup charge and resource allocation Natalie Vock
@ 2026-03-02 15:25 ` Tvrtko Ursulin
0 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-02 15:25 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 02/03/2026 12:37, Natalie Vock wrote:
> Coupling resource allocation and cgroup charging is racy when charging
> succeeds, but subsequent resource allocation fails. Certain eviction
> decisions are made on the basis of whether the allocating cgroup is
> protected, i.e. within its min/low limits, but with the charge being
> tied to resource allocation (and uncharged when the resource allocation
> fails), this check is done at a point where the allocation is not actually
> charged to the cgroup.
>
> This is subtly wrong if the allocation were to cause the cgroup to exceed
> the min/low protection, but it's even more wrong if the same cgroup tries
> allocating multiple buffers concurrently: In this case, the min/low
> protection may pass for all allocation attempts when the real min/low
> protection covers only some, or potentially none of the allocated
> buffers.
>
> Instead, charge the allocation to the cgroup once and keep the charge
> for as long as we try to allocate a ttm_resource, and only undo the charge
> if allocating the resource is ultimately unsuccessful and we move on to
> a different ttm_place.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 45 +++++++++++++++++++++++++----------
> drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++++++++++++-----------
> include/drm/ttm/ttm_resource.h | 6 ++++-
> 3 files changed, 73 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3e62cab51f870..53c4de4bcc1e3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -490,6 +490,8 @@ int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man
> }
>
> struct ttm_bo_alloc_state {
> + /** @charge_pool: The memory pool the resource is charged to */
> + struct dmem_cgroup_pool_state *charge_pool;
> /** @limit_pool: Which pool limit we should test against */
> struct dmem_cgroup_pool_state *limit_pool;
> };
> @@ -544,9 +546,17 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
> goto out;
>
> evict_walk->evicted++;
> + if (!evict_walk->alloc_state->charge_pool) {
> + lret = ttm_resource_try_charge(bo, evict_walk->place,
> + &evict_walk->alloc_state->charge_pool, NULL);
Right, this is if charging against the 1st attempted placement failed.
It is a bit sub-optimal that the two places doing the charge are split
like this.
Would it work to use ttm_bo_alloc_at_place() here as well?
Regards,
Tvrtko
> + if (lret == -EAGAIN)
> + return -EBUSY;
> + else if (lret)
> + return lret;
> + }
> if (evict_walk->res)
> lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
> - evict_walk->res, NULL);
> + evict_walk->res, evict_walk->alloc_state->charge_pool);
> if (lret == 0)
> return 1;
> out:
> @@ -724,10 +734,8 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> int ret;
>
> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> -
> - ret = ttm_resource_alloc(bo, place, res,
> - force_space ? &alloc_state->limit_pool : NULL);
> -
> + ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
> + force_space ? &alloc_state->limit_pool : NULL);
> if (ret) {
> /*
> * -EAGAIN means the charge failed, which we treat like an
> @@ -737,14 +745,22 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> * attempt.
> */
> if (ret == -EAGAIN)
> - return may_evict ? -EBUSY : -ENOSPC;
> + ret = may_evict ? -EBUSY : -ENOSPC;
> + return ret;
> + }
>
> + ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
> + if (ret) {
> if (ret == -ENOSPC && may_evict)
> - return -EBUSY;
> -
> + ret = -EBUSY;
> return ret;
> }
>
> + /*
> + * Ownership of charge_pool has been transferred to the TTM resource,
> + * don't make the caller think we still hold a reference to it.
> + */
> + alloc_state->charge_pool = NULL;
> return 0;
> }
>
> @@ -799,6 +815,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> res, &alloc_state);
>
> if (ret == -ENOSPC) {
> + dmem_cgroup_uncharge(alloc_state.charge_pool, bo->base.size);
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> continue;
> } else if (ret == -EBUSY) {
> @@ -807,11 +824,15 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
>
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>
> - if (ret == -EBUSY)
> - continue;
> - else if (ret)
> - return;
> + if (ret) {
> + dmem_cgroup_uncharge(alloc_state.charge_pool,
> + bo->base.size);
> + if (ret == -EBUSY)
> + continue;
> + return ret;
> + }
> } else if (ret) {
> + dmem_cgroup_uncharge(alloc_state.charge_pool, bo->base.size);
> dmem_cgroup_pool_state_put(alloc_state.limit_pool);
> return ret;
> }
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 192fca24f37e4..a8a836f6e376a 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -373,30 +373,52 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
> }
> EXPORT_SYMBOL(ttm_resource_fini);
>
> +/**
> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
> + * @bo: buffer for which an allocation should be charged
> + * @place: where the allocation is attempted to be placed
> + * @ret_pool: on charge success, the pool that was charged
> + * @ret_limit_pool: on charge failure, the pool responsible for the failure
> + *
> + * Should be used to charge cgroups before attempting resource allocation.
> + * When charging succeeds, the value of ret_pool should be passed to
> + * ttm_resource_alloc.
> + *
> + * Returns: 0 on charge success, negative errno on failure.
> + */
> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct dmem_cgroup_pool_state **ret_pool,
> + struct dmem_cgroup_pool_state **ret_limit_pool)
> +{
> + struct ttm_resource_manager *man =
> + ttm_manager_type(bo->bdev, place->mem_type);
> +
> + if (!man->cg) {
> + *ret_pool = NULL;
> + if (ret_limit_pool)
> + *ret_limit_pool = NULL;
> + return 0;
> + }
> +
> + return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
> + ret_limit_pool);
> +}
> +
> int ttm_resource_alloc(struct ttm_buffer_object *bo,
> const struct ttm_place *place,
> struct ttm_resource **res_ptr,
> - struct dmem_cgroup_pool_state **ret_limit_pool)
> + struct dmem_cgroup_pool_state *charge_pool)
> {
> struct ttm_resource_manager *man =
> ttm_manager_type(bo->bdev, place->mem_type);
> - struct dmem_cgroup_pool_state *pool = NULL;
> int ret;
>
> - if (man->cg) {
> - ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, ret_limit_pool);
> - if (ret)
> - return ret;
> - }
> -
> ret = man->func->alloc(man, bo, place, res_ptr);
> - if (ret) {
> - if (pool)
> - dmem_cgroup_uncharge(pool, bo->base.size);
> + if (ret)
> return ret;
> - }
>
> - (*res_ptr)->css = pool;
> + (*res_ptr)->css = charge_pool;
>
> spin_lock(&bo->bdev->lru_lock);
> ttm_resource_add_bulk_move(*res_ptr, bo);
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index 33e80f30b8b82..549b5b796884d 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -456,10 +456,14 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
> void ttm_resource_fini(struct ttm_resource_manager *man,
> struct ttm_resource *res);
>
> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
> + const struct ttm_place *place,
> + struct dmem_cgroup_pool_state **ret_pool,
> + struct dmem_cgroup_pool_state **ret_limit_pool);
> int ttm_resource_alloc(struct ttm_buffer_object *bo,
> const struct ttm_place *place,
> struct ttm_resource **res,
> - struct dmem_cgroup_pool_state **ret_limit_pool);
> + struct dmem_cgroup_pool_state *charge_pool);
> void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res);
> bool ttm_resource_intersects(struct ttm_device *bdev,
> struct ttm_resource *res,
>
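For illustration only: the charge-before-allocate ordering this patch introduces can be modelled with a small toy pool and domain. All toy_ names below are invented; this is a sketch of the flow (charge first, hold the charge across an eviction retry, uncharge only on final failure), not the TTM implementation:

```c
/* Toy model: a cgroup-style charge counter plus a memory domain. */
struct toy_pool { long charged; long limit; };
struct toy_domain { long used; long capacity; };

/* Reserve the cgroup budget first (mirrors ttm_resource_try_charge). */
static int toy_try_charge(struct toy_pool *pool, long size)
{
	if (pool->charged + size > pool->limit)
		return -1; /* would exceed the cgroup limit */
	pool->charged += size;
	return 0;
}

static void toy_uncharge(struct toy_pool *pool, long size)
{
	pool->charged -= size;
}

/* Plain allocation attempt (mirrors man->func->alloc). */
static int toy_alloc(struct toy_domain *dom, long size)
{
	if (dom->used + size > dom->capacity)
		return -1; /* -ENOSPC */
	dom->used += size;
	return 0;
}

/* Charge once, then retry allocation after "evicting" some space.
 * The charge is held across the retry and only dropped if the
 * allocation still fails afterwards. */
static int toy_charge_then_alloc(struct toy_pool *pool, struct toy_domain *dom,
				 long size, long evictable)
{
	if (toy_try_charge(pool, size))
		return -1;
	if (toy_alloc(dom, size) == 0)
		return 0;
	/* eviction pass: free up whatever is evictable, then retry */
	dom->used -= evictable;
	if (toy_alloc(dom, size) == 0)
		return 0;
	toy_uncharge(pool, size); /* final failure: drop the charge */
	return -1;
}
```

The point of the split is that the caller keeps the charge alive between the first failed allocation and the eviction retry, instead of charging and uncharging inside every attempt.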
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit
2026-03-02 12:37 ` [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
@ 2026-03-02 17:02 ` Tvrtko Ursulin
2026-03-13 11:39 ` Natalie Vock
0 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-02 17:02 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 02/03/2026 12:37, Natalie Vock wrote:
> When the cgroup's memory usage is below the low/min limit and allocation
> fails, try evicting some unprotected buffers to make space. Otherwise,
> application buffers may be forced to go into GTT even though usage is
> below the corresponding low/min limit, if other applications filled VRAM
> with their allocations first.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 52 +++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 53c4de4bcc1e3..86f99237f6490 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -494,6 +494,10 @@ struct ttm_bo_alloc_state {
> struct dmem_cgroup_pool_state *charge_pool;
> /** @limit_pool: Which pool limit we should test against */
> struct dmem_cgroup_pool_state *limit_pool;
> + /** @only_evict_unprotected: If only unprotected BOs, i.e. BOs whose cgroup
> + * is exceeding its dmem low/min protection, should be considered for eviction
> + */
> + bool only_evict_unprotected;
> };
>
> /**
> @@ -598,8 +602,12 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> evict_walk.walk.arg.trylock_only = true;
> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>
> - /* One more attempt if we hit low limit? */
> - if (!lret && evict_walk.hit_low) {
> + /* If we failed to find enough BOs to evict, but we skipped over
> + * some BOs because they were covered by dmem low protection, retry
> + * evicting these protected BOs too, except if we're told not to
> + * consider protected BOs at all.
> + */
> + if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
> evict_walk.try_low = true;
> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
> }
> @@ -620,7 +628,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
> } while (!lret && evict_walk.evicted);
>
> /* We hit the low limit? Try once more */
> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
> + !state->only_evict_unprotected) {
> evict_walk.try_low = true;
> goto retry;
> }
> @@ -730,7 +739,7 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> struct ttm_resource **res,
> struct ttm_bo_alloc_state *alloc_state)
> {
> - bool may_evict;
> + bool may_evict, below_low;
> int ret;
>
> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
> @@ -749,9 +758,42 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
> return ret;
> }
>
> + /*
> + * cgroup protection plays a special role in eviction.
> + * Conceptually, protection of memory via the dmem cgroup controller
> + * entitles the protected cgroup to use a certain amount of memory.
> + * There are two types of protection - the 'low' limit is a
> + * "best-effort" protection, whereas the 'min' limit provides a hard
> + * guarantee that memory within the cgroup's allowance will not be
> + * evicted under any circumstance.
> + *
> + * To faithfully model this concept in TTM, we also need to take cgroup
> + * protection into account when allocating. When allocation in one
> + * place fails, TTM will default to trying other places first before
> + * evicting.
> + * If the allocation is covered by dmem cgroup protection, however,
> + * this prevents the allocation from using the memory it is "entitled"
> + * to. To make sure unprotected allocations cannot push new protected
> + * allocations out of places they are "entitled" to use, we should
> + * evict buffers not covered by any cgroup protection, if this
> + * allocation is covered by cgroup protection.
> + *
> + * Buffers covered by 'min' protection are a special case - the 'min'
> + * limit is a stronger guarantee than 'low', and thus buffers protected
> + * by 'low' but not 'min' should also be considered for eviction.
> + * Buffers protected by 'min' will never be considered for eviction
> + * anyway, so the regular eviction path should be triggered here.
> + * Buffers protected by 'low' but not 'min' will take a special
> + * eviction path that only evicts buffers covered by neither 'low' nor
> + * 'min' protections.
> + */
> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
It may make sense to group the two lines which "calculate" may_evict
together, which would probably mean also pulling the two lines below up
to before the try-charge, so that the whole logical block is not split.
> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
> + alloc_state->only_evict_unprotected = !may_evict && below_low;
Would it work to enable may_evict also if below_low is true, and assign
below_low directly to only_evict_unprotected? I mean along the lines of:
may_evict = force_space && place->mem_type != TTM_PL_SYSTEM;
may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
alloc_state->only_evict_unprotected =
	dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
It would allow the if condition below to be simpler. Evict callback
would remain the same I guess.
And maybe only_evict_unprotected could be renamed to "try_low" to align
with the naming in there? Then in the callback the condition would be like:
/* We hit the low limit? Try once more */
if (!lret && evict_walk.hit_low &&
    !(evict_walk.try_low || state->try_low)) {
	evict_walk.try_low = true;
	goto retry;
}
Give or take... Would that be more readable, e.g. more obvious? Although
I am endlessly confused how !try_low ends up being try_low = true in
this condition, so maybe I am mixing something up. You get my gist,
though? In essence: unifying the naming and logic for easier
understanding. If you can find some workable way in this spirit, I think
it is worth thinking about.
Regards,
Tvrtko
> +
> ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
> if (ret) {
> - if (ret == -ENOSPC && may_evict)
> + if (ret == -ENOSPC && (may_evict || below_low))
> ret = -EBUSY;
> return ret;
> }
>
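For illustration, the eviction-permission logic from the hunk above can be modelled as a standalone toy function. The toy_ names are invented for the example, and the dmem_cgroup_below_{min,low}() results are passed in as plain booleans; this is a sketch of the decision, not the kernel code:

```c
#include <stdbool.h>

/* Toy model of the allocation-time eviction decision. */
struct toy_alloc_decision {
	bool may_evict;              /* eviction allowed at all */
	bool only_evict_unprotected; /* skip BOs under 'low' protection */
};

static struct toy_alloc_decision
toy_decide(bool force_space, bool is_system_domain,
	   bool below_min, bool below_low)
{
	struct toy_alloc_decision d;

	/* Regular path: evict only when forcing space outside system memory. */
	d.may_evict = force_space && !is_system_domain;
	/* 'min' protection entitles the allocation to the regular eviction path. */
	d.may_evict |= below_min;
	/* 'low'-only protection may evict, but only unprotected buffers. */
	d.only_evict_unprotected = !d.may_evict && below_low;
	return d;
}
```

This mirrors the structure of the patch: 'min' strengthens may_evict directly, while 'low' without 'min' takes the restricted path that never evicts other protected buffers.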
* Re: [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
2026-03-02 14:38 ` Maarten Lankhorst
@ 2026-03-05 20:02 ` Natalie Vock
2026-03-10 12:48 ` Natalie Vock
0 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-05 20:02 UTC (permalink / raw)
To: Maarten Lankhorst, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
On 3/2/26 15:38, Maarten Lankhorst wrote:
> Hey,
>
> This should probably have a Co-developed-by: Tejun Heo <tj@kernel.org>
Oh, that's a good point, sorry!
Although, I think I also need to add a S-o-b tag, then, don't I?
Tejun, just to confirm, would you be fine with that? Wouldn't want to
claim people certify something without talking to them first :P
>
> I need to take a closer look at patch 4 and 6, to add my r-b over the rest.
>
> Den 2026-03-02 kl. 13:37, skrev Natalie Vock:
>> This helps to find a common subtree of two resources, which is important
>> when determining whether it's helpful to evict one resource in favor of
>> another.
>>
>> To facilitate this, add a common helper to find the ancestor of two
>> cgroups using each cgroup's ancestor array.
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>> include/linux/cgroup.h | 21 +++++++++++++++++++++
>> include/linux/cgroup_dmem.h | 9 +++++++++
>> kernel/cgroup/dmem.c | 43 ++++++++++++++++++++++++++++++++++++++++---
>> 3 files changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index bc892e3b37eea..560ae995e3a54 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -561,6 +561,27 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp,
>> return cgrp->ancestors[ancestor_level];
>> }
>>
>> +/**
>> + * cgroup_common_ancestor - find common ancestor of two cgroups
>> + * @a: first cgroup to find common ancestor of
>> + * @b: second cgroup to find common ancestor of
>> + *
>> + * Find the first cgroup that is an ancestor of both @a and @b, if it exists
>> + * and return a pointer to it. If such a cgroup doesn't exist, return NULL.
>> + *
>> + * This function is safe to call as long as both @a and @b are accessible.
>> + */
>> +static inline struct cgroup *cgroup_common_ancestor(struct cgroup *a,
>> + struct cgroup *b)
>> +{
>> + int level;
>> +
>> + for (level = min(a->level, b->level); level >= 0; level--)
>> + if (a->ancestors[level] == b->ancestors[level])
>> + return a->ancestors[level];
>> + return NULL;
>> +}
>> +
>> /**
>> * task_under_cgroup_hierarchy - test task's membership of cgroup ancestry
>> * @task: the task to be tested
>> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
>> index 1a88cd0c9eb00..444b84f4c253a 100644
>> --- a/include/linux/cgroup_dmem.h
>> +++ b/include/linux/cgroup_dmem.h
>> @@ -28,6 +28,8 @@ bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>> struct dmem_cgroup_pool_state *test);
>> bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>> struct dmem_cgroup_pool_state *test);
>> +struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
>> + struct dmem_cgroup_pool_state *b);
>>
>> void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
>> #else
>> @@ -75,6 +77,13 @@ static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>> return false;
>> }
>>
>> +static inline
>> +struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
>> + struct dmem_cgroup_pool_state *b)
>> +{
>> + return NULL;
>> +}
>> +
>> static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
>> { }
>>
>> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
>> index 28227405f7cfe..a3ba865f4c68f 100644
>> --- a/kernel/cgroup/dmem.c
>> +++ b/kernel/cgroup/dmem.c
>> @@ -569,11 +569,10 @@ void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
>> EXPORT_SYMBOL_GPL(dmem_cgroup_pool_state_put);
>>
>> static struct dmem_cgroup_pool_state *
>> -get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
>> +find_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
>> {
>> - struct dmem_cgroup_pool_state *pool, *allocpool = NULL;
>> + struct dmem_cgroup_pool_state *pool;
>>
>> - /* fastpath lookup? */
>> rcu_read_lock();
>> pool = find_cg_pool_locked(cg, region);
>> if (pool && !READ_ONCE(pool->inited))
>> @@ -582,6 +581,17 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
>> pool = NULL;
>> rcu_read_unlock();
>>
>> + return pool;
>> +}
>> +
>> +static struct dmem_cgroup_pool_state *
>> +get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
>> +{
>> + struct dmem_cgroup_pool_state *pool, *allocpool = NULL;
>> +
>> + /* fastpath lookup? */
>> + pool = find_cg_pool_unlocked(cg, region);
>> +
>> while (!pool) {
>> spin_lock(&dmemcg_lock);
>> if (!region->unregistered)
>> @@ -756,6 +766,33 @@ bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>> }
>> EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
>>
>> +/**
>> + * dmem_cgroup_common_ancestor(): Find the first common ancestor of two pools.
>> + * @a: First pool to find the common ancestor of.
>> + * @b: Second pool to find the common ancestor of.
>> + *
>> + * Return: The first pool that is a parent of both @a and @b, or NULL if either @a or @b are NULL,
>> + * or if such a pool does not exist.
>> + */
>> +struct dmem_cgroup_pool_state *dmem_cgroup_common_ancestor(struct dmem_cgroup_pool_state *a,
>> + struct dmem_cgroup_pool_state *b)
>> +{
>> + struct cgroup *ancestor_cgroup;
>> + struct cgroup_subsys_state *ancestor_css;
>> +
>> + if (!a || !b)
>> + return NULL;
>> +
>> + ancestor_cgroup = cgroup_common_ancestor(a->cs->css.cgroup, b->cs->css.cgroup);
>> + if (!ancestor_cgroup)
>> + return NULL;
>> +
>> + ancestor_css = cgroup_e_css(ancestor_cgroup, &dmem_cgrp_subsys);
>> +
>> + return find_cg_pool_unlocked(css_to_dmemcs(ancestor_css), a->region);
>> +}
>> +EXPORT_SYMBOL_GPL(dmem_cgroup_common_ancestor);
> From the naming, I would not expect a reference to be taken to the common ancestor, especially because the references through a and b would both be able to keep the ancestor alive. Otherwise it would not be an ancestor. Rename to dmem_cgroup_get_common_ancestor perhaps? Same for the find_, perhaps rename to lookup_ or use the unmodified get_cg_pool_unlocked version, because the common ancestor's pool_state definitely exists if either a or b does.
Right. Will rename to dmem_cgroup_get_common_ancestor, and also point
out in the documentation that a reference is taken.
Thanks,
Natalie
>
> Kind regards,
> ~Maarten Lankhorst
* Re: [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
2026-03-05 20:02 ` Natalie Vock
@ 2026-03-10 12:48 ` Natalie Vock
2026-03-10 16:42 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-10 12:48 UTC (permalink / raw)
To: Maarten Lankhorst, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
Hi,
On 3/5/26 21:02, Natalie Vock wrote:
> On 3/2/26 15:38, Maarten Lankhorst wrote:
>> Hey,
>>
>> This should probably have a Co-developed-by: Tejun Heo <tj@kernel.org>
>
> Oh, that's a good point, sorry!
>
> Although, I think I also need to add a S-o-b tag, then, don't I?
>
> Tejun, just to confirm, would you be fine with that? Wouldn't want to
> claim people certify something without talking to them first :P
Friendly ping on this :)
I intend to send out a new version with the outstanding feedback
addressed, although I'd like to resolve this before I do that.
Thanks,
Natalie
* Re: [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
2026-03-10 12:48 ` Natalie Vock
@ 2026-03-10 16:42 ` Tejun Heo
0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2026-03-10 16:42 UTC (permalink / raw)
To: Natalie Vock
Cc: Maarten Lankhorst, Maarten Lankhorst, Maxime Ripard,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Thomas Zimmermann, David Airlie,
Simona Vetter, Tvrtko Ursulin, cgroups, dri-devel
On Tue, Mar 10, 2026 at 01:48:47PM +0100, Natalie Vock wrote:
> Hi,
>
> On 3/5/26 21:02, Natalie Vock wrote:
> > On 3/2/26 15:38, Maarten Lankhorst wrote:
> > > Hey,
> > >
> > > This should probably have a Co-developed-by: Tejun Heo <tj@kernel.org>
> >
> > Oh, that's a good point, sorry!
> >
> > Although, I think I also need to add a S-o-b tag, then, don't I?
> >
> > Tejun, just to confirm, would you be fine with that? Wouldn't want to
> > claim people certify something without talking to them first :P
>
> Friendly ping on this :)
>
> I intend to send out a new version with the outstanding feedback addressed,
> although I'd like to resolve this before I do that.
Sorry about the delay. It doesn't really matter. No need to attribute
anything.
Thanks.
--
tejun
* Re: [PATCH v5 1/6] cgroup/dmem: Add queries for protection values
2026-03-02 12:37 ` [PATCH v5 1/6] cgroup/dmem: Add queries for protection values Natalie Vock
@ 2026-03-11 1:12 ` Chen Ridong
2026-03-11 8:33 ` Natalie Vock
0 siblings, 1 reply; 18+ messages in thread
From: Chen Ridong @ 2026-03-11 1:12 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
On 2026/3/2 20:37, Natalie Vock wrote:
> Callers can use this feedback to be more aggressive in making space for
> allocations of a cgroup if they know it is protected.
>
> These are counterparts to memcg's mem_cgroup_below_{min,low}.
>
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
> include/linux/cgroup_dmem.h | 16 ++++++++++++
> kernel/cgroup/dmem.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 78 insertions(+)
>
> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
> index dd4869f1d736e..1a88cd0c9eb00 100644
> --- a/include/linux/cgroup_dmem.h
> +++ b/include/linux/cgroup_dmem.h
> @@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
> bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
> struct dmem_cgroup_pool_state *test_pool,
> bool ignore_low, bool *ret_hit_low);
> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
> + struct dmem_cgroup_pool_state *test);
> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
> + struct dmem_cgroup_pool_state *test);
>
> void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
> #else
> @@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
> return true;
> }
>
> +static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
> + struct dmem_cgroup_pool_state *test)
> +{
> + return false;
> +}
> +
> +static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
> + struct dmem_cgroup_pool_state *test)
> +{
> + return false;
> +}
> +
> static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
> { }
>
> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 9d95824dc6fa0..28227405f7cfe 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -694,6 +694,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
> }
> EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
>
> +/**
> + * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
> + *
> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
> + * @test: The pool to test the usage/min limit of.
> + *
> + * Return: true if usage is below min and the cgroup is protected, false otherwise.
> + */
> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
> + struct dmem_cgroup_pool_state *test)
> +{
> + if (root == test || !pool_parent(test))
> + return false;
> +
> + if (!root) {
> + for (root = test; pool_parent(root); root = pool_parent(root))
> + {}
> + }
It seems we don't have to find the global protection (root), since the root's
protection cannot be set. If !root, we can return false directly, right?
Or do I miss anything?
```
{
.name = "min",
.write = dmem_cgroup_region_min_write,
.seq_show = dmem_cgroup_region_min_show,
.flags = CFTYPE_NOT_ON_ROOT,
},
{
.name = "low",
.write = dmem_cgroup_region_low_write,
.seq_show = dmem_cgroup_region_low_show,
.flags = CFTYPE_NOT_ON_ROOT,
},
```
> +
> + /*
> + * In mem_cgroup_below_min(), the memcg pendant, this call is missing.
> + * mem_cgroup_below_min() gets called during traversal of the cgroup tree, where
> + * protection is already calculated as part of the traversal. dmem cgroup eviction
> + * does not traverse the cgroup tree, so we need to recalculate effective protection
> + * here.
> + */
> + dmem_cgroup_calculate_protection(root, test);
> + return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
> +}
> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
> +
> +/**
> + * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
> + *
> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
> + * @test: The pool to test the usage/low limit of.
> + *
> + * Return: true if usage is below low and the cgroup is protected, false otherwise.
> + */
> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
> + struct dmem_cgroup_pool_state *test)
> +{
> + if (root == test || !pool_parent(test))
> + return false;
> +
> + if (!root) {
> + for (root = test; pool_parent(root); root = pool_parent(root))
> + {}
> + }
> +
> + /*
> + * In mem_cgroup_below_low(), the memcg pendant, this call is missing.
> + * mem_cgroup_below_low() gets called during traversal of the cgroup tree, where
> + * protection is already calculated as part of the traversal. dmem cgroup eviction
> + * does not traverse the cgroup tree, so we need to recalculate effective protection
> + * here.
> + */
> + dmem_cgroup_calculate_protection(root, test);
> + return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
> +}
> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
> +
> static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
> {
> struct dmem_cgroup_region *region;
>
--
Best regards,
Ridong
* Re: [PATCH v5 1/6] cgroup/dmem: Add queries for protection values
2026-03-11 1:12 ` Chen Ridong
@ 2026-03-11 8:33 ` Natalie Vock
2026-03-12 0:36 ` Chen Ridong
0 siblings, 1 reply; 18+ messages in thread
From: Natalie Vock @ 2026-03-11 8:33 UTC (permalink / raw)
To: Chen Ridong, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
On 3/11/26 02:12, Chen Ridong wrote:
>
>
> On 2026/3/2 20:37, Natalie Vock wrote:
>> Callers can use this feedback to be more aggressive in making space for
>> allocations of a cgroup if they know it is protected.
>>
>> These are counterparts to memcg's mem_cgroup_below_{min,low}.
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>> include/linux/cgroup_dmem.h | 16 ++++++++++++
>> kernel/cgroup/dmem.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 78 insertions(+)
>>
>> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
>> index dd4869f1d736e..1a88cd0c9eb00 100644
>> --- a/include/linux/cgroup_dmem.h
>> +++ b/include/linux/cgroup_dmem.h
>> @@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
>> bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
>> struct dmem_cgroup_pool_state *test_pool,
>> bool ignore_low, bool *ret_hit_low);
>> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>> + struct dmem_cgroup_pool_state *test);
>> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>> + struct dmem_cgroup_pool_state *test);
>>
>> void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
>> #else
>> @@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
>> return true;
>> }
>>
>> +static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>> + struct dmem_cgroup_pool_state *test)
>> +{
>> + return false;
>> +}
>> +
>> +static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>> + struct dmem_cgroup_pool_state *test)
>> +{
>> + return false;
>> +}
>> +
>> static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
>> { }
>>
>> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
>> index 9d95824dc6fa0..28227405f7cfe 100644
>> --- a/kernel/cgroup/dmem.c
>> +++ b/kernel/cgroup/dmem.c
>> @@ -694,6 +694,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
>> }
>> EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
>>
>> +/**
>> + * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
>> + *
>> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
>> + * @test: The pool to test the usage/min limit of.
>> + *
>> + * Return: true if usage is below min and the cgroup is protected, false otherwise.
>> + */
>> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>> + struct dmem_cgroup_pool_state *test)
>> +{
>> + if (root == test || !pool_parent(test))
>> + return false;
>> +
>> + if (!root) {
>> + for (root = test; pool_parent(root); root = pool_parent(root))
>> + {}
>> + }
>
> It seems we don't have to find the global protection (root), since the root's
> protection cannot be set. If !root, we can return false directly, right?
>
> Or do I miss anything?
>
> ```
> {
> .name = "min",
> .write = dmem_cgroup_region_min_write,
> .seq_show = dmem_cgroup_region_min_show,
> .flags = CFTYPE_NOT_ON_ROOT,
> },
> {
> .name = "low",
> .write = dmem_cgroup_region_low_write,
> .seq_show = dmem_cgroup_region_low_show,
> .flags = CFTYPE_NOT_ON_ROOT,
> },
> ```
That's not quite how it works. You're correct that the min/low
properties don't exist on the root cgroup, but we don't use the root for
that.
The reason we have a root here in the first place has to do with how
recursive memory protection works in cgroups. Note that for the test
cgroup, we don't read the literal min/low protection setting, but the
"emin"/"elow" value, referring to effective protection. The effective
protection value depends not just on the settings of the "test" cgroup,
but also its ancestors (and potentially, their sibling groups). See [1]
for some documentation on how effective protection varies with different
cgroup relationships.
The "root" parameter here refers to the root of the common subtree
between the test cgroup and what the documentation refers to as the
"reclaim target". For device memory there usually isn't really any
reclaim happening in the traditional sense, but e.g. TTM evictions
follow the same principle (the reclaim target is simply the cgroup
owning the buffer that is to be evicted).
Sometimes, precise reclaim targets may not really be known yet (or we
want to try evicting different buffers originating from different
cgroups). In that case, the "root" parameter here is NULL. However, we
obviously know that all cgroups must be descendants of the root cgroup,
so the root cgroup is a guaranteed safe value for the shared subtree
between the test cgroup and any potential reclaim target.
In practice, this means that the effective min/low protection will be
capped by the protection value specified in all ancestors, which is the
most conservative estimate.
Regards,
Natalie
[1] https://docs.kernel.org/admin-guide/cgroup-v2.html#reclaim-protection
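For illustration, the ancestor-array walk that cgroup_common_ancestor() relies on can be sketched as standalone C. The toy_ types and helpers below are invented for the example (real cgroups maintain the ancestors[] array as part of hierarchy setup); this is not kernel code:

```c
#include <stddef.h>

#define TOY_MAX_DEPTH 8

/* Toy cgroup node carrying its level and ancestors[] array, mirroring
 * the scheme cgroup_common_ancestor() uses: ancestors[i] is the
 * ancestor at depth i, and ancestors[level] is the node itself. */
struct toy_cgroup {
	int level;
	struct toy_cgroup *ancestors[TOY_MAX_DEPTH];
};

/* Attach child under parent (parent == NULL makes it the root). */
static void toy_link(struct toy_cgroup *child, struct toy_cgroup *parent)
{
	int i;

	child->level = parent ? parent->level + 1 : 0;
	for (i = 0; parent && i <= parent->level; i++)
		child->ancestors[i] = parent->ancestors[i];
	child->ancestors[child->level] = child;
}

static struct toy_cgroup *toy_common_ancestor(struct toy_cgroup *a,
					      struct toy_cgroup *b)
{
	int level = a->level < b->level ? a->level : b->level;

	/* Walk from the deepest shared depth up toward the root; the
	 * first level where both arrays agree is the common ancestor. */
	for (; level >= 0; level--)
		if (a->ancestors[level] == b->ancestors[level])
			return a->ancestors[level];
	return NULL;
}
```

Because every node's ancestors[] is fixed at creation, the lookup needs no locking as long as both nodes stay accessible, which matches the "safe to call as long as both @a and @b are accessible" wording in the patch.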
>
>> +
>> + /*
>> + * In mem_cgroup_below_min(), the memcg pendant, this call is missing.
>> + * mem_cgroup_below_min() gets called during traversal of the cgroup tree, where
>> + * protection is already calculated as part of the traversal. dmem cgroup eviction
>> + * does not traverse the cgroup tree, so we need to recalculate effective protection
>> + * here.
>> + */
>> + dmem_cgroup_calculate_protection(root, test);
>> + return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
>> +}
>> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
>> +
>> +/**
>> + * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
>> + *
>> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
>> + * @test: The pool to test the usage/low limit of.
>> + *
>> + * Return: true if usage is below low and the cgroup is protected, false otherwise.
>> + */
>> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>> + struct dmem_cgroup_pool_state *test)
>> +{
>> + if (root == test || !pool_parent(test))
>> + return false;
>> +
>> + if (!root) {
>> + for (root = test; pool_parent(root); root = pool_parent(root))
>> + {}
>> + }
>> +
>> + /*
>> + * In mem_cgroup_below_low(), the memcg pendant, this call is missing.
>> + * mem_cgroup_below_low() gets called during traversal of the cgroup tree, where
>> + * protection is already calculated as part of the traversal. dmem cgroup eviction
>> + * does not traverse the cgroup tree, so we need to recalculate effective protection
>> + * here.
>> + */
>> + dmem_cgroup_calculate_protection(root, test);
>> + return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
>> +}
>> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
>> +
>> static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
>> {
>> struct dmem_cgroup_region *region;
>>
>
* Re: [PATCH v5 1/6] cgroup/dmem: Add queries for protection values
2026-03-11 8:33 ` Natalie Vock
@ 2026-03-12 0:36 ` Chen Ridong
0 siblings, 0 replies; 18+ messages in thread
From: Chen Ridong @ 2026-03-12 0:36 UTC (permalink / raw)
To: Natalie Vock, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: cgroups, dri-devel
On 2026/3/11 16:33, Natalie Vock wrote:
> On 3/11/26 02:12, Chen Ridong wrote:
>>
>>
>> On 2026/3/2 20:37, Natalie Vock wrote:
>>> Callers can use this feedback to be more aggressive in making space for
>>> allocations of a cgroup if they know it is protected.
>>>
>>> These are counterparts to memcg's mem_cgroup_below_{min,low}.
>>>
>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>> ---
>>> include/linux/cgroup_dmem.h | 16 ++++++++++++
>>> kernel/cgroup/dmem.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 78 insertions(+)
>>>
>>> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
>>> index dd4869f1d736e..1a88cd0c9eb00 100644
>>> --- a/include/linux/cgroup_dmem.h
>>> +++ b/include/linux/cgroup_dmem.h
>>> @@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state
>>> *pool, u64 size);
>>> bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state
>>> *limit_pool,
>>> struct dmem_cgroup_pool_state *test_pool,
>>> bool ignore_low, bool *ret_hit_low);
>>> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>>> + struct dmem_cgroup_pool_state *test);
>>> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>>> + struct dmem_cgroup_pool_state *test);
>>> void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
>>> #else
>>> @@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct
>>> dmem_cgroup_pool_state *limit_pool,
>>> return true;
>>> }
>>> +static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>>> + struct dmem_cgroup_pool_state *test)
>>> +{
>>> + return false;
>>> +}
>>> +
>>> +static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>>> + struct dmem_cgroup_pool_state *test)
>>> +{
>>> + return false;
>>> +}
>>> +
>>> static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
>>> { }
>>> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
>>> index 9d95824dc6fa0..28227405f7cfe 100644
>>> --- a/kernel/cgroup/dmem.c
>>> +++ b/kernel/cgroup/dmem.c
>>> @@ -694,6 +694,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
>>> }
>>> EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
>>> +/**
>>> + * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
>>> + *
>>> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
>>> + * @test: The pool to test the usage/min limit of.
>>> + *
>>> + * Return: true if usage is below min and the cgroup is protected, false otherwise.
>>> + */
>>> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>>> + struct dmem_cgroup_pool_state *test)
>>> +{
>>> + if (root == test || !pool_parent(test))
>>> + return false;
>>> +
>>> + if (!root) {
>>> + for (root = test; pool_parent(root); root = pool_parent(root))
>>> + {}
>>> + }
>>
>> It seems we don't have to find the global protection (root), since the root's
>> protection cannot be set. If !root, we can return false directly, right?
>>
>> Or am I missing something?
>>
>> ```
>> {
>> .name = "min",
>> .write = dmem_cgroup_region_min_write,
>> .seq_show = dmem_cgroup_region_min_show,
>> .flags = CFTYPE_NOT_ON_ROOT,
>> },
>> {
>> .name = "low",
>> .write = dmem_cgroup_region_low_write,
>> .seq_show = dmem_cgroup_region_low_show,
>> .flags = CFTYPE_NOT_ON_ROOT,
>> },
>> ```
>
> That's not quite how it works. You're correct that the min/low properties don't
> exist on the root cgroup, but we don't use the root for that.
>
> The reason we have a root here in the first place has to do with how recursive
> memory protection works in cgroups. Note that for the test cgroup, we don't read
> the literal min/low protection setting, but the "emin"/"elow" value, referring
> to effective protection. The effective protection value depends not just on the
> settings of the "test" cgroup, but also its ancestors (and potentially, their
> sibling groups). See [1] for some documentation on how effective protection
> varies with different cgroup relationships.
>
> The "root" parameter here refers to the root of the common subtree between the
> test cgroup and what the documentation refers to as the "reclaim target". For
> device memory there usually isn't really any reclaim happening in the
> traditional sense, but e.g. TTM evictions follow the same principle (the reclaim
> target is simply the cgroup owning the buffer that is to be evicted).
>
> Sometimes, precise reclaim targets may not really be known yet (or we want to
> try evicting different buffers originating from different cgroups). In that
> case, the "root" parameter here is NULL. However, we obviously know that all
> cgroups must be descendants of the root cgroup, so the root cgroup is a
> guaranteed safe value for the shared subtree between the test cgroup and any
> potential reclaim target.
>
> In practice, this means that the effective min/low protection will be capped by
> the protection value specified in all ancestors, which is the most conservative
> estimate.
>
> Regards,
> Natalie
>
> [1] https://docs.kernel.org/admin-guide/cgroup-v2.html#reclaim-protection
>
Thank you for your explanation. I made a mistake.
Sorry for the noise.
>>
>>> +
>>> + /*
>>> + * In mem_cgroup_below_min(), the memcg counterpart, this call is missing.
>>> + * mem_cgroup_below_min() gets called during traversal of the cgroup tree,
>>> + * where protection is already calculated as part of the traversal. dmem
>>> + * cgroup eviction does not traverse the cgroup tree, so we need to
>>> + * recalculate effective protection here.
>>> + */
>>> + dmem_cgroup_calculate_protection(root, test);
>>> + return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
>>> +}
>>> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
>>> +
>>> +/**
>>> + * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
>>> + *
>>> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
>>> + * @test: The pool to test the usage/low limit of.
>>> + *
>>> + * Return: true if usage is below low and the cgroup is protected, false otherwise.
>>> + */
>>> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>>> + struct dmem_cgroup_pool_state *test)
>>> +{
>>> + if (root == test || !pool_parent(test))
>>> + return false;
>>> +
>>> + if (!root) {
>>> + for (root = test; pool_parent(root); root = pool_parent(root))
>>> + {}
>>> + }
>>> +
>>> + /*
>>> + * In mem_cgroup_below_low(), the memcg counterpart, this call is missing.
>>> + * mem_cgroup_below_low() gets called during traversal of the cgroup tree,
>>> + * where protection is already calculated as part of the traversal. dmem
>>> + * cgroup eviction does not traverse the cgroup tree, so we need to
>>> + * recalculate effective protection here.
>>> + */
>>> + dmem_cgroup_calculate_protection(root, test);
>>> + return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
>>> +}
>>> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
>>> +
>>> static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
>>> {
>>> struct dmem_cgroup_region *region;
>>>
>>
--
Best regards,
Ridong
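The capping-by-ancestors behaviour Natalie describes above can be sketched in a minimal userspace model. All names and types here are hypothetical stand-ins (the real controller works on `struct dmem_cgroup_pool_state` and, like memcg, also distributes a parent's protection proportionally among competing siblings, which this sketch omits):

```c
#include <stddef.h>

/* Hypothetical simplified pool node; not the real dmem structures. */
struct pool {
	struct pool *parent;
	unsigned long min;	/* configured protection ("dmem.min") */
	unsigned long usage;	/* current charge */
};

/* Effective protection: a pool's own setting, capped by every ancestor
 * up to (but excluding) the subtree root, since a child can never be
 * protected for more than its ancestors grant. */
static unsigned long effective_min(const struct pool *root, const struct pool *p)
{
	unsigned long emin = p->min;

	for (p = p->parent; p && p != root; p = p->parent)
		if (p->min < emin)
			emin = p->min;
	return emin;
}

/* root == NULL means "no known reclaim target yet": fall back to the
 * top-level ancestor, the most conservative common subtree root. */
static int below_min(const struct pool *root, const struct pool *p)
{
	if (p == root || !p->parent)
		return 0;	/* the root itself carries no protection */

	if (!root) {
		root = p;
		while (root->parent)
			root = root->parent;
	}
	return p->usage <= effective_min(root, p);
}
```

With a chain root -> mid (min=100) -> leaf (min=200), the leaf's effective protection is capped to 100 by its parent, so a leaf using 150 is not protected even though its own setting says 200.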
* Re: [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit
2026-03-02 17:02 ` Tvrtko Ursulin
@ 2026-03-13 11:39 ` Natalie Vock
0 siblings, 0 replies; 18+ messages in thread
From: Natalie Vock @ 2026-03-13 11:39 UTC (permalink / raw)
To: Tvrtko Ursulin, Maarten Lankhorst, Maxime Ripard, Tejun Heo,
Johannes Weiner, Michal Koutný, Christian Koenig, Huang Rui,
Matthew Auld, Matthew Brost, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter
Cc: cgroups, dri-devel
On 3/2/26 18:02, Tvrtko Ursulin wrote:
>
> On 02/03/2026 12:37, Natalie Vock wrote:
>> When the cgroup's memory usage is below the low/min limit and allocation
>> fails, try evicting some unprotected buffers to make space. Otherwise,
>> application buffers may be forced to go into GTT even though usage is
>> below the corresponding low/min limit, if other applications filled VRAM
>> with their allocations first.
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo.c | 52 +++++++++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 47 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 53c4de4bcc1e3..86f99237f6490 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -494,6 +494,10 @@ struct ttm_bo_alloc_state {
>> struct dmem_cgroup_pool_state *charge_pool;
>> /** @limit_pool: Which pool limit we should test against */
>> struct dmem_cgroup_pool_state *limit_pool;
>> + /** @only_evict_unprotected: If only unprotected BOs, i.e. BOs whose
>> + * cgroup is exceeding its dmem low/min protection, should be
>> + * considered for eviction
>> + */
>> + bool only_evict_unprotected;
>> };
>> /**
>> @@ -598,8 +602,12 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
>> evict_walk.walk.arg.trylock_only = true;
>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>> - /* One more attempt if we hit low limit? */
>> - if (!lret && evict_walk.hit_low) {
>> + /* If we failed to find enough BOs to evict, but we skipped over
>> + * some BOs because they were covered by dmem low protection, retry
>> + * evicting these protected BOs too, except if we're told not to
>> + * consider protected BOs at all.
>> + */
>> + if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
>> evict_walk.try_low = true;
>> lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>> }
>> @@ -620,7 +628,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
>> } while (!lret && evict_walk.evicted);
>> /* We hit the low limit? Try once more */
>> - if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>> + if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>> + !state->only_evict_unprotected) {
>> evict_walk.try_low = true;
>> goto retry;
>> }
>> @@ -730,7 +739,7 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
>> struct ttm_resource **res,
>> struct ttm_bo_alloc_state *alloc_state)
>> {
>> - bool may_evict;
>> + bool may_evict, below_low;
>> int ret;
>> may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>> @@ -749,9 +758,42 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
>> return ret;
>> }
>> + /*
>> + * cgroup protection plays a special role in eviction.
>> + * Conceptually, protection of memory via the dmem cgroup controller
>> + * entitles the protected cgroup to use a certain amount of memory.
>> + * There are two types of protection - the 'low' limit is a
>> + * "best-effort" protection, whereas the 'min' limit provides a hard
>> + * guarantee that memory within the cgroup's allowance will not be
>> + * evicted under any circumstance.
>> + *
>> + * To faithfully model this concept in TTM, we also need to take cgroup
>> + * protection into account when allocating. When allocation in one
>> + * place fails, TTM will default to trying other places first before
>> + * evicting.
>> + * If the allocation is covered by dmem cgroup protection, however,
>> + * this prevents the allocation from using the memory it is "entitled"
>> + * to. To make sure unprotected allocations cannot push new protected
>> + * allocations out of places they are "entitled" to use, we should
>> + * evict buffers not covered by any cgroup protection, if this
>> + * allocation is covered by cgroup protection.
>> + *
>> + * Buffers covered by 'min' protection are a special case - the 'min'
>> + * limit is a stronger guarantee than 'low', and thus buffers protected
>> + * by 'low' but not 'min' should also be considered for eviction.
>> + * Buffers protected by 'min' will never be considered for eviction
>> + * anyway, so the regular eviction path should be triggered here.
>> + * Buffers protected by 'low' but not 'min' will take a special
>> + * eviction path that only evicts buffers covered by neither 'low' nor
>> + * 'min' protections.
>> + */
>> + may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
>
> It may make sense to group the two lines which "calculate" may_evict
> together, which would probably mean also pulling the two lines below to
> before try_charge, so that the whole logical block is not split.
This is unfortunately not possible, because this logic depends on the
charge_pool which we don't have before try_charge.
>
>> + below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>> + alloc_state->only_evict_unprotected = !may_evict && below_low;
>
> Would it work to enable may_evict also if below_low is true, and assign
> below_low directly to only_evict_unprotected? I mean along the lines of:
>
> may_evict = force_space && place->mem_type != TTM_PL_SYSTEM;
> may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
> alloc_state->only_evict_unprotected = dmem_cgroup_below_low(NULL,
> alloc_state->charge_pool);
>
> It would allow the if condition below to be simpler. Evict callback
> would remain the same I guess.
>
> And maybe only_evict_unprotected could be renamed to "try_low" to align
> with the naming in there? Then in the callback the condition would be like:
>
> /* We hit the low limit? Try once more */
> if (!lret && evict_walk.hit_low &&
> !(evict_walk.try_low | state->try_low))
> evict_walk.try_low = true;
> goto retry;
>
> Give or take.. Would that be more readable, i.e. more obvious? Although I am
> endlessly confused how !try_low ends up being try_low = true in this
> condition, so maybe I am mixing something up. You get my gist though?
> Unifying the naming and logic for easier understanding in essence if you
> can find some workable way in this spirit I think it is worth thinking
> about it.
Maybe that becomes clearer in v6, I'll include some cleanups based on
your suggestions here.
Thanks,
Natalie
>
> Regards,
>
> Tvrtko
>
>> +
>> ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
>> if (ret) {
>> - if (ret == -ENOSPC && may_evict)
>> + if (ret == -ENOSPC && (may_evict || below_low))
>> ret = -EBUSY;
>> return ret;
>> }
>>
>
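The decision logic debated in this message can be condensed into a small standalone sketch. The types and helper here are hypothetical stand-ins: in the v5 patch the logic lives inside ttm_bo_alloc_at_place() and the dmem queries run on the charge_pool obtained from dmem_cgroup_try_charge(), which is why the lines cannot be grouped earlier:

```c
#include <stdbool.h>

/* Hypothetical condensed outcome of the allocation attempt. */
struct alloc_decision {
	bool retry_with_eviction;	/* turn -ENOSPC into -EBUSY and evict */
	bool only_evict_unprotected;	/* restrict victims to unprotected BOs */
};

static struct alloc_decision decide(bool force_space, bool system_domain,
				    bool below_min, bool below_low)
{
	struct alloc_decision d;
	bool may_evict;

	/* Baseline: only a forced allocation outside system memory evicts. */
	may_evict = force_space && !system_domain;

	/* 'min' is a hard entitlement: use the regular eviction path
	 * (min-protected victims are skipped by the walk anyway). */
	may_evict |= below_min;

	/* 'low' alone takes the restricted path that evicts only buffers
	 * covered by neither 'low' nor 'min' protection. */
	d.only_evict_unprotected = !may_evict && below_low;
	d.retry_with_eviction = may_evict || below_low;
	return d;
}
```

This mirrors the `may_evict |= dmem_cgroup_below_min(...)` / `only_evict_unprotected = !may_evict && below_low` / `(may_evict || below_low)` lines of the patch: a 'low'-protected allocation still triggers eviction, but only against unprotected buffers, which prevents the ping-ponging mentioned in the cover letter.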
end of thread, other threads:[~2026-03-13 11:40 UTC | newest]
Thread overview: 18+ messages
2026-03-02 12:37 [PATCH v5 0/6] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2026-03-02 12:37 ` [PATCH v5 1/6] cgroup/dmem: Add queries for protection values Natalie Vock
2026-03-11 1:12 ` Chen Ridong
2026-03-11 8:33 ` Natalie Vock
2026-03-12 0:36 ` Chen Ridong
2026-03-02 12:37 ` [PATCH v5 2/6] cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper Natalie Vock
2026-03-02 14:38 ` Maarten Lankhorst
2026-03-05 20:02 ` Natalie Vock
2026-03-10 12:48 ` Natalie Vock
2026-03-10 16:42 ` Tejun Heo
2026-03-02 12:37 ` [PATCH v5 3/6] drm/ttm: Extract code for attempting allocation in a place Natalie Vock
2026-03-02 15:08 ` Tvrtko Ursulin
2026-03-02 12:37 ` [PATCH v5 4/6] drm/ttm: Split cgroup charge and resource allocation Natalie Vock
2026-03-02 15:25 ` Tvrtko Ursulin
2026-03-02 12:37 ` [PATCH v5 5/6] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
2026-03-02 17:02 ` Tvrtko Ursulin
2026-03-13 11:39 ` Natalie Vock
2026-03-02 12:37 ` [PATCH v5 6/6] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock