* [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3)
@ 2025-05-12 6:12 Dave Airlie
2025-05-12 6:12 ` [PATCH 1/7] mm: add gpu active/reclaim per-node stat counters Dave Airlie
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
Hey,
This is my third attempt at integrating TTM and memcg accounting.
I've tried to take on board the feedback from the last series, and
made some compromises to try and close in on a solution.
Feedback addressed:
1. The memcg folks didn't really like the GPU-specific stats, due to the
lack of matching global stats being exposed.
2. After considering resource-level memcg accounting, I tried to reason
about swap evictions and couldn't come up with a good way to make them
work, so I moved back down to TTM.
3. Use a placement flag instead of a ctx flag.
This series starts by adding two per-node stats to the mm layer,
to track memory allocated to the GPU and in active use, and memory
sitting in the reclaimable TTM pools.
Then it adds the memcg stat for active GPU memory as before
(reclaimable memory is not accounted to a memcg at all).
I didn't go back to using __GFP_ACCOUNT and manual stat tweaking, because
kmem is definitely not the correct place to account this memory. This
memory is never used by the kernel; it's used by userspace and the GPU
in nearly all cases, so I think accounting it under kmem is very wrong.
I'm hoping that adding the global stats alleviates any concerns there.
I had to move back to ttm_tt accounting instead of ttm_resource: when a
resource gets evicted to swap, the backing memory is freed and the memcg
accounting has to be updated to match, so I ended up going back to adding
the accounting in the ttm_tt population paths.
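For illustration, the driver-side contract ends up being quite small;
condensed from the amdgpu/nouveau patches later in the series (error
handling trimmed):
  /* at GEM object creation: remember which memcg to charge */
  bo->tbo.memcg = get_mem_cgroup_from_mm(current->mm);
  /* on non-eviction placements: opt in to accounting */
  places[c].flags = TTM_PL_FLAG_MEMCG;
  /* on eviction placements: make sure the flag is cleared */
  abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
  /* at GEM object free: drop the memcg reference */
  mem_cgroup_put(bo->tbo.memcg);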
Regards,
Dave.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/7] mm: add gpu active/reclaim per-node stat counters
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
2025-05-12 6:12 ` [PATCH 2/7] ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
These will be used to track pages actively allocated to the GPU,
and unused pages in pools that can be reclaimed by the shrinker.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
Documentation/filesystems/proc.rst | 6 ++++++
drivers/base/node.c | 5 +++++
fs/proc/meminfo.c | 6 ++++++
include/linux/mmzone.h | 2 ++
mm/show_mem.c | 9 +++++++--
mm/vmstat.c | 2 ++
6 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2a17865dfe39..224b0568cf99 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1093,6 +1093,8 @@ Example output. You may not have all of these fields.
CmaFree: 0 kB
Unaccepted: 0 kB
Balloon: 0 kB
+ GPUActive: 0 kB
+ GPUReclaim: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
@@ -1271,6 +1273,10 @@ Unaccepted
Memory that has not been accepted by the guest
Balloon
Memory returned to Host by VM Balloon Drivers
+GPUActive
+ Memory allocated to GPU objects
+GPUReclaim
+ Memory in GPU allocator pools that is reclaimable
HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
See Documentation/admin-guide/mm/hugetlbpage.rst.
DirectMap4k, DirectMap2M, DirectMap1G
diff --git a/drivers/base/node.c b/drivers/base/node.c
index cd13ef287011..669b1e1968a2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -454,6 +454,8 @@ static ssize_t node_read_meminfo(struct device *dev,
#ifdef CONFIG_UNACCEPTED_MEMORY
"Node %d Unaccepted: %8lu kB\n"
#endif
+ "Node %d GPUActive: %8lu kB\n"
+ "Node %d GPUReclaim: %8lu kB\n"
,
nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
nid, K(node_page_state(pgdat, NR_WRITEBACK)),
@@ -487,6 +489,9 @@ static ssize_t node_read_meminfo(struct device *dev,
,
nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
#endif
+ ,
+ nid, K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+ nid, K(node_page_state(pgdat, NR_GPU_RECLAIM))
);
len += hugetlb_report_node_meminfo(buf, len, nid);
return len;
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 83be312159c9..baf537044115 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -165,6 +165,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
show_val_kb(m, "Balloon: ",
global_node_page_state(NR_BALLOON_PAGES));
+ show_val_kb(m, "GPUActive: ",
+ global_node_page_state(NR_GPU_ACTIVE));
+
+ show_val_kb(m, "GPUReclaim: ",
+ global_node_page_state(NR_GPU_RECLAIM));
+
hugetlb_report_meminfo(m);
arch_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6ccec1bf2896..ddc1b9b2a5e3 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -226,6 +226,8 @@ enum node_stat_item {
NR_HUGETLB,
#endif
NR_BALLOON_PAGES,
+ NR_GPU_ACTIVE, /* GPU pages assigned to an object */
+ NR_GPU_RECLAIM, /* GPU pages in shrinkable pool */
NR_VM_NODE_STAT_ITEMS
};
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 6af13bcd2ab3..fe8cd06a9143 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -261,7 +261,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
" sec_pagetables:%lukB"
" all_unreclaimable? %s"
" Balloon:%lukB"
- "\n",
+ " gpu_active:%lukB"
+ " gpu_reclaim:%lukB"
+ "\n",
pgdat->node_id,
K(node_page_state(pgdat, NR_ACTIVE_ANON)),
K(node_page_state(pgdat, NR_INACTIVE_ANON)),
@@ -287,7 +289,10 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
K(node_page_state(pgdat, NR_PAGETABLE)),
K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
str_yes_no(pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES),
- K(node_page_state(pgdat, NR_BALLOON_PAGES)));
+ K(node_page_state(pgdat, NR_BALLOON_PAGES)),
+ K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+ K(node_page_state(pgdat, NR_GPU_RECLAIM)));
+
}
for_each_populated_zone(zone) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4c268ce39ff2..52bec6522220 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1282,6 +1282,8 @@ const char * const vmstat_text[] = {
"nr_hugetlb",
#endif
"nr_balloon_pages",
+ "nr_gpu_active",
+ "nr_gpu_reclaim",
/* system-wide enum vm_stat_item counters */
"nr_dirty_threshold",
"nr_dirty_background_threshold",
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/7] ttm: use gpu mm stats to track gpu memory allocations.
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
2025-05-12 6:12 ` [PATCH 1/7] mm: add gpu active/reclaim per-node stat counters Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
2025-05-12 6:12 ` [PATCH 3/7] memcg: add GPU statistic Dave Airlie
` (4 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
This uses the new per-node stats to track GPU memory allocations,
per node when available. It also tracks memory sitting in the
reclaimable pool.
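The bookkeeping, matching the hunks below, is that pages move between
the two counters as they move between a ttm_tt and the pool:
  ttm_pool_alloc_page()  ->  NR_GPU_ACTIVE  += 1 << order
  ttm_pool_type_give()   ->  NR_GPU_ACTIVE  -= 1 << order, NR_GPU_RECLAIM += 1 << order
  ttm_pool_type_take()   ->  NR_GPU_RECLAIM -= 1 << order, NR_GPU_ACTIVE  += 1 << order
  ttm_pool_free_page()   ->  NR_GPU_ACTIVE  -= 1 << order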
---
drivers/gpu/drm/ttm/ttm_pool.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c2ea865be657..ccc3b9a13e9e 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -130,6 +130,16 @@ static struct list_head shrinker_list;
static struct shrinker *mm_shrinker;
static DECLARE_RWSEM(pool_shrink_rwsem);
+/* helper to get a current valid node id from a pool */
+static int ttm_pool_nid(struct ttm_pool *pool) {
+ int nid = NUMA_NO_NODE;
+ if (pool)
+ nid = pool->nid;
+ if (nid == NUMA_NO_NODE)
+ nid = numa_node_id();
+ return nid;
+}
+
/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
unsigned int order)
@@ -149,8 +159,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
if (!pool->use_dma_alloc) {
p = alloc_pages_node(pool->nid, gfp_flags, order);
- if (p)
+ if (p) {
p->private = order;
+ mod_node_page_state(NODE_DATA(ttm_pool_nid(pool)), NR_GPU_ACTIVE, (1 << order));
+ }
return p;
}
@@ -201,6 +213,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
if (!pool || !pool->use_dma_alloc) {
__free_pages(p, order);
+ mod_node_page_state(NODE_DATA(ttm_pool_nid(pool)), NR_GPU_ACTIVE, -(1 << order));
return;
}
@@ -275,6 +288,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
{
unsigned int i, num_pages = 1 << pt->order;
+ int nid = ttm_pool_nid(pt->pool);
for (i = 0; i < num_pages; ++i) {
if (PageHighMem(p))
@@ -287,17 +301,23 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
list_add(&p->lru, &pt->pages);
spin_unlock(&pt->lock);
atomic_long_add(1 << pt->order, &allocated_pages);
+
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_ACTIVE, -(1 << pt->order));
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_RECLAIM, (1 << pt->order));
}
/* Take pages from a specific pool_type, return NULL when nothing available */
static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
{
struct page *p;
+ int nid = ttm_pool_nid(pt->pool);
spin_lock(&pt->lock);
p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
if (p) {
atomic_long_sub(1 << pt->order, &allocated_pages);
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_ACTIVE, (1 << pt->order));
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_RECLAIM, -(1 << pt->order));
list_del(&p->lru);
}
spin_unlock(&pt->lock);
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/7] memcg: add GPU statistic
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
2025-05-12 6:12 ` [PATCH 1/7] mm: add gpu active/reclaim per-node stat counters Dave Airlie
2025-05-12 6:12 ` [PATCH 2/7] ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
2025-05-12 6:12 ` [PATCH 4/7] memcg: add hooks for gpu memcg charging/uncharging Dave Airlie
` (3 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
Discrete and Integrated GPUs can use system RAM instead of
VRAM for all or some allocations. These allocations happen
via drm/ttm subsystem and are currently not accounted for
in cgroups.
Add a gpu statistic to allow a place to visualise allocations
once they are supported.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 3 +++
include/linux/memcontrol.h | 1 +
mm/memcontrol.c | 2 ++
3 files changed, 6 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..e10a1dfa6051 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1480,6 +1480,9 @@ The following nested keys are defined.
vmalloc (npn)
Amount of memory used for vmap backed memory.
+ gpu (npn)
+ Amount of memory used for GPU device system RAM.
+
shmem
Amount of cached filesystem data that is swap-backed,
such as tmpfs, shm segments, shared anonymous mmap()s
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..4058d4bd94ed 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -36,6 +36,7 @@ enum memcg_stat_item {
MEMCG_SOCK,
MEMCG_PERCPU_B,
MEMCG_VMALLOC,
+ MEMCG_GPU,
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c96c1f2b9cf5..25471a0fd0be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -326,6 +326,7 @@ static const unsigned int memcg_stat_items[] = {
MEMCG_SOCK,
MEMCG_PERCPU_B,
MEMCG_VMALLOC,
+ MEMCG_GPU,
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
@@ -1358,6 +1359,7 @@ static const struct memory_stat memory_stats[] = {
{ "percpu", MEMCG_PERCPU_B },
{ "sock", MEMCG_SOCK },
{ "vmalloc", MEMCG_VMALLOC },
+ { "gpu", MEMCG_GPU },
{ "shmem", NR_SHMEM },
#ifdef CONFIG_ZSWAP
{ "zswap", MEMCG_ZSWAP_B },
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/7] memcg: add hooks for gpu memcg charging/uncharging.
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
` (2 preceding siblings ...)
2025-05-12 6:12 ` [PATCH 3/7] memcg: add GPU statistic Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
2025-05-12 6:12 ` [PATCH 5/7] ttm: add initial memcg integration. (v4) Dave Airlie
` (2 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
As per the socket hooks, just adds two APIs to charge GPU pages
to the memcg and uncharge them.
Suggested by Waiman.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
include/linux/memcontrol.h | 5 +++++
mm/memcontrol.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4058d4bd94ed..f74b7db89f00 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1618,6 +1618,11 @@ struct sock;
bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
gfp_t gfp_mask);
void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+
+bool mem_cgroup_charge_gpu(struct mem_cgroup *memcg, unsigned int nr_pages,
+ gfp_t gfp_mask);
+void mem_cgroup_uncharge_gpu(struct mem_cgroup *memcg, unsigned int nr_pages);
+
#ifdef CONFIG_MEMCG
extern struct static_key_false memcg_sockets_enabled_key;
#define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 25471a0fd0be..76a0ec34b7dc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4958,6 +4958,40 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
refill_stock(memcg, nr_pages);
}
+/**
+ * mem_cgroup_charge_gpu - charge GPU memory
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ * @gfp_mask: reclaim mode
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if it doesn't.
+ */
+bool mem_cgroup_charge_gpu(struct mem_cgroup *memcg, unsigned int nr_pages,
+ gfp_t gfp_mask)
+{
+ if (try_charge_memcg(memcg, gfp_mask, nr_pages) == 0) {
+ mod_memcg_state(memcg, MEMCG_GPU, nr_pages);
+ return true;
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_charge_gpu);
+
+/**
+ * mem_cgroup_uncharge_gpu - uncharge GPU memory
+ * @memcg: memcg to uncharge
+ * @nr_pages: number of pages to uncharge
+ */
+void mem_cgroup_uncharge_gpu(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ mod_memcg_state(memcg, MEMCG_GPU, -nr_pages);
+
+ refill_stock(memcg, nr_pages);
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_uncharge_gpu);
+
static int __init cgroup_memory(char *s)
{
char *token;
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/7] ttm: add initial memcg integration. (v4)
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
` (3 preceding siblings ...)
2025-05-12 6:12 ` [PATCH 4/7] memcg: add hooks for gpu memcg charging/uncharging Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
2025-05-12 14:42 ` kernel test robot
` (2 more replies)
2025-05-12 6:12 ` [PATCH 6/7] amdgpu: add support for memcg integration Dave Airlie
2025-05-12 6:12 ` [PATCH 7/7] nouveau: add " Dave Airlie
6 siblings, 3 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
Doing proper integration of TTM system memory allocations with
memcg is a difficult ask, primarily due to difficulties around
accounting for evictions properly.
However there are systems where userspace will be allocating
objects in system memory and they won't be prone to migrating
or evicting and we should start with at least accounting those.
This adds a memcg group to ttm bo and tt objects.
This memcg is used when:
a) when a tt is populated (and unpopulated)
b) the TTM_PL_FLAG_MEMCG is set on the placement for the
bo when the tt is allocated.
The placement flag is set for all non-eviction placements.
This version moves back from the resource to the tt layer,
when accounting at the resource layer, if an object is swapped
out there was no way to remove it from the accounting, whereas
the tt layer has more info for this.
v4: move back to the tt layer from the resource layer to
handle swap, but keep the memcg charging hooks for now.
v3: moves from having a flags on the op ctx to the using a
placement flag.
v2: moved the charging up a level and also no longer used
__GFP_ACCOUNT, or attached the memcg to object pages, it instead
uses the same approach as socket memory and just charges/uncharges
at the object level. This was suggested by Christian.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
drivers/gpu/drm/ttm/ttm_bo_util.c | 6 +++---
drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 +++-
drivers/gpu/drm/ttm/ttm_tt.c | 17 ++++++++++++++++-
include/drm/ttm/ttm_bo.h | 7 +++++++
include/drm/ttm/ttm_placement.h | 3 +++
include/drm/ttm/ttm_tt.h | 9 ++++++++-
7 files changed, 44 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 5bf3c969907c..1630ef28e5a8 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -140,7 +140,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
goto out_err;
if (mem->mem_type != TTM_PL_SYSTEM) {
- ret = ttm_bo_populate(bo, ctx);
+ ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, ctx);
if (ret)
goto out_err;
}
@@ -1237,6 +1237,7 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
/**
* ttm_bo_populate() - Ensure that a buffer object has backing pages
* @bo: The buffer object
+ * @memcg_account: account this memory with memcg if needed
* @ctx: The ttm_operation_ctx governing the operation.
*
* For buffer objects in a memory type whose manager uses
@@ -1250,6 +1251,7 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
* is set to true.
*/
int ttm_bo_populate(struct ttm_buffer_object *bo,
+ bool memcg_account,
struct ttm_operation_ctx *ctx)
{
struct ttm_tt *tt = bo->ttm;
@@ -1262,7 +1264,7 @@ int ttm_bo_populate(struct ttm_buffer_object *bo,
return 0;
swapped = ttm_tt_is_swapped(tt);
- ret = ttm_tt_populate(bo->bdev, tt, ctx);
+ ret = ttm_tt_populate(bo->bdev, tt, memcg_account, ctx);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 15cab9bda17f..7d599d0707e4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -163,7 +163,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
src_man = ttm_manager_type(bdev, src_mem->mem_type);
if (ttm && ((ttm->page_flags & TTM_TT_FLAG_SWAPPED) ||
dst_man->use_tt)) {
- ret = ttm_bo_populate(bo, ctx);
+ ret = ttm_bo_populate(bo, dst_mem->placement & TTM_PL_FLAG_MEMCG, ctx);
if (ret)
return ret;
}
@@ -350,7 +350,7 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
BUG_ON(!ttm);
- ret = ttm_bo_populate(bo, &ctx);
+ ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, &ctx);
if (ret)
return ret;
@@ -507,7 +507,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map)
pgprot_t prot;
void *vaddr;
- ret = ttm_bo_populate(bo, &ctx);
+ ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, &ctx);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a194db83421d..02aea23a34e7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -224,7 +224,9 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
};
ttm = bo->ttm;
- err = ttm_bo_populate(bo, &ctx);
+ err = ttm_bo_populate(bo,
+ bo->resource->placement & TTM_PL_FLAG_MEMCG,
+ &ctx);
if (err) {
if (err == -EINTR || err == -ERESTARTSYS ||
err == -EAGAIN)
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 698cd4bf5e46..81c4cbbeb130 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -161,6 +161,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
ttm->caching = caching;
ttm->restore = NULL;
ttm->backup = NULL;
+ ttm->memcg = bo->memcg;
}
int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
@@ -365,7 +366,9 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
EXPORT_SYMBOL_FOR_TESTS_ONLY(ttm_tt_swapout);
int ttm_tt_populate(struct ttm_device *bdev,
- struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
+ struct ttm_tt *ttm,
+ bool memcg_account_tt,
+ struct ttm_operation_ctx *ctx)
{
int ret;
@@ -376,6 +379,14 @@ int ttm_tt_populate(struct ttm_device *bdev,
return 0;
if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
+ if (ttm->memcg && memcg_account_tt) {
+ gfp_t gfp_flags = GFP_USER;
+ if (ctx->gfp_retry_mayfail)
+ gfp_flags |= __GFP_RETRY_MAYFAIL;
+ if (!mem_cgroup_charge_gpu(ttm->memcg, ttm->num_pages, gfp_flags))
+ return -ENOMEM;
+ ttm->page_flags |= TTM_TT_FLAG_ACCOUNTED;
+ }
atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
if (bdev->pool.use_dma32)
atomic_long_add(ttm->num_pages,
@@ -437,6 +448,10 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
ttm_pool_free(&bdev->pool, ttm);
if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
+ if (ttm->page_flags & TTM_TT_FLAG_ACCOUNTED) {
+ mem_cgroup_uncharge_gpu(ttm->memcg, ttm->num_pages);
+ ttm->page_flags &= ~TTM_TT_FLAG_ACCOUNTED;
+ }
atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
if (bdev->pool.use_dma32)
atomic_long_sub(ttm->num_pages,
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 903cd1030110..d7c0dd9e0746 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -135,6 +135,12 @@ struct ttm_buffer_object {
* reservation lock.
*/
struct sg_table *sg;
+
+ /**
+ * @memcg: memory cgroup to charge this to if it ends up using system memory.
+ * NULL means don't charge.
+ */
+ struct mem_cgroup *memcg;
};
#define TTM_BO_MAP_IOMEM_MASK 0x80
@@ -486,6 +492,7 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, struct ttm_resource *res,
pgprot_t tmp);
void ttm_bo_tt_destroy(struct ttm_buffer_object *bo);
int ttm_bo_populate(struct ttm_buffer_object *bo,
+ bool memcg_account,
struct ttm_operation_ctx *ctx);
/* Driver LRU walk helpers initially targeted for shrinking. */
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index b510a4812609..668798072292 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -70,6 +70,9 @@
/* Placement is only used during eviction */
#define TTM_PL_FLAG_FALLBACK (1 << 4)
+/* Placement causes memcg accounting */
+#define TTM_PL_FLAG_MEMCG (1 << 5)
+
/**
* struct ttm_place
*
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 406437ad674b..2790fc82edc3 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -90,6 +90,8 @@ struct ttm_tt {
* TTM_TT_FLAG_BACKED_UP: TTM internal only. This is set if the
* struct ttm_tt has been (possibly partially) backed up.
*
+ * TTM_TT_FLAG_ACCOUNTED: TTM internal. This tt has been accounted.
+ *
* TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is
* set by TTM after ttm_tt_populate() has successfully returned, and is
* then unset when TTM calls ttm_tt_unpopulate().
@@ -101,8 +103,9 @@ struct ttm_tt {
#define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3)
#define TTM_TT_FLAG_DECRYPTED BIT(4)
#define TTM_TT_FLAG_BACKED_UP BIT(5)
+#define TTM_TT_FLAG_ACCOUNTED BIT(6)
-#define TTM_TT_FLAG_PRIV_POPULATED BIT(6)
+#define TTM_TT_FLAG_PRIV_POPULATED BIT(7)
uint32_t page_flags;
/** @num_pages: Number of pages in the page array. */
uint32_t num_pages;
@@ -126,6 +129,8 @@ struct ttm_tt {
enum ttm_caching caching;
/** @restore: Partial restoration from backup state. TTM private */
struct ttm_pool_tt_restore *restore;
+ /** @memcg: Memory cgroup for this TT allocation */
+ struct mem_cgroup *memcg;
};
/**
@@ -245,11 +250,13 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
*
* @bdev: the ttm_device this object belongs to
* @ttm: Pointer to the ttm_tt structure
+ * @mem_account_tt: Account this population to the memcg
* @ctx: operation context for populating the tt object.
*
* Calls the driver method to allocate pages for a ttm
*/
int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm,
+ bool mem_account_tt,
struct ttm_operation_ctx *ctx);
/**
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/7] amdgpu: add support for memcg integration
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
` (4 preceding siblings ...)
2025-05-12 6:12 ` [PATCH 5/7] ttm: add initial memcg integration. (v4) Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
2025-05-13 13:21 ` Christian König
2025-05-12 6:12 ` [PATCH 7/7] nouveau: add " Dave Airlie
6 siblings, 1 reply; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
This adds the memcg object for any user-allocated objects,
and uses the MEMCG placement flag in the correct places.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 +++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
5 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 82df06a72ee0..1684a7e6d6cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -839,7 +839,10 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
union drm_amdgpu_cs *cs)
{
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
- struct ttm_operation_ctx ctx = { true, false };
+ struct ttm_operation_ctx ctx = {
+ .interruptible = true,
+ .no_wait_gpu = false,
+ };
struct amdgpu_vm *vm = &fpriv->vm;
struct amdgpu_bo_list_entry *e;
struct drm_gem_object *obj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 69429df09477..bdad9a862ed3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -89,6 +89,7 @@ static void amdgpu_gem_object_free(struct drm_gem_object *gobj)
struct amdgpu_bo *aobj = gem_to_amdgpu_bo(gobj);
amdgpu_hmm_unregister(aobj);
+ mem_cgroup_put(aobj->tbo.memcg);
ttm_bo_put(&aobj->tbo);
}
@@ -116,6 +117,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
bp.domain = initial_domain;
bp.bo_ptr_size = sizeof(struct amdgpu_bo);
bp.xcp_id_plus1 = xcp_id_plus1;
+ bp.memcg = get_mem_cgroup_from_mm(current->mm);
r = amdgpu_bo_create_user(adev, &bp, &ubo);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 0b9987781f76..1d930421354a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -158,7 +158,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
places[c].mem_type =
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
- places[c].flags = 0;
+ places[c].flags = TTM_PL_FLAG_MEMCG;
/*
* When GTT is just an alternative to VRAM make sure that we
* only use it as fallback and still try to fill up VRAM first.
@@ -173,7 +173,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
places[c].fpfn = 0;
places[c].lpfn = 0;
places[c].mem_type = TTM_PL_SYSTEM;
- places[c].flags = 0;
+ places[c].flags = TTM_PL_FLAG_MEMCG;
c++;
}
@@ -657,16 +657,21 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
size = ALIGN(size, PAGE_SIZE);
}
- if (!amdgpu_bo_validate_size(adev, size, bp->domain))
+ if (!amdgpu_bo_validate_size(adev, size, bp->domain)) {
+ mem_cgroup_put(bp->memcg);
return -ENOMEM;
+ }
BUG_ON(bp->bo_ptr_size < sizeof(struct amdgpu_bo));
*bo_ptr = NULL;
bo = kvzalloc(bp->bo_ptr_size, GFP_KERNEL);
- if (bo == NULL)
+ if (bo == NULL) {
+ mem_cgroup_put(bp->memcg);
return -ENOMEM;
+ }
drm_gem_private_object_init(adev_to_drm(adev), &bo->tbo.base, size);
+ bo->tbo.memcg = bp->memcg;
bo->tbo.base.funcs = &amdgpu_gem_object_funcs;
bo->vm_bo = NULL;
bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
@@ -1341,7 +1346,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
- struct ttm_operation_ctx ctx = { false, false };
+ struct ttm_operation_ctx ctx = { .interruptible = false,
+ .no_wait_gpu = false };
struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
int r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 375448627f7b..9a4c506cfb76 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -55,6 +55,7 @@ struct amdgpu_bo_param {
enum ttm_bo_type type;
bool no_wait_gpu;
struct dma_resv *resv;
+ struct mem_cgroup *memcg;
void (*destroy)(struct ttm_buffer_object *bo);
/* xcp partition number plus 1, 0 means any partition */
int8_t xcp_id_plus1;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 53b71e9d8076..f40b0c0a820b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -151,11 +151,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT |
AMDGPU_GEM_DOMAIN_CPU);
}
+ abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
break;
case TTM_PL_TT:
case AMDGPU_PL_PREEMPT:
default:
amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
+ abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
break;
}
*placement = abo->placement;
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/7] nouveau: add memcg integration
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
` (5 preceding siblings ...)
2025-05-12 6:12 ` [PATCH 6/7] amdgpu: add support for memcg integration Dave Airlie
@ 2025-05-12 6:12 ` Dave Airlie
6 siblings, 0 replies; 16+ messages in thread
From: Dave Airlie @ 2025-05-12 6:12 UTC (permalink / raw)
To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
From: Dave Airlie <airlied@redhat.com>
This just adds the memcg init and memcg placement flag support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 5 +++--
drivers/gpu/drm/nouveau/nouveau_gem.c | 2 ++
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 2016c1e7242f..6bd8d9ed9f35 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -450,13 +450,13 @@ nouveau_bo_placement_set(struct nouveau_bo *nvbo, uint32_t domain,
if (domain & NOUVEAU_GEM_DOMAIN_GART) {
pl[*n].mem_type = TTM_PL_TT;
pl[*n].flags = busy & NOUVEAU_GEM_DOMAIN_GART ?
- TTM_PL_FLAG_FALLBACK : 0;
+ TTM_PL_FLAG_FALLBACK : TTM_PL_FLAG_MEMCG;
(*n)++;
}
if (domain & NOUVEAU_GEM_DOMAIN_CPU) {
pl[*n].mem_type = TTM_PL_SYSTEM;
pl[*n].flags = busy & NOUVEAU_GEM_DOMAIN_CPU ?
- TTM_PL_FLAG_FALLBACK : 0;
+ TTM_PL_FLAG_FALLBACK : TTM_PL_FLAG_MEMCG;
(*n)++;
}
@@ -814,6 +814,7 @@ nouveau_bo_evict_flags(struct ttm_buffer_object *bo, struct ttm_placement *pl)
case TTM_PL_VRAM:
nouveau_bo_placement_set(nvbo, NOUVEAU_GEM_DOMAIN_GART,
NOUVEAU_GEM_DOMAIN_CPU);
+ nvbo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
break;
default:
nouveau_bo_placement_set(nvbo, NOUVEAU_GEM_DOMAIN_CPU, 0);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 67e3c99de73a..56899c89bdd8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -87,6 +87,7 @@ nouveau_gem_object_del(struct drm_gem_object *gem)
return;
}
+ mem_cgroup_put(nvbo->bo.memcg);
ttm_bo_put(&nvbo->bo);
pm_runtime_mark_last_busy(dev);
@@ -254,6 +255,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
if (IS_ERR(nvbo))
return PTR_ERR(nvbo);
+ nvbo->bo.memcg = get_mem_cgroup_from_mm(current->mm);
nvbo->bo.base.funcs = &nouveau_gem_object_funcs;
nvbo->no_share = domain & NOUVEAU_GEM_DOMAIN_NO_SHARE;
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 5/7] ttm: add initial memcg integration. (v4)
2025-05-12 6:12 ` [PATCH 5/7] ttm: add initial memcg integration. (v4) Dave Airlie
@ 2025-05-12 14:42 ` kernel test robot
2025-05-13 13:30 ` Christian König
2025-05-14 11:41 ` [5/7] " Maarten Lankhorst
2 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-05-12 14:42 UTC (permalink / raw)
To: Dave Airlie, dri-devel, tj, christian.koenig, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: oe-kbuild-all, cgroups, Waiman Long, simona
Hi Dave,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Dave-Airlie/ttm-use-gpu-mm-stats-to-track-gpu-memory-allocations/20250512-182204
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250512061913.3522902-6-airlied%40gmail.com
patch subject: [PATCH 5/7] ttm: add initial memcg integration. (v4)
config: riscv-randconfig-001-20250512 (https://download.01.org/0day-ci/archive/20250512/202505122244.IEuTPfWF-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250512/202505122244.IEuTPfWF-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505122244.IEuTPfWF-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
drivers/gpu/drm/xe/xe_bo.c: In function 'xe_bo_evict_pinned':
>> drivers/gpu/drm/xe/xe_bo.c:1147:2: warning: the address of 'ctx' will always evaluate as 'true' [-Waddress]
ret = ttm_bo_populate(&bo->ttm, &ctx);
^~~
>> drivers/gpu/drm/xe/xe_bo.c:1147:8: error: too few arguments to function 'ttm_bo_populate'
ret = ttm_bo_populate(&bo->ttm, &ctx);
^~~~~~~~~~~~~~~
In file included from drivers/gpu/drm/xe/xe_bo_types.h:12:0,
from drivers/gpu/drm/xe/xe_bo.h:11,
from drivers/gpu/drm/xe/xe_bo.c:6:
include/drm/ttm/ttm_bo.h:494:5: note: declared here
int ttm_bo_populate(struct ttm_buffer_object *bo,
^~~~~~~~~~~~~~~
drivers/gpu/drm/xe/xe_bo.c: In function 'xe_bo_restore_pinned':
drivers/gpu/drm/xe/xe_bo.c:1208:2: warning: the address of 'ctx' will always evaluate as 'true' [-Waddress]
ret = ttm_bo_populate(&bo->ttm, &ctx);
^~~
drivers/gpu/drm/xe/xe_bo.c:1208:8: error: too few arguments to function 'ttm_bo_populate'
ret = ttm_bo_populate(&bo->ttm, &ctx);
^~~~~~~~~~~~~~~
In file included from drivers/gpu/drm/xe/xe_bo_types.h:12:0,
from drivers/gpu/drm/xe/xe_bo.h:11,
from drivers/gpu/drm/xe/xe_bo.c:6:
include/drm/ttm/ttm_bo.h:494:5: note: declared here
int ttm_bo_populate(struct ttm_buffer_object *bo,
^~~~~~~~~~~~~~~
vim +/ttm_bo_populate +1147 drivers/gpu/drm/xe/xe_bo.c
00c8efc3180f0c Thomas Hellström 2025-03-05 1096
36919ebeaacab3 Matthew Auld 2023-04-06 1097 /**
36919ebeaacab3 Matthew Auld 2023-04-06 1098 * xe_bo_evict_pinned() - Evict a pinned VRAM object to system memory
36919ebeaacab3 Matthew Auld 2023-04-06 1099 * @bo: The buffer object to move.
36919ebeaacab3 Matthew Auld 2023-04-06 1100 *
75fd04f276de31 Nitin Gote 2025-01-06 1101 * On successful completion, the object memory will be moved to system memory.
36919ebeaacab3 Matthew Auld 2023-04-06 1102 *
36919ebeaacab3 Matthew Auld 2023-04-06 1103 * This is needed to for special handling of pinned VRAM object during
36919ebeaacab3 Matthew Auld 2023-04-06 1104 * suspend-resume.
36919ebeaacab3 Matthew Auld 2023-04-06 1105 *
36919ebeaacab3 Matthew Auld 2023-04-06 1106 * Return: 0 on success. Negative error code on failure.
36919ebeaacab3 Matthew Auld 2023-04-06 1107 */
36919ebeaacab3 Matthew Auld 2023-04-06 1108 int xe_bo_evict_pinned(struct xe_bo *bo)
36919ebeaacab3 Matthew Auld 2023-04-06 1109 {
36919ebeaacab3 Matthew Auld 2023-04-06 1110 struct ttm_place place = {
36919ebeaacab3 Matthew Auld 2023-04-06 1111 .mem_type = XE_PL_TT,
36919ebeaacab3 Matthew Auld 2023-04-06 1112 };
36919ebeaacab3 Matthew Auld 2023-04-06 1113 struct ttm_placement placement = {
36919ebeaacab3 Matthew Auld 2023-04-06 1114 .placement = &place,
36919ebeaacab3 Matthew Auld 2023-04-06 1115 .num_placement = 1,
36919ebeaacab3 Matthew Auld 2023-04-06 1116 };
36919ebeaacab3 Matthew Auld 2023-04-06 1117 struct ttm_operation_ctx ctx = {
36919ebeaacab3 Matthew Auld 2023-04-06 1118 .interruptible = false,
6bd49cc1a8924c Thomas Hellström 2024-10-31 1119 .gfp_retry_mayfail = true,
36919ebeaacab3 Matthew Auld 2023-04-06 1120 };
36919ebeaacab3 Matthew Auld 2023-04-06 1121 struct ttm_resource *new_mem;
36919ebeaacab3 Matthew Auld 2023-04-06 1122 int ret;
36919ebeaacab3 Matthew Auld 2023-04-06 1123
36919ebeaacab3 Matthew Auld 2023-04-06 1124 xe_bo_assert_held(bo);
36919ebeaacab3 Matthew Auld 2023-04-06 1125
36919ebeaacab3 Matthew Auld 2023-04-06 1126 if (WARN_ON(!bo->ttm.resource))
36919ebeaacab3 Matthew Auld 2023-04-06 1127 return -EINVAL;
36919ebeaacab3 Matthew Auld 2023-04-06 1128
36919ebeaacab3 Matthew Auld 2023-04-06 1129 if (WARN_ON(!xe_bo_is_pinned(bo)))
36919ebeaacab3 Matthew Auld 2023-04-06 1130 return -EINVAL;
36919ebeaacab3 Matthew Auld 2023-04-06 1131
a19d1db9a3fa89 Matthew Brost 2024-10-31 1132 if (!xe_bo_is_vram(bo))
a19d1db9a3fa89 Matthew Brost 2024-10-31 1133 return 0;
36919ebeaacab3 Matthew Auld 2023-04-06 1134
36919ebeaacab3 Matthew Auld 2023-04-06 1135 ret = ttm_bo_mem_space(&bo->ttm, &placement, &new_mem, &ctx);
36919ebeaacab3 Matthew Auld 2023-04-06 1136 if (ret)
36919ebeaacab3 Matthew Auld 2023-04-06 1137 return ret;
36919ebeaacab3 Matthew Auld 2023-04-06 1138
36919ebeaacab3 Matthew Auld 2023-04-06 1139 if (!bo->ttm.ttm) {
36919ebeaacab3 Matthew Auld 2023-04-06 1140 bo->ttm.ttm = xe_ttm_tt_create(&bo->ttm, 0);
36919ebeaacab3 Matthew Auld 2023-04-06 1141 if (!bo->ttm.ttm) {
36919ebeaacab3 Matthew Auld 2023-04-06 1142 ret = -ENOMEM;
36919ebeaacab3 Matthew Auld 2023-04-06 1143 goto err_res_free;
36919ebeaacab3 Matthew Auld 2023-04-06 1144 }
36919ebeaacab3 Matthew Auld 2023-04-06 1145 }
36919ebeaacab3 Matthew Auld 2023-04-06 1146
fc5d96670eb254 Thomas Hellström 2024-09-11 @1147 ret = ttm_bo_populate(&bo->ttm, &ctx);
36919ebeaacab3 Matthew Auld 2023-04-06 1148 if (ret)
36919ebeaacab3 Matthew Auld 2023-04-06 1149 goto err_res_free;
36919ebeaacab3 Matthew Auld 2023-04-06 1150
36919ebeaacab3 Matthew Auld 2023-04-06 1151 ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
36919ebeaacab3 Matthew Auld 2023-04-06 1152 if (ret)
36919ebeaacab3 Matthew Auld 2023-04-06 1153 goto err_res_free;
36919ebeaacab3 Matthew Auld 2023-04-06 1154
36919ebeaacab3 Matthew Auld 2023-04-06 1155 ret = xe_bo_move(&bo->ttm, false, &ctx, new_mem, NULL);
36919ebeaacab3 Matthew Auld 2023-04-06 1156 if (ret)
36919ebeaacab3 Matthew Auld 2023-04-06 1157 goto err_res_free;
36919ebeaacab3 Matthew Auld 2023-04-06 1158
36919ebeaacab3 Matthew Auld 2023-04-06 1159 return 0;
36919ebeaacab3 Matthew Auld 2023-04-06 1160
36919ebeaacab3 Matthew Auld 2023-04-06 1161 err_res_free:
36919ebeaacab3 Matthew Auld 2023-04-06 1162 ttm_resource_free(&bo->ttm, &new_mem);
36919ebeaacab3 Matthew Auld 2023-04-06 1163 return ret;
36919ebeaacab3 Matthew Auld 2023-04-06 1164 }
36919ebeaacab3 Matthew Auld 2023-04-06 1165
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
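The breakage is in the xe driver, which this series does not convert: the
two callers just need the new memcg_account argument. A minimal sketch of
the fixup, assuming the internal pinned evict/restore paths should not be
charged (the author may prefer to plumb a placement-flag check through
instead):
	ret = ttm_bo_populate(&bo->ttm, false, &ctx);
in both xe_bo_evict_pinned() and xe_bo_restore_pinned(); the -Waddress
warning is just &ctx landing in the new bool parameter and disappears with
the same change.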
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 6/7] amdgpu: add support for memcg integration
2025-05-12 6:12 ` [PATCH 6/7] amdgpu: add support for memcg integration Dave Airlie
@ 2025-05-13 13:21 ` Christian König
0 siblings, 0 replies; 16+ messages in thread
From: Christian König @ 2025-05-13 13:21 UTC (permalink / raw)
To: Dave Airlie, dri-devel, tj, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
On 5/12/25 08:12, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
>
> This adds the memcg object for any user-allocated objects,
> and uses the MEMCG placement flag in the correct places.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 ++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 +++++++++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
> 5 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 82df06a72ee0..1684a7e6d6cd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -839,7 +839,10 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> union drm_amdgpu_cs *cs)
> {
> struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
> - struct ttm_operation_ctx ctx = { true, false };
> + struct ttm_operation_ctx ctx = {
> + .interruptible = true,
> + .no_wait_gpu = false,
> + };
In general this change here is very welcome, but it should probably be in a separate patch.
It looks like a leftover from when the flag was in the context.
Apart from that the patch series looks totally fine to me.
Regards,
Christian.
> struct amdgpu_vm *vm = &fpriv->vm;
> struct amdgpu_bo_list_entry *e;
> struct drm_gem_object *obj;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 69429df09477..bdad9a862ed3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -89,6 +89,7 @@ static void amdgpu_gem_object_free(struct drm_gem_object *gobj)
> struct amdgpu_bo *aobj = gem_to_amdgpu_bo(gobj);
>
> amdgpu_hmm_unregister(aobj);
> + mem_cgroup_put(aobj->tbo.memcg);
> ttm_bo_put(&aobj->tbo);
> }
>
> @@ -116,6 +117,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
> bp.domain = initial_domain;
> bp.bo_ptr_size = sizeof(struct amdgpu_bo);
> bp.xcp_id_plus1 = xcp_id_plus1;
> + bp.memcg = get_mem_cgroup_from_mm(current->mm);
>
> r = amdgpu_bo_create_user(adev, &bp, &ubo);
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 0b9987781f76..1d930421354a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -158,7 +158,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
> places[c].mem_type =
> abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
> AMDGPU_PL_PREEMPT : TTM_PL_TT;
> - places[c].flags = 0;
> + places[c].flags = TTM_PL_FLAG_MEMCG;
> /*
> * When GTT is just an alternative to VRAM make sure that we
> * only use it as fallback and still try to fill up VRAM first.
> @@ -173,7 +173,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
> places[c].fpfn = 0;
> places[c].lpfn = 0;
> places[c].mem_type = TTM_PL_SYSTEM;
> - places[c].flags = 0;
> + places[c].flags = TTM_PL_FLAG_MEMCG;
> c++;
> }
>
> @@ -657,16 +657,21 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> size = ALIGN(size, PAGE_SIZE);
> }
>
> - if (!amdgpu_bo_validate_size(adev, size, bp->domain))
> + if (!amdgpu_bo_validate_size(adev, size, bp->domain)) {
> + mem_cgroup_put(bp->memcg);
> return -ENOMEM;
> + }
>
> BUG_ON(bp->bo_ptr_size < sizeof(struct amdgpu_bo));
>
> *bo_ptr = NULL;
> bo = kvzalloc(bp->bo_ptr_size, GFP_KERNEL);
> - if (bo == NULL)
> + if (bo == NULL) {
> + mem_cgroup_put(bp->memcg);
> return -ENOMEM;
> + }
> drm_gem_private_object_init(adev_to_drm(adev), &bo->tbo.base, size);
> + bo->tbo.memcg = bp->memcg;
> bo->tbo.base.funcs = &amdgpu_gem_object_funcs;
> bo->vm_bo = NULL;
> bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
> @@ -1341,7 +1346,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> - struct ttm_operation_ctx ctx = { false, false };
> + struct ttm_operation_ctx ctx = { .interruptible = false,
> + .no_wait_gpu = false };
> struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
> int r;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 375448627f7b..9a4c506cfb76 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -55,6 +55,7 @@ struct amdgpu_bo_param {
> enum ttm_bo_type type;
> bool no_wait_gpu;
> struct dma_resv *resv;
> + struct mem_cgroup *memcg;
> void (*destroy)(struct ttm_buffer_object *bo);
> /* xcp partition number plus 1, 0 means any partition */
> int8_t xcp_id_plus1;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 53b71e9d8076..f40b0c0a820b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -151,11 +151,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
> amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT |
> AMDGPU_GEM_DOMAIN_CPU);
> }
> + abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
> break;
> case TTM_PL_TT:
> case AMDGPU_PL_PREEMPT:
> default:
> amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
> + abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
> break;
> }
> *placement = abo->placement;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/7] ttm: add initial memcg integration. (v4)
2025-05-12 6:12 ` [PATCH 5/7] ttm: add initial memcg integration. (v4) Dave Airlie
2025-05-12 14:42 ` kernel test robot
@ 2025-05-13 13:30 ` Christian König
2025-05-14 11:41 ` [5/7] " Maarten Lankhorst
2 siblings, 0 replies; 16+ messages in thread
From: Christian König @ 2025-05-13 13:30 UTC (permalink / raw)
To: Dave Airlie, dri-devel, tj, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
On 5/12/25 08:12, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
>
> Doing proper integration of TTM system memory allocations with
> memcg is a difficult ask, primarily due to difficulties around
> accounting for evictions properly.
>
> However there are systems where userspace will be allocating
> objects in system memory and they won't be prone to migrating
> or evicting and we should start with at least accounting those.
>
> This adds a memcg group to ttm bo and tt objects.
>
> This memcg is used when:
> a) when a tt is populated (and unpopulated)
> b) the TTM_PL_FLAG_MEMCG is set on the placement for the
> bo when the tt is allocated.
>
> The placement flag is set for all non-eviction placements.
>
> This version moves back from the resource to the tt layer,
> when accounting at the resource layer, if an object is swapped
> out there was no way to remove it from the accounting, whereas
> the tt layer has more info for this.
Good point, we should probably really come up with a SWAPPED TTM domain.
The nice thing about attaching it to the resource would have been that you could charge evicted BOs when they are re-validated by the driver.
But that can also come much later.
Regards,
Christian.
>
> v4: move back to the tt layer from the resource layer to
> handle swap, but keep the memcg charging hooks for now.
> v3: moves from having a flags on the op ctx to the using a
> placement flag.
> v2: moved the charging up a level and also no longer used
> __GFP_ACCOUNT, or attached the memcg to object pages, it instead
> uses the same approach as socket memory and just charges/uncharges
> at the object level. This was suggested by Christian.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
> drivers/gpu/drm/ttm/ttm_bo_util.c | 6 +++---
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 +++-
> drivers/gpu/drm/ttm/ttm_tt.c | 17 ++++++++++++++++-
> include/drm/ttm/ttm_bo.h | 7 +++++++
> include/drm/ttm/ttm_placement.h | 3 +++
> include/drm/ttm/ttm_tt.h | 9 ++++++++-
> 7 files changed, 44 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 5bf3c969907c..1630ef28e5a8 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -140,7 +140,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
> goto out_err;
>
> if (mem->mem_type != TTM_PL_SYSTEM) {
> - ret = ttm_bo_populate(bo, ctx);
> + ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, ctx);
> if (ret)
> goto out_err;
> }
> @@ -1237,6 +1237,7 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> /**
> * ttm_bo_populate() - Ensure that a buffer object has backing pages
> * @bo: The buffer object
> + * @memcg_account: account this memory with memcg if needed
> * @ctx: The ttm_operation_ctx governing the operation.
> *
> * For buffer objects in a memory type whose manager uses
> @@ -1250,6 +1251,7 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> * is set to true.
> */
> int ttm_bo_populate(struct ttm_buffer_object *bo,
> + bool memcg_account,
> struct ttm_operation_ctx *ctx)
> {
> struct ttm_tt *tt = bo->ttm;
> @@ -1262,7 +1264,7 @@ int ttm_bo_populate(struct ttm_buffer_object *bo,
> return 0;
>
> swapped = ttm_tt_is_swapped(tt);
> - ret = ttm_tt_populate(bo->bdev, tt, ctx);
> + ret = ttm_tt_populate(bo->bdev, tt, memcg_account, ctx);
> if (ret)
> return ret;
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 15cab9bda17f..7d599d0707e4 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -163,7 +163,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
> src_man = ttm_manager_type(bdev, src_mem->mem_type);
> if (ttm && ((ttm->page_flags & TTM_TT_FLAG_SWAPPED) ||
> dst_man->use_tt)) {
> - ret = ttm_bo_populate(bo, ctx);
> + ret = ttm_bo_populate(bo, dst_mem->placement & TTM_PL_FLAG_MEMCG, ctx);
> if (ret)
> return ret;
> }
> @@ -350,7 +350,7 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
>
> BUG_ON(!ttm);
>
> - ret = ttm_bo_populate(bo, &ctx);
> + ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, &ctx);
> if (ret)
> return ret;
>
> @@ -507,7 +507,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map)
> pgprot_t prot;
> void *vaddr;
>
> - ret = ttm_bo_populate(bo, &ctx);
> + ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, &ctx);
> if (ret)
> return ret;
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index a194db83421d..02aea23a34e7 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -224,7 +224,9 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> };
>
> ttm = bo->ttm;
> - err = ttm_bo_populate(bo, &ctx);
> + err = ttm_bo_populate(bo,
> + bo->resource->placement & TTM_PL_FLAG_MEMCG,
> + &ctx);
> if (err) {
> if (err == -EINTR || err == -ERESTARTSYS ||
> err == -EAGAIN)
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 698cd4bf5e46..81c4cbbeb130 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -161,6 +161,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
> ttm->caching = caching;
> ttm->restore = NULL;
> ttm->backup = NULL;
> + ttm->memcg = bo->memcg;
> }
>
> int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> @@ -365,7 +366,9 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
> EXPORT_SYMBOL_FOR_TESTS_ONLY(ttm_tt_swapout);
>
> int ttm_tt_populate(struct ttm_device *bdev,
> - struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
> + struct ttm_tt *ttm,
> + bool memcg_account_tt,
> + struct ttm_operation_ctx *ctx)
> {
> int ret;
>
> @@ -376,6 +379,14 @@ int ttm_tt_populate(struct ttm_device *bdev,
> return 0;
>
> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
> + if (ttm->memcg && memcg_account_tt) {
> + gfp_t gfp_flags = GFP_USER;
> + if (ctx->gfp_retry_mayfail)
> + gfp_flags |= __GFP_RETRY_MAYFAIL;
> + if (!mem_cgroup_charge_gpu(ttm->memcg, ttm->num_pages, gfp_flags))
> + return -ENOMEM;
> + ttm->page_flags |= TTM_TT_FLAG_ACCOUNTED;
> + }
> atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
> if (bdev->pool.use_dma32)
> atomic_long_add(ttm->num_pages,
> @@ -437,6 +448,10 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
> ttm_pool_free(&bdev->pool, ttm);
>
> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
> + if (ttm->page_flags & TTM_TT_FLAG_ACCOUNTED) {
> + mem_cgroup_uncharge_gpu(ttm->memcg, ttm->num_pages);
> + ttm->page_flags &= ~TTM_TT_FLAG_ACCOUNTED;
> + }
> atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
> if (bdev->pool.use_dma32)
> atomic_long_sub(ttm->num_pages,
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 903cd1030110..d7c0dd9e0746 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -135,6 +135,12 @@ struct ttm_buffer_object {
> * reservation lock.
> */
> struct sg_table *sg;
> +
> + /**
> + * @memcg: memory cgroup to charge this to if it ends up using system memory.
> + * NULL means don't charge.
> + */
> + struct mem_cgroup *memcg;
> };
>
> #define TTM_BO_MAP_IOMEM_MASK 0x80
> @@ -486,6 +492,7 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, struct ttm_resource *res,
> pgprot_t tmp);
> void ttm_bo_tt_destroy(struct ttm_buffer_object *bo);
> int ttm_bo_populate(struct ttm_buffer_object *bo,
> + bool memcg_account,
> struct ttm_operation_ctx *ctx);
>
> /* Driver LRU walk helpers initially targeted for shrinking. */
> diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
> index b510a4812609..668798072292 100644
> --- a/include/drm/ttm/ttm_placement.h
> +++ b/include/drm/ttm/ttm_placement.h
> @@ -70,6 +70,9 @@
> /* Placement is only used during eviction */
> #define TTM_PL_FLAG_FALLBACK (1 << 4)
>
> +/* Placement causes memcg accounting */
> +#define TTM_PL_FLAG_MEMCG (1 << 5)
> +
> /**
> * struct ttm_place
> *
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index 406437ad674b..2790fc82edc3 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -90,6 +90,8 @@ struct ttm_tt {
> * TTM_TT_FLAG_BACKED_UP: TTM internal only. This is set if the
> * struct ttm_tt has been (possibly partially) backed up.
> *
> + * TTM_TT_FLAG_ACCOUNTED: TTM internal. This tt has been accounted.
> + *
> * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is
> * set by TTM after ttm_tt_populate() has successfully returned, and is
> * then unset when TTM calls ttm_tt_unpopulate().
> @@ -101,8 +103,9 @@ struct ttm_tt {
> #define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3)
> #define TTM_TT_FLAG_DECRYPTED BIT(4)
> #define TTM_TT_FLAG_BACKED_UP BIT(5)
> +#define TTM_TT_FLAG_ACCOUNTED BIT(6)
>
> -#define TTM_TT_FLAG_PRIV_POPULATED BIT(6)
> +#define TTM_TT_FLAG_PRIV_POPULATED BIT(7)
> uint32_t page_flags;
> /** @num_pages: Number of pages in the page array. */
> uint32_t num_pages;
> @@ -126,6 +129,8 @@ struct ttm_tt {
> enum ttm_caching caching;
> /** @restore: Partial restoration from backup state. TTM private */
> struct ttm_pool_tt_restore *restore;
> + /** @memcg: Memory cgroup for this TT allocation */
> + struct mem_cgroup *memcg;
> };
>
> /**
> @@ -245,11 +250,13 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
> *
> * @bdev: the ttm_device this object belongs to
> * @ttm: Pointer to the ttm_tt structure
> + * @mem_account_tt: Account this population to the memcg
> * @ctx: operation context for populating the tt object.
> *
> * Calls the driver method to allocate pages for a ttm
> */
> int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm,
> + bool mem_account_tt,
> struct ttm_operation_ctx *ctx);
>
> /**
^ permalink raw reply [flat|nested] 16+ messages in thread
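For illustration, a minimal sketch of how a driver-side placement list could use the flag added above. The array and names here are hypothetical and not taken from the series; TTM_PL_FLAG_MEMCG and TTM_PL_FLAG_FALLBACK are the flags from include/drm/ttm/ttm_placement.h as modified by this patch.

#include <drm/ttm/ttm_placement.h>

/*
 * Sketch only: mark the placements the application actually asked for
 * with TTM_PL_FLAG_MEMCG and leave the eviction-only fallback
 * unaccounted, following "the placement flag is set for all
 * non-eviction placements" from the commit message.
 */
static const struct ttm_place my_drv_places[] = {
	{
		/* Non-eviction placement: carries the accounting flag. */
		.mem_type = TTM_PL_VRAM,
		.flags    = TTM_PL_FLAG_MEMCG,
	},
	{
		/* Non-eviction placement: system pages charged on populate. */
		.mem_type = TTM_PL_TT,
		.flags    = TTM_PL_FLAG_MEMCG,
	},
	{
		/* Eviction-only placement: deliberately not accounted. */
		.mem_type = TTM_PL_SYSTEM,
		.flags    = TTM_PL_FLAG_FALLBACK,
	},
};

TTM then forwards mem->placement & TTM_PL_FLAG_MEMCG into ttm_bo_populate() and on to ttm_tt_populate(), as in the hunks above, so the charge in ttm_tt_populate() only happens for placements the application asked for and only when bo->memcg is set.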
* Re: [5/7] ttm: add initial memcg integration. (v4)
2025-05-12 6:12 ` [PATCH 5/7] ttm: add initial memcg integration. (v4) Dave Airlie
2025-05-12 14:42 ` kernel test robot
2025-05-13 13:30 ` Christian König
@ 2025-05-14 11:41 ` Maarten Lankhorst
2025-05-14 11:55 ` Christian König
2 siblings, 1 reply; 16+ messages in thread
From: Maarten Lankhorst @ 2025-05-14 11:41 UTC (permalink / raw)
To: Dave Airlie, dri-devel, tj, christian.koenig, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
Hi Dave,
We've had a small discussion on irc, so I wanted to summarize it here:
All memory allocated should be accounted, even memory that is being evicted from VRAM. This may cause the process that originally allocated the VRAM to go over the memcg limit, that should be solved by invoking OOM condition on the original process, which may have ways to solve it like purging purgeable memory, or as last resort OOM killing.
The VRAM evicter is already memcg aware, so it should be possible to do the same for the shrinker. I created a patch to use the same cgroup for memcg as for dmem, we should probably extract the cgroup from mm->owner, and create a function to charge dmemcg and memcg with a specified cgroup.
For applications that use a centralised allocator, it might be needed to charge a different cgroup when exporting.
Kind regards, Maarten
On 2025-05-12 08:12, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
>
> Doing proper integration of TTM system memory allocations with
> memcg is a difficult ask, primarily due to difficulties around
> accounting for evictions properly.
>
> However there are systems where userspace will be allocating
> objects in system memory and they won't be prone to migrating
> or evicting and we should start with at least accounting those.
>
> This adds a memcg group to ttm bo and tt objects.
>
> This memcg is used when:
> a) when a tt is populated (and unpopulated)
> b) the TTM_PL_FLAG_MEMCG is set on the placement for the
> bo when the tt is allocated.
>
> The placement flag is set for all non-eviction placements.
>
> This version moves back from the resource to the tt layer,
> when accounting at the resource layer, if an object is swapped
> out there was no way to remove it from the accounting, whereas
> the tt layer has more info for this.
>
> v4: move back to the tt layer from the resource layer to
> handle swap, but keep the memcg charging hooks for now.
> v3: moves from having a flags on the op ctx to the using a
> placement flag.
> v2: moved the charging up a level and also no longer used
> __GFP_ACCOUNT, or attached the memcg to object pages, it instead
> uses the same approach as socket memory and just charges/uncharges
> at the object level. This was suggested by Christian.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> [diff snipped]
^ permalink raw reply [flat|nested] 16+ messages in thread
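As a rough sketch of the mm->owner idea above (the helper name is hypothetical; get_mem_cgroup_from_mm() is the existing memcg interface that resolves the cgroup via mm->owner and returns it with a reference held):

#include <linux/memcontrol.h>
#include <linux/sched.h>

/*
 * Sketch only: pick the cgroup to charge for a new BO from the task's
 * mm->owner, so the same cgroup can later be handed to both the dmem
 * and the memcg charge paths.  The caller would store the result in
 * bo->memcg and drop it with mem_cgroup_put() when the BO is freed.
 */
static struct mem_cgroup *my_drv_get_bo_memcg(void)
{
	struct mm_struct *mm = current->mm;

	if (!mm)
		return NULL;	/* e.g. a kernel thread: nothing to charge */

	return get_mem_cgroup_from_mm(mm);
}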
* Re: [5/7] ttm: add initial memcg integration. (v4)
2025-05-14 11:41 ` [5/7] " Maarten Lankhorst
@ 2025-05-14 11:55 ` Christian König
2025-05-14 17:07 ` Maarten Lankhorst
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-05-14 11:55 UTC (permalink / raw)
To: Maarten Lankhorst, Dave Airlie, dri-devel, tj, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
On 5/14/25 13:41, Maarten Lankhorst wrote:
> Hi Dave,
>
> We've had a small discussion on irc, so I wanted to summarize it here:
>
> All memory allocated should be accounted, even memory that is being
> evicted from VRAM.
That sounds like a really bad idea to me.
> This may cause the process that originally allocated
> the VRAM to go over the memcg limit, that should be solved by invoking
> OOM condition on the original process, which may have ways to solve it
> like purging purgeable memory, or as last resort OOM killing.
You are basically suggesting to kill an application for something it never requested in the first place.
In other words when an application requested a buffer to be placed in VRAM we can't make it responsible that the buffer had to be moved to system memory because of over allocation.
As far as I can see and have discussed with others so far this approach is a clear no-go.
Regards,
Christian.
> The VRAM evicter is already memcg aware, so it should be possible to do
> the same for the shrinker. I created a patch to use the same cgroup for
> memcg as for dmem, we should probably extract the cgroup from mm->owner,
> and create a function to charge dmemcg and memcg with a specified cgroup.
>
> For applications that use a centralised allocator, it might be needed to
> charge a different cgroup when exporting.
>
> Kind regards,
> Maarten
>
> On 2025-05-12 08:12, Dave Airlie wrote:
>> [patch snipped]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [5/7] ttm: add initial memcg integration. (v4)
2025-05-14 11:55 ` Christian König
@ 2025-05-14 17:07 ` Maarten Lankhorst
2025-05-15 8:40 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Maarten Lankhorst @ 2025-05-14 17:07 UTC (permalink / raw)
To: Christian König, Dave Airlie, dri-devel, tj, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
Hey,
On 2025-05-14 13:55, Christian König wrote:
> On 5/14/25 13:41, Maarten Lankhorst wrote:
>> Hi Dave,
>>
>> We've had a small discussion on irc, so I wanted to summarize it here:
>>
>> All memory allocated should be accounted, even memory that is being
>> evicted from VRAM.
>
> That sounds like a really bad idea to me.
>
>> This may cause the process that originally allocated
>> the VRAM to go over the memcg limit, that should be solved by invoking
>> OOM condition on the original process, which may have ways to solve it
>> like purging purgeable memory, or as last resort OOM killing.
>
> You are basically suggesting to kill an application for something it never requested in the first place.
>
> In other words when an application requested a buffer to be placed in VRAM we can't make it responsible that the buffer had to be moved to system memory because of over allocation.
>
> As far as I can see and have discussed with others so far this approach is a clear no-go.
There is absolutely no need to kill an application. You can set dmem limits in such a way that a buffer will never be evicted.
Killing would be an absolute last resort, and only happens when maximum amount of memory is set.
Alternatively we could count memory in VRAM similar to swapped out memory, since it's just another placement of allocated memory. :)
>
>
>> The VRAM evicter is already memcg aware, so it should be possible to do
>> the same for the shrinker. I created a patch to use the same cgroup for
>> memcg as for dmem, we should probably extract the cgroup from mm->owner,
>> and create a function to charge dmemcg and memcg with a specified cgroup.
>>
>> For applications that use a centralised allocator, it might be needed to
>> charge a different cgroup when exporting.
>>
>> Kind regards,
>> Maarten
>>
>> On 2025-05-12 08:12, Dave Airlie wrote:
>>> [patch snipped]
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [5/7] ttm: add initial memcg integration. (v4)
2025-05-14 17:07 ` Maarten Lankhorst
@ 2025-05-15 8:40 ` Christian König
2025-05-15 9:28 ` Maarten Lankhorst
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-05-15 8:40 UTC (permalink / raw)
To: Maarten Lankhorst, Dave Airlie, dri-devel, tj, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
On 5/14/25 19:07, Maarten Lankhorst wrote:
> Hey,
>
> On 2025-05-14 13:55, Christian König wrote:
>> On 5/14/25 13:41, Maarten Lankhorst wrote:
>>> Hi Dave,
>>>
>>> We've had a small discussion on irc, so I wanted to summarize it here:
>>>
>>> All memory allocated should be accounted, even memory that is being
>>> evicted from VRAM.
>>
>> That sounds like a really bad idea to me.
>>
>>> This may cause the process that originally allocated
>>> the VRAM to go over the memcg limit, that should be solved by invoking
>>> OOM condition on the original process, which may have ways to solve it
>>> like purging purgeable memory, or as last resort OOM killing.
>>
>> You are basically suggesting to kill an application for something it never requested in the first place.
>>
>> In other words when an application requested a buffer to be placed in VRAM we can't make it responsible that the buffer had to be moved to system memory because of over allocation.
>>
>> As far as I can see and have discussed with others so far this approach is a clear no-go.
> There is absolutely no need to kill an application. You can set dmem limits in such a way that a buffer will never be evicted.
> Killing would be an absolute last resort, and only happens when maximum amount of memory is set.
Yeah, exactly that doesn't work. You kind of must kill it at that moment because you don't have any other chance to signal that this cgroup is OOM.
In other words, when you have a scenario where application A causes eviction and application B is the victim of that eviction, who do you charge the system memory needed for the eviction to?
Application A caused the eviction but never asked for system memory in the first place. So letting that allocation fail because some other application is OOM would result in basically non-reproducible behavior.
Application B now owns that memory, but you can only send a SIGKILL to that application and not an ENOMEM, because it is currently not doing anything.
T.J. suggested that we charge but don't enforce a limit for application B. I would say that this is kind of ok for now until we have found a better solution, but it is clearly not the end of the story.
> Alternatively we could count memory in VRAM similar to swapped out memory, since it's just another placement of allocated memory. :)
Yeah, that idea came to my mind as well. But there're problems with that approach as well.
Regards,
Christian.
>
>>
>>
>>> The VRAM evicter is already memcg aware, so it should be possible to do
>>> the same for the shrinker. I created a patch to use the same cgroup for
>>> memcg as for dmem, we should probably extract the cgroup from mm->owner,
>>> and create a function to charge dmemcg and memcg with a specified cgroup.
>>>
>>> For applications that use a centralised allocator, it might be needed to
>>> charge a different cgroup when exporting.
>>>
>>> Kind regards,
>>> Maarten
>>>
>>> On 2025-05-12 08:12, Dave Airlie wrote:
>>>> [patch snipped]
>>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
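To make the "charge but don't enforce" idea above concrete, one possible shape of the eviction path. This is purely a sketch: mem_cgroup_charge_gpu() is the hook from this series, while the _nofail variant is hypothetical and only stands in for a charge that is not allowed to fail on the victim's limit.

/*
 * Hypothetical sketch of charging the eviction victim without letting
 * its memory.max fail the eviction.  mem_cgroup_charge_gpu_nofail()
 * does not exist; it represents a charge that updates the victim's
 * counters but cannot return failure on this path.
 */
static void ttm_tt_charge_eviction(struct ttm_tt *ttm)
{
	if (!ttm->memcg || (ttm->page_flags & TTM_TT_FLAG_ACCOUNTED))
		return;

	mem_cgroup_charge_gpu_nofail(ttm->memcg, ttm->num_pages);
	ttm->page_flags |= TTM_TT_FLAG_ACCOUNTED;
}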
* Re: [5/7] ttm: add initial memcg integration. (v4)
2025-05-15 8:40 ` Christian König
@ 2025-05-15 9:28 ` Maarten Lankhorst
0 siblings, 0 replies; 16+ messages in thread
From: Maarten Lankhorst @ 2025-05-15 9:28 UTC (permalink / raw)
To: Christian König, Dave Airlie, dri-devel, tj, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups, Waiman Long, simona
Hey,
On 2025-05-15 10:40, Christian König wrote:
> On 5/14/25 19:07, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 2025-05-14 13:55, Christian König wrote:
>>> On 5/14/25 13:41, Maarten Lankhorst wrote:
>>>> Hi Dave,
>>>>
>>>> We've had a small discussion on irc, so I wanted to summarize it here:
>>>>
>>>> All memory allocated should be accounted, even memory that is being
>>>> evicted from VRAM.
>>>
>>> That sounds like a really bad idea to me.
>>>
>>>> This may cause the process that originally allocated
>>>> the VRAM to go over the memcg limit, that should be solved by invoking
>>>> OOM condition on the original process, which may have ways to solve it
>>>> like purging purgeable memory, or as last resort OOM killing.
>>>
>>> You are basically suggesting to kill an application for something it never requested in the first place.
>>>
>>> In other words when an application requested a buffer to be placed in VRAM we can't make it responsible that the buffer had to be moved to system memory because of over allocation.
>>>
>>> As far as I can see and have discussed with others so far this approach is a clear no-go.
>> There is absolutely no need to kill an application. You can set dmem limits in such a way that a buffer will never be evicted.
>> Killing would be an absolute last resort, and only happens when maximum amount of memory is set.
>
> Yeah, exactly that doesn't work. You kind of must kill it at that moment because you don't have any other chance to signal that this cgroup is OOM.
>
> In other words, when you have a scenario where application A causes eviction and application B is the victim of that eviction, who do you charge the system memory needed for the eviction to?
>
> Application A caused the eviction but never asked for system memory in the first place. So letting that allocation fail because some other application is OOM would result in basically non-reproducible behavior.
>
> Application B now owns that memory, but you can only send a SIGKILL to that application and not an ENOMEM, because it is currently not doing anything.
>
> T.J. suggested that we charge but don't enforce a limit for application B. I would say that this is kind of ok for now until we have found a better solution, but it is clearly not the end of the story.
If I look at the cgroup-v2.rst documentation, specifically at "controller issues
and remedies; memory", it seems there are two knobs, memory.high and memory.max:
"The memory.high boundary[,] when hit, throttles allocations by forcing them
into direct reclaim to work off the excess, but it never invokes the
OOM killer. [...]
In extreme cases, with many concurrent allocations and a complete
breakdown of reclaim progress within the group, the high boundary can
be exceeded. [...] memory.max is there to limit this type of spillover
and ultimately contain buggy or even malicious applications."
So it should be acceptable for us to go above memory.high for eviction, as long
as we stay below memory.max.
>> Alternatively we could count memory in VRAM similar to swapped out memory, since it's just another placement of allocated memory. :)
>
> Yeah, that idea came to my mind as well. But there're problems with that approach as well.
I think this could potentially also be a correct solution, since VRAM is still memory. What are the
specific problems you see with this approach?
> Regards,
> Christian.
Best regards,
Maarten
^ permalink raw reply [flat|nested] 16+ messages in thread
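For what it's worth, a small userspace sketch of that split (paths and values purely illustrative): keep memory.high below memory.max so that eviction overshoot is throttled by the former but still contained by the latter.

#include <stdio.h>

/*
 * Illustrative only: write the two cgroup v2 knobs for one group.
 * Values are examples; the point is high < max so spillover from
 * eviction lands between the two.
 */
static int set_memcg_limits(const char *cgroup_dir,
			    const char *high, const char *max)
{
	char path[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/memory.high", cgroup_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", high);
	fclose(f);

	snprintf(path, sizeof(path), "%s/memory.max", cgroup_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", max);
	fclose(f);

	return 0;
}

/* e.g. set_memcg_limits("/sys/fs/cgroup/gpujob", "6G", "8G"); */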
Thread overview: 16+ messages
2025-05-12 6:12 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (series v3) Dave Airlie
2025-05-12 6:12 ` [PATCH 1/7] mm: add gpu active/reclaim per-node stat counters Dave Airlie
2025-05-12 6:12 ` [PATCH 2/7] ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
2025-05-12 6:12 ` [PATCH 3/7] memcg: add GPU statistic Dave Airlie
2025-05-12 6:12 ` [PATCH 4/7] memcg: add hooks for gpu memcg charging/uncharging Dave Airlie
2025-05-12 6:12 ` [PATCH 5/7] ttm: add initial memcg integration. (v4) Dave Airlie
2025-05-12 14:42 ` kernel test robot
2025-05-13 13:30 ` Christian König
2025-05-14 11:41 ` [5/7] " Maarten Lankhorst
2025-05-14 11:55 ` Christian König
2025-05-14 17:07 ` Maarten Lankhorst
2025-05-15 8:40 ` Christian König
2025-05-15 9:28 ` Maarten Lankhorst
2025-05-12 6:12 ` [PATCH 6/7] amdgpu: add support for memcg integration Dave Airlie
2025-05-13 13:21 ` Christian König
2025-05-12 6:12 ` [PATCH 7/7] nouveau: add " Dave Airlie