* drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4)
@ 2025-10-16  2:31 Dave Airlie
  2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

Hi all,

This is another repost with some fixes and cleanups. I've added Christian's acks/reviews from the
previous round. I've moved the obj_cgroup_put into the core, instead of doing it in the drivers.

I'd really like to land this in drm-next. I've added Maarten's xe support patch to this, and I'd like
to get any missing acks/reviews.

Christian, I think you said patch 4 got lost last time, hopefully you get it this time.

Patches still needing ack/review:
ttm/pool: drop numa specific pools
ttm/pool: track allocated_pages per numa node.
ttm: add objcg pointer to bo and tt (v2)
ttm/pool: enable memcg tracking and shrinker. (v2)
amdgpu: add support for memory cgroups

Differences since v1 posting:
1. added ttm_bo_set_cgroup wrapper - the cgroup reference is passed to the ttm object.
2. put the cgroup reference in ttm object release
3. rebase onto 6.19-rc1
4. added xe support patch from Maarten.

Differences since v2 posting:
1. Squashed exports into where they are used (Shakeel)
2. Fixed bug in uncharge path memcg
3. Fixed config bug in the module option.

Differences since 1st posting:
1. Added patch 18: add a module option to allow pooled pages to not be stored in the lru per-memcg
   (Requested by Christian König)
2. Converged the naming and stats between vmstat and memcg (Suggested by Shakeel Butt)
3. Cleaned up the charge/uncharge code and some other bits.

Dave.

Original cover letter:
tl;dr: start using list_lru/numa/memcg in GPU driver core and amdgpu driver for now.

This is a complete series of patches, some of which have been sent before and reviewed,
but I want to get the complete picture for others, and try to figure out how best to land this.

There are 3 pieces to this:
01->02: add support for global gpu stat counters (previously posted, patch 2 is newer)
03->06: port ttm pools to list_lru for numa awareness
07->13: add memcg stats + gpu apis, then port ttm pools to memcg aware list_lru and shrinker
14: enable amdgpu to use new functionality.
15: add a module option to turn it all off.

The biggest difference in the memcg code from previous postings is that I discovered
what obj cgroups were designed for, and I'm reusing the page/objcg integration that
already exists, to avoid reinventing that wheel right now.

There are some igt-gpu-tools tests I've written at:
https://gitlab.freedesktop.org/airlied/igt-gpu-tools/-/tree/amdgpu-cgroups?ref_type=heads

One problem is that there is a lot of delayed action, which probably means the testing
needs a bit more robustness, but the tests validate all the basic paths.

Regards,
Dave.




* [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  7:48   ` Christian König
  2025-10-16  2:31 ` [PATCH 02/16] drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) Dave Airlie
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

While discussing memcg integration with gpu memory allocations,
it was pointed out that there were no numa/system counters for
GPU memory allocations.

With more integrated-memory GPU server systems turning up, and
more requirements for memory tracking, it seems we should start
closing the gap.

Add two counters to track GPU per-node system memory allocations.

The first counts memory currently allocated to GPU objects, and the
second counts memory stored in GPU page pools that can be reclaimed
by the shrinker.
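
As a rough illustration only (this is not part of the patch, and the helper
names are made up), a driver that keeps its own page pool would be expected to
update these counters along these lines, moving pages between the active and
reclaimable counts as they enter and leave its pool:

  #include <linux/mm.h>
  #include <linux/mmzone.h>
  #include <linux/vmstat.h>

  /* Hypothetical helpers, for illustration only. */
  static void gpu_stat_account_alloc(struct page *p, unsigned int order)
  {
          /* freshly allocated pages count as active GPU memory */
          mod_node_page_state(page_pgdat(p), NR_GPU_ACTIVE, 1 << order);
  }

  static void gpu_stat_account_pooled(struct page *p, unsigned int order)
  {
          /* parked in a reuse pool: active -> reclaimable */
          mod_node_page_state(page_pgdat(p), NR_GPU_ACTIVE, -(1 << order));
          mod_node_page_state(page_pgdat(p), NR_GPU_RECLAIM, 1 << order);
  }

  static void gpu_stat_account_free(struct page *p, unsigned int order, bool pooled)
  {
          mod_node_page_state(page_pgdat(p), pooled ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
                              -(1 << order));
  }

The next patch wires TTM up this way, using mod_lruvec_page_state() so that the
per-memcg counters added later in the series stay in sync as well.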

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

---

v2: add more info to the documentation on this memory.

I'd like to get acks to merge this via the drm tree, if possible,

Dave.
---
 Documentation/filesystems/proc.rst | 8 ++++++++
 drivers/base/node.c                | 5 +++++
 fs/proc/meminfo.c                  | 6 ++++++
 include/linux/mmzone.h             | 2 ++
 mm/show_mem.c                      | 8 ++++++--
 mm/vmstat.c                        | 2 ++
 6 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 0b86a8022fa1..76e358274692 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1088,6 +1088,8 @@ Example output. You may not have all of these fields.
     CmaFree:               0 kB
     Unaccepted:            0 kB
     Balloon:               0 kB
+    GPUActive:             0 kB
+    GPUReclaim:            0 kB
     HugePages_Total:       0
     HugePages_Free:        0
     HugePages_Rsvd:        0
@@ -1268,6 +1270,12 @@ Unaccepted
               Memory that has not been accepted by the guest
 Balloon
               Memory returned to Host by VM Balloon Drivers
+GPUActive
+              System memory allocated to active GPU objects
+GPUReclaim
+              System memory stored in GPU pools for reuse. This memory is not
+              counted in GPUActive. It is shrinker reclaimable memory kept in a reuse
+              pool because it has non-standard page table attributes, like WC or UC.
 HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
               See Documentation/admin-guide/mm/hugetlbpage.rst.
 DirectMap4k, DirectMap2M, DirectMap1G
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 83aeb0518e1d..c606b637f3f2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -523,6 +523,8 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_UNACCEPTED_MEMORY
 			     "Node %d Unaccepted:     %8lu kB\n"
 #endif
+			     "Node %d GPUActive:      %8lu kB\n"
+			     "Node %d GPUReclaim:     %8lu kB\n"
 			     ,
 			     nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
 			     nid, K(node_page_state(pgdat, NR_WRITEBACK)),
@@ -556,6 +558,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     ,
 			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
 #endif
+			     ,
+			     nid, K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+			     nid, K(node_page_state(pgdat, NR_GPU_RECLAIM))
 			    );
 	len += hugetlb_report_node_meminfo(buf, len, nid);
 	return len;
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index a458f1e112fd..65ba49ec3a63 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -163,6 +163,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "Balloon:        ",
 		    global_node_page_state(NR_BALLOON_PAGES));
 
+	show_val_kb(m, "GPUActive:      ",
+		    global_node_page_state(NR_GPU_ACTIVE));
+
+	show_val_kb(m, "GPUReclaim:     ",
+		    global_node_page_state(NR_GPU_RECLAIM));
+
 	hugetlb_report_meminfo(m);
 
 	arch_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c5725..8455551b93f6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -260,6 +260,8 @@ enum node_stat_item {
 #endif
 	NR_BALLOON_PAGES,
 	NR_KERNEL_FILE_PAGES,
+	NR_GPU_ACTIVE,          /* Pages assigned to GPU objects */
+	NR_GPU_RECLAIM,         /* Pages in shrinkable GPU pools */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 3a4b5207635d..fb99465616cf 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -254,7 +254,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
 			" sec_pagetables:%lukB"
 			" all_unreclaimable? %s"
 			" Balloon:%lukB"
-			"\n",
+		        " gpu_active:%lukB"
+		        " gpu_reclaim:%lukB"
+		        "\n",
 			pgdat->node_id,
 			K(node_page_state(pgdat, NR_ACTIVE_ANON)),
 			K(node_page_state(pgdat, NR_INACTIVE_ANON)),
@@ -280,7 +282,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
 			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			str_yes_no(atomic_read(&pgdat->kswapd_failures) >=
 				   MAX_RECLAIM_RETRIES),
-			K(node_page_state(pgdat, NR_BALLOON_PAGES)));
+		        K(node_page_state(pgdat, NR_BALLOON_PAGES)),
+		        K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+			K(node_page_state(pgdat, NR_GPU_RECLAIM)));
 	}
 
 	for_each_populated_zone(zone) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index bb09c032eecf..b4df2b85739f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1291,6 +1291,8 @@ const char * const vmstat_text[] = {
 #endif
 	[I(NR_BALLOON_PAGES)]			= "nr_balloon_pages",
 	[I(NR_KERNEL_FILE_PAGES)]		= "nr_kernel_file_pages",
+	[I(NR_GPU_ACTIVE)]			= "nr_gpu_active",
+	[I(NR_GPU_RECLAIM)]			= "nr_gpu_reclaim",
 #undef I
 
 	/* system-wide enum vm_stat_item counters */
-- 
2.51.0



* [PATCH 02/16] drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
  2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 03/16] ttm/pool: port to list_lru. (v2) Dave Airlie
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This uses the newly introduced per-node gpu tracking stats
to track GPU memory allocated via TTM and reclaimable memory in
the TTM page pools.

These stats will be useful for system information now, and
later when mem cgroups are integrated.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

---
v2: add reclaim parameters and adjust the right counters.
v3: drop the nid helper and get it from page.
v4: use mod_lruvec_page_state (Shakeel)
---
 drivers/gpu/drm/ttm/ttm_pool.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index baf27c70a419..148c7530738d 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -150,8 +150,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 
 	if (!pool->use_dma_alloc) {
 		p = alloc_pages_node(pool->nid, gfp_flags, order);
-		if (p)
+		if (p) {
 			p->private = order;
+			mod_lruvec_page_state(p, NR_GPU_ACTIVE, 1 << order);
+		}
 		return p;
 	}
 
@@ -186,7 +188,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 
 /* Reset the caching and pages of size 1 << order */
 static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
-			       unsigned int order, struct page *p)
+			       unsigned int order, struct page *p, bool reclaim)
 {
 	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
 	struct ttm_pool_dma *dma;
@@ -201,6 +203,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 #endif
 
 	if (!pool || !pool->use_dma_alloc) {
+		mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
+				      -(1 << order));
 		__free_pages(p, order);
 		return;
 	}
@@ -288,6 +292,9 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 	list_add(&p->lru, &pt->pages);
 	spin_unlock(&pt->lock);
 	atomic_long_add(1 << pt->order, &allocated_pages);
+
+	mod_lruvec_page_state(p, NR_GPU_ACTIVE, -num_pages);
+	mod_lruvec_page_state(p, NR_GPU_RECLAIM, num_pages);
 }
 
 /* Take pages from a specific pool_type, return NULL when nothing available */
@@ -299,6 +306,8 @@ static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
 	p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
 	if (p) {
 		atomic_long_sub(1 << pt->order, &allocated_pages);
+		mod_lruvec_page_state(p, NR_GPU_ACTIVE, (1 << pt->order));
+		mod_lruvec_page_state(p, NR_GPU_RECLAIM, -(1 << pt->order));
 		list_del(&p->lru);
 	}
 	spin_unlock(&pt->lock);
@@ -331,7 +340,7 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 	spin_unlock(&shrinker_lock);
 
 	while ((p = ttm_pool_type_take(pt)))
-		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
 }
 
 /* Return the pool_type to use for the given caching and order */
@@ -383,7 +392,7 @@ static unsigned int ttm_pool_shrink(void)
 
 	p = ttm_pool_type_take(pt);
 	if (p) {
-		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
 		num_pages = 1 << pt->order;
 	} else {
 		num_pages = 0;
@@ -475,7 +484,7 @@ static pgoff_t ttm_pool_unmap_and_free(struct ttm_pool *pool, struct page *page,
 	if (pt)
 		ttm_pool_type_give(pt, page);
 	else
-		ttm_pool_free_page(pool, caching, order, page);
+		ttm_pool_free_page(pool, caching, order, page, false);
 
 	return nr;
 }
@@ -780,7 +789,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	return 0;
 
 error_free_page:
-	ttm_pool_free_page(pool, page_caching, order, p);
+	ttm_pool_free_page(pool, page_caching, order, p, false);
 
 error_free_all:
 	if (tt->restore)
-- 
2.51.0



* [PATCH 03/16] ttm/pool: port to list_lru. (v2)
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
  2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
  2025-10-16  2:31 ` [PATCH 02/16] drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 04/16] ttm/pool: drop numa specific pools Dave Airlie
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This is an initial port of the TTM pools for
write-combined and uncached pages to use the list_lru.

This makes the pools more NUMA aware and avoids
needing separate NUMA pools (a later commit enables this).

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

---
v2: drop the pt->lock, the lru list has its own lock which is sufficient.
rearrange list isolates to fix bad locking orders.
---
 drivers/gpu/drm/ttm/tests/ttm_device_test.c |  2 +-
 drivers/gpu/drm/ttm/tests/ttm_pool_test.c   | 32 ++++----
 drivers/gpu/drm/ttm/ttm_pool.c              | 89 ++++++++++++++-------
 include/drm/ttm/ttm_pool.h                  |  7 +-
 mm/list_lru.c                               |  1 +
 5 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/ttm/tests/ttm_device_test.c b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
index 1621903818e5..1f207fd222bc 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_device_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
@@ -183,7 +183,7 @@ static void ttm_device_init_pools(struct kunit *test)
 
 				if (params->use_dma_alloc)
 					KUNIT_ASSERT_FALSE(test,
-							   list_empty(&pt.pages));
+							   !list_lru_count(&pt.pages));
 			}
 		}
 	}
diff --git a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
index 8ade53371f72..39234a3e98c4 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
@@ -248,7 +248,7 @@ static void ttm_pool_alloc_order_caching_match(struct kunit *test)
 	pool = ttm_pool_pre_populated(test, size, caching);
 
 	pt = &pool->caching[caching].orders[order];
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt->pages));
 
 	tt = ttm_tt_kunit_init(test, 0, caching, size);
 	KUNIT_ASSERT_NOT_NULL(test, tt);
@@ -256,7 +256,7 @@ static void ttm_pool_alloc_order_caching_match(struct kunit *test)
 	err = ttm_pool_alloc(pool, tt, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
-	KUNIT_ASSERT_TRUE(test, list_empty(&pt->pages));
+	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
 
 	ttm_pool_free(pool, tt);
 	ttm_tt_fini(tt);
@@ -282,8 +282,8 @@ static void ttm_pool_alloc_caching_mismatch(struct kunit *test)
 	tt = ttm_tt_kunit_init(test, 0, tt_caching, size);
 	KUNIT_ASSERT_NOT_NULL(test, tt);
 
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
-	KUNIT_ASSERT_TRUE(test, list_empty(&pt_tt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt_tt->pages));
 
 	err = ttm_pool_alloc(pool, tt, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
@@ -291,8 +291,8 @@ static void ttm_pool_alloc_caching_mismatch(struct kunit *test)
 	ttm_pool_free(pool, tt);
 	ttm_tt_fini(tt);
 
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt_tt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_tt->pages));
 
 	ttm_pool_fini(pool);
 }
@@ -316,8 +316,8 @@ static void ttm_pool_alloc_order_mismatch(struct kunit *test)
 	tt = ttm_tt_kunit_init(test, 0, caching, snd_size);
 	KUNIT_ASSERT_NOT_NULL(test, tt);
 
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
-	KUNIT_ASSERT_TRUE(test, list_empty(&pt_tt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt_tt->pages));
 
 	err = ttm_pool_alloc(pool, tt, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
@@ -325,8 +325,8 @@ static void ttm_pool_alloc_order_mismatch(struct kunit *test)
 	ttm_pool_free(pool, tt);
 	ttm_tt_fini(tt);
 
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt_tt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_tt->pages));
 
 	ttm_pool_fini(pool);
 }
@@ -352,12 +352,12 @@ static void ttm_pool_free_dma_alloc(struct kunit *test)
 	ttm_pool_alloc(pool, tt, &simple_ctx);
 
 	pt = &pool->caching[caching].orders[order];
-	KUNIT_ASSERT_TRUE(test, list_empty(&pt->pages));
+	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
 
 	ttm_pool_free(pool, tt);
 	ttm_tt_fini(tt);
 
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt->pages));
 
 	ttm_pool_fini(pool);
 }
@@ -383,12 +383,12 @@ static void ttm_pool_free_no_dma_alloc(struct kunit *test)
 	ttm_pool_alloc(pool, tt, &simple_ctx);
 
 	pt = &pool->caching[caching].orders[order];
-	KUNIT_ASSERT_TRUE(test, list_is_singular(&pt->pages));
+	KUNIT_ASSERT_TRUE(test, list_lru_count(&pt->pages) == 1);
 
 	ttm_pool_free(pool, tt);
 	ttm_tt_fini(tt);
 
-	KUNIT_ASSERT_TRUE(test, list_is_singular(&pt->pages));
+	KUNIT_ASSERT_TRUE(test, list_lru_count(&pt->pages) == 1);
 
 	ttm_pool_fini(pool);
 }
@@ -404,11 +404,11 @@ static void ttm_pool_fini_basic(struct kunit *test)
 	pool = ttm_pool_pre_populated(test, size, caching);
 	pt = &pool->caching[caching].orders[order];
 
-	KUNIT_ASSERT_FALSE(test, list_empty(&pt->pages));
+	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt->pages));
 
 	ttm_pool_fini(pool);
 
-	KUNIT_ASSERT_TRUE(test, list_empty(&pt->pages));
+	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
 }
 
 static struct kunit_case ttm_pool_test_cases[] = {
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 148c7530738d..e236f78c5d9d 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -131,6 +131,15 @@ static struct list_head shrinker_list;
 static struct shrinker *mm_shrinker;
 static DECLARE_RWSEM(pool_shrink_rwsem);
 
+static int ttm_pool_nid(struct ttm_pool *pool) {
+	int nid = NUMA_NO_NODE;
+	if (pool)
+		nid = pool->nid;
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+	return nid;
+}
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 					unsigned int order)
@@ -288,30 +297,41 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 			clear_page(page_address(p + i));
 	}
 
-	spin_lock(&pt->lock);
-	list_add(&p->lru, &pt->pages);
-	spin_unlock(&pt->lock);
+	INIT_LIST_HEAD(&p->lru);
+	rcu_read_lock();
+	list_lru_add(&pt->pages, &p->lru, page_to_nid(p), NULL);
+	rcu_read_unlock();
 	atomic_long_add(1 << pt->order, &allocated_pages);
 
 	mod_lruvec_page_state(p, NR_GPU_ACTIVE, -num_pages);
 	mod_lruvec_page_state(p, NR_GPU_RECLAIM, num_pages);
 }
 
+static enum lru_status take_one_from_lru(struct list_head *item,
+					 struct list_lru_one *list,
+					 void *cb_arg)
+{
+	struct page **out_page = cb_arg;
+	struct page *p = container_of(item, struct page, lru);
+	list_lru_isolate(list, item);
+
+	*out_page = p;
+	return LRU_REMOVED;
+}
+
 /* Take pages from a specific pool_type, return NULL when nothing available */
-static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
+static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid)
 {
-	struct page *p;
+	int ret;
+	struct page *p = NULL;
+	unsigned long nr_to_walk = 1;
 
-	spin_lock(&pt->lock);
-	p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
-	if (p) {
+	ret = list_lru_walk_node(&pt->pages, nid, take_one_from_lru, (void *)&p, &nr_to_walk);
+	if (ret == 1 && p) {
 		atomic_long_sub(1 << pt->order, &allocated_pages);
 		mod_lruvec_page_state(p, NR_GPU_ACTIVE, (1 << pt->order));
 		mod_lruvec_page_state(p, NR_GPU_RECLAIM, -(1 << pt->order));
-		list_del(&p->lru);
 	}
-	spin_unlock(&pt->lock);
-
 	return p;
 }
 
@@ -322,25 +342,47 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
 	pt->pool = pool;
 	pt->caching = caching;
 	pt->order = order;
-	spin_lock_init(&pt->lock);
-	INIT_LIST_HEAD(&pt->pages);
+	list_lru_init(&pt->pages);
 
 	spin_lock(&shrinker_lock);
 	list_add_tail(&pt->shrinker_list, &shrinker_list);
 	spin_unlock(&shrinker_lock);
 }
 
+static enum lru_status pool_move_to_dispose_list(struct list_head *item,
+						 struct list_lru_one *list,
+						 void *cb_arg)
+{
+	struct list_head *dispose = cb_arg;
+
+	list_lru_isolate_move(list, item, dispose);
+
+	return LRU_REMOVED;
+}
+
+static void ttm_pool_dispose_list(struct ttm_pool_type *pt,
+				  struct list_head *dispose)
+{
+	while (!list_empty(dispose)) {
+		struct page *p;
+		p = list_first_entry(dispose, struct page, lru);
+		list_del_init(&p->lru);
+		atomic_long_sub(1 << pt->order, &allocated_pages);
+		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
+	}
+}
+
 /* Remove a pool_type from the global shrinker list and free all pages */
 static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
-	struct page *p;
+	LIST_HEAD(dispose);
 
 	spin_lock(&shrinker_lock);
 	list_del(&pt->shrinker_list);
 	spin_unlock(&shrinker_lock);
 
-	while ((p = ttm_pool_type_take(pt)))
-		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
+	list_lru_walk(&pt->pages, pool_move_to_dispose_list, &dispose, LONG_MAX);
+	ttm_pool_dispose_list(pt, &dispose);
 }
 
 /* Return the pool_type to use for the given caching and order */
@@ -390,7 +432,7 @@ static unsigned int ttm_pool_shrink(void)
 	list_move_tail(&pt->shrinker_list, &shrinker_list);
 	spin_unlock(&shrinker_lock);
 
-	p = ttm_pool_type_take(pt);
+	p = ttm_pool_type_take(pt, ttm_pool_nid(pt->pool));
 	if (p) {
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
 		num_pages = 1 << pt->order;
@@ -744,7 +786,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		p = NULL;
 		pt = ttm_pool_select_type(pool, page_caching, order);
 		if (pt && allow_pools)
-			p = ttm_pool_type_take(pt);
+			p = ttm_pool_type_take(pt, ttm_pool_nid(pool));
 		/*
 		 * If that fails or previously failed, allocate from system.
 		 * Note that this also disallows additional pool allocations using
@@ -1173,16 +1215,7 @@ static unsigned long ttm_pool_shrinker_count(struct shrinker *shrink,
 /* Count the number of pages available in a pool_type */
 static unsigned int ttm_pool_type_count(struct ttm_pool_type *pt)
 {
-	unsigned int count = 0;
-	struct page *p;
-
-	spin_lock(&pt->lock);
-	/* Only used for debugfs, the overhead doesn't matter */
-	list_for_each_entry(p, &pt->pages, lru)
-		++count;
-	spin_unlock(&pt->lock);
-
-	return count;
+	return list_lru_count(&pt->pages);
 }
 
 /* Print a nice header for the order */
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 54cd34a6e4c0..82124cb5c9e4 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -29,6 +29,7 @@
 #include <linux/mmzone.h>
 #include <linux/llist.h>
 #include <linux/spinlock.h>
+#include <linux/list_lru.h>
 #include <drm/ttm/ttm_caching.h>
 
 struct device;
@@ -45,8 +46,7 @@ struct ttm_tt;
  * @order: the allocation order our pages have
  * @caching: the caching type our pages have
  * @shrinker_list: our place on the global shrinker list
- * @lock: protection of the page list
- * @pages: the list of pages in the pool
+ * @pages: the lru_list of pages in the pool
  */
 struct ttm_pool_type {
 	struct ttm_pool *pool;
@@ -55,8 +55,7 @@ struct ttm_pool_type {
 
 	struct list_head shrinker_list;
 
-	spinlock_t lock;
-	struct list_head pages;
+	struct list_lru pages;
 };
 
 /**
diff --git a/mm/list_lru.c b/mm/list_lru.c
index ec48b5dadf51..627589d75320 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -179,6 +179,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 	unlock_list_lru(l, false);
 	return false;
 }
+EXPORT_SYMBOL_GPL(list_lru_add);
 
 bool list_lru_add_obj(struct list_lru *lru, struct list_head *item)
 {
-- 
2.51.0



* [PATCH 04/16] ttm/pool: drop numa specific pools
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (2 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 03/16] ttm/pool: port to list_lru. (v2) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 05/16] ttm/pool: make pool shrinker NUMA aware Dave Airlie
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

The list_lru will now handle numa for us, so there is no need to keep
separate pool types for it. Just consolidate into the global ones.

This adds a debugfs change to avoid dumping non-existent orders due
to this change.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index e236f78c5d9d..02c19395080c 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -396,17 +396,11 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 #ifdef CONFIG_X86
 	switch (caching) {
 	case ttm_write_combined:
-		if (pool->nid != NUMA_NO_NODE)
-			return &pool->caching[caching].orders[order];
-
 		if (pool->use_dma32)
 			return &global_dma32_write_combined[order];
 
 		return &global_write_combined[order];
 	case ttm_uncached:
-		if (pool->nid != NUMA_NO_NODE)
-			return &pool->caching[caching].orders[order];
-
 		if (pool->use_dma32)
 			return &global_dma32_uncached[order];
 
@@ -1281,7 +1275,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 {
 	unsigned int i;
 
-	if (!pool->use_dma_alloc && pool->nid == NUMA_NO_NODE) {
+	if (!pool->use_dma_alloc) {
 		seq_puts(m, "unused\n");
 		return 0;
 	}
@@ -1292,10 +1286,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
 		if (!ttm_pool_select_type(pool, i, 0))
 			continue;
-		if (pool->use_dma_alloc)
-			seq_puts(m, "DMA ");
-		else
-			seq_printf(m, "N%d ", pool->nid);
+		seq_puts(m, "DMA ");
 		switch (i) {
 		case ttm_cached:
 			seq_puts(m, "\t:");
-- 
2.51.0



* [PATCH 05/16] ttm/pool: make pool shrinker NUMA aware
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (3 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 04/16] ttm/pool: drop numa specific pools Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 06/16] ttm/pool: track allocated_pages per numa node Dave Airlie
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This enables NUMA awareness for the shrinker on the
ttm pools.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 38 +++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 02c19395080c..ae54f01f240b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -413,12 +413,12 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 	return NULL;
 }
 
-/* Free pages using the global shrinker list */
-static unsigned int ttm_pool_shrink(void)
+/* Free pages using the per-node shrinker list */
+static unsigned int ttm_pool_shrink(int nid, unsigned long num_to_free)
 {
+	LIST_HEAD(dispose);
 	struct ttm_pool_type *pt;
 	unsigned int num_pages;
-	struct page *p;
 
 	down_read(&pool_shrink_rwsem);
 	spin_lock(&shrinker_lock);
@@ -426,13 +426,10 @@ static unsigned int ttm_pool_shrink(void)
 	list_move_tail(&pt->shrinker_list, &shrinker_list);
 	spin_unlock(&shrinker_lock);
 
-	p = ttm_pool_type_take(pt, ttm_pool_nid(pt->pool));
-	if (p) {
-		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
-		num_pages = 1 << pt->order;
-	} else {
-		num_pages = 0;
-	}
+	num_pages = list_lru_walk_node(&pt->pages, nid, pool_move_to_dispose_list, &dispose, &num_to_free);
+	num_pages *= 1 << pt->order;
+
+	ttm_pool_dispose_list(pt, &dispose);
 	up_read(&pool_shrink_rwsem);
 
 	return num_pages;
@@ -781,6 +778,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		pt = ttm_pool_select_type(pool, page_caching, order);
 		if (pt && allow_pools)
 			p = ttm_pool_type_take(pt, ttm_pool_nid(pool));
+
 		/*
 		 * If that fails or previously failed, allocate from system.
 		 * Note that this also disallows additional pool allocations using
@@ -929,8 +927,10 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 {
 	ttm_pool_free_range(pool, tt, tt->caching, 0, tt->num_pages);
 
-	while (atomic_long_read(&allocated_pages) > page_pool_size)
-		ttm_pool_shrink();
+	while (atomic_long_read(&allocated_pages) > page_pool_size) {
+		unsigned long diff = page_pool_size - atomic_long_read(&allocated_pages);
+		ttm_pool_shrink(ttm_pool_nid(pool), diff);
+	}
 }
 EXPORT_SYMBOL(ttm_pool_free);
 
@@ -1187,7 +1187,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
 	unsigned long num_freed = 0;
 
 	do
-		num_freed += ttm_pool_shrink();
+		num_freed += ttm_pool_shrink(sc->nid, sc->nr_to_scan);
 	while (num_freed < sc->nr_to_scan &&
 	       atomic_long_read(&allocated_pages));
 
@@ -1315,11 +1315,15 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
 		.nr_to_scan = TTM_SHRINKER_BATCH,
 	};
 	unsigned long count;
+	int nid;
 
 	fs_reclaim_acquire(GFP_KERNEL);
-	count = ttm_pool_shrinker_count(mm_shrinker, &sc);
-	seq_printf(m, "%lu/%lu\n", count,
-		   ttm_pool_shrinker_scan(mm_shrinker, &sc));
+	for_each_node(nid) {
+		sc.nid = nid;
+		count = ttm_pool_shrinker_count(mm_shrinker, &sc);
+		seq_printf(m, "%d: %lu/%lu\n", nid, count,
+			   ttm_pool_shrinker_scan(mm_shrinker, &sc));
+	}
 	fs_reclaim_release(GFP_KERNEL);
 
 	return 0;
@@ -1367,7 +1371,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 #endif
 #endif
 
-	mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+	mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");
 	if (!mm_shrinker)
 		return -ENOMEM;
 
-- 
2.51.0



* [PATCH 06/16] ttm/pool: track allocated_pages per numa node.
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (4 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 05/16] ttm/pool: make pool shrinker NUMA aware Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 07/16] memcg: add support for GPU page counters. (v3) Dave Airlie
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This gets the memory sizes from the nodes and stores the limit
as 50% of those (e.g. a node with 64GB of managed memory gets a
32GB pool limit). I think eventually we should drop the limits
once we have memcg aware shrinking, but this should be more NUMA
friendly, and I think it is what people would prefer to happen
on NUMA aware systems.

Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 60 +++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index ae54f01f240b..a6b055256150 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -115,10 +115,11 @@ struct ttm_pool_tt_restore {
 
 static unsigned long page_pool_size;
 
-MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
+MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool per NUMA node");
 module_param(page_pool_size, ulong, 0644);
 
-static atomic_long_t allocated_pages;
+static unsigned long pool_node_limit[MAX_NUMNODES];
+static atomic_long_t allocated_pages[MAX_NUMNODES];
 
 static struct ttm_pool_type global_write_combined[NR_PAGE_ORDERS];
 static struct ttm_pool_type global_uncached[NR_PAGE_ORDERS];
@@ -289,6 +290,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
 static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 {
 	unsigned int i, num_pages = 1 << pt->order;
+	int nid = page_to_nid(p);
 
 	for (i = 0; i < num_pages; ++i) {
 		if (PageHighMem(p))
@@ -299,10 +301,10 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 
 	INIT_LIST_HEAD(&p->lru);
 	rcu_read_lock();
-	list_lru_add(&pt->pages, &p->lru, page_to_nid(p), NULL);
+	list_lru_add(&pt->pages, &p->lru, nid, NULL);
 	rcu_read_unlock();
-	atomic_long_add(1 << pt->order, &allocated_pages);
 
+	atomic_long_add(num_pages, &allocated_pages[nid]);	
 	mod_lruvec_page_state(p, NR_GPU_ACTIVE, -num_pages);
 	mod_lruvec_page_state(p, NR_GPU_RECLAIM, num_pages);
 }
@@ -328,7 +330,7 @@ static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid)
 
 	ret = list_lru_walk_node(&pt->pages, nid, take_one_from_lru, (void *)&p, &nr_to_walk);
 	if (ret == 1 && p) {
-		atomic_long_sub(1 << pt->order, &allocated_pages);
+		atomic_long_sub(1 << pt->order, &allocated_pages[nid]);
 		mod_lruvec_page_state(p, NR_GPU_ACTIVE, (1 << pt->order));
 		mod_lruvec_page_state(p, NR_GPU_RECLAIM, -(1 << pt->order));
 	}
@@ -367,7 +369,7 @@ static void ttm_pool_dispose_list(struct ttm_pool_type *pt,
 		struct page *p;
 		p = list_first_entry(dispose, struct page, lru);
 		list_del_init(&p->lru);
-		atomic_long_sub(1 << pt->order, &allocated_pages);
+		atomic_long_sub(1 << pt->order, &allocated_pages[page_to_nid(p)]);
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
 	}
 }
@@ -925,11 +927,13 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
  */
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 {
+	int nid = ttm_pool_nid(pool);
+
 	ttm_pool_free_range(pool, tt, tt->caching, 0, tt->num_pages);
 
-	while (atomic_long_read(&allocated_pages) > page_pool_size) {
-		unsigned long diff = page_pool_size - atomic_long_read(&allocated_pages);
-		ttm_pool_shrink(ttm_pool_nid(pool), diff);
+	while (atomic_long_read(&allocated_pages[nid]) > pool_node_limit[nid]) {
+		unsigned long diff = pool_node_limit[nid] - atomic_long_read(&allocated_pages[nid]);
+		ttm_pool_shrink(nid, diff);
 	}
 }
 EXPORT_SYMBOL(ttm_pool_free);
@@ -1189,7 +1193,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
 	do
 		num_freed += ttm_pool_shrink(sc->nid, sc->nr_to_scan);
 	while (num_freed < sc->nr_to_scan &&
-	       atomic_long_read(&allocated_pages));
+	       atomic_long_read(&allocated_pages[sc->nid]));
 
 	sc->nr_scanned = num_freed;
 
@@ -1200,7 +1204,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
 static unsigned long ttm_pool_shrinker_count(struct shrinker *shrink,
 					     struct shrink_control *sc)
 {
-	unsigned long num_pages = atomic_long_read(&allocated_pages);
+	unsigned long num_pages = atomic_long_read(&allocated_pages[sc->nid]);
 
 	return num_pages ? num_pages : SHRINK_EMPTY;
 }
@@ -1237,8 +1241,12 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 /* Dump the total amount of allocated pages */
 static void ttm_pool_debugfs_footer(struct seq_file *m)
 {
-	seq_printf(m, "\ntotal\t: %8lu of %8lu\n",
-		   atomic_long_read(&allocated_pages), page_pool_size);
+	int nid;
+
+	for_each_node(nid) {
+		seq_printf(m, "\ntotal node%d\t: %8lu of %8lu\n", nid,
+			   atomic_long_read(&allocated_pages[nid]), pool_node_limit[nid]);
+	}
 }
 
 /* Dump the information for the global pools */
@@ -1332,6 +1340,22 @@ DEFINE_SHOW_ATTRIBUTE(ttm_pool_debugfs_shrink);
 
 #endif
 
+static inline uint64_t ttm_get_node_memory_size(int nid)
+{
+	/* This is directly using si_meminfo_node implementation as the
+	 * function is not exported.
+	 */
+	int zone_type;
+	uint64_t managed_pages = 0;
+
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
+		managed_pages +=
+			zone_managed_pages(&pgdat->node_zones[zone_type]);
+	return managed_pages * PAGE_SIZE;
+}
+
 /**
  * ttm_pool_mgr_init - Initialize globals
  *
@@ -1343,8 +1367,14 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 {
 	unsigned int i;
 
-	if (!page_pool_size)
-		page_pool_size = num_pages;
+	int nid;
+	for_each_node(nid) {
+		if (!page_pool_size) {
+			uint64_t node_size = ttm_get_node_memory_size(nid);
+			pool_node_limit[nid] = (node_size >> PAGE_SHIFT) / 2;
+		} else
+			pool_node_limit[nid] = page_pool_size;
+	}
 
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
-- 
2.51.0



* [PATCH 07/16] memcg: add support for GPU page counters. (v3)
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (5 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 06/16] ttm/pool: track allocated_pages per numa node Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 08/16] ttm: add a memcg accounting flag to the alloc/populate APIs Dave Airlie
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This introduces 2 new statistics and 3 new memcontrol APIs for dealing
with GPU system memory allocations.

The stats correspond to the same stats in the global vmstat:
the number of active GPU pages, and the number of pages in pools that
can be reclaimed.

The first API charges an order of pages to an objcg, sets
the objcg on the pages like kmem does, and updates the active/reclaim
statistic.

The second API uncharges a page from the obj cgroup it is currently charged
to.

The third API allows moving a page to/from reclaim and between obj cgroups.
When pages are added to the pool lru, this just updates accounting.
When pages are being removed from a pool lru, they can be taken from
the parent objcg, so this allows them to be uncharged from there and
transferred to a new child objcg.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
v2: use memcg_node_stat_items
v3: fix null ptr dereference in uncharge
---
 Documentation/admin-guide/cgroup-v2.rst |   6 ++
 include/linux/memcontrol.h              |  11 +++
 mm/memcontrol.c                         | 107 ++++++++++++++++++++++++
 3 files changed, 124 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 0e6c67ac585a..9aa9b28562b8 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1574,6 +1574,12 @@ The following nested keys are defined.
 	  vmalloc (npn)
 		Amount of memory used for vmap backed memory.
 
+	  gpu_active (npn)
+		Amount of system memory used for GPU devices.
+
+	  gpu_reclaim (npn)
+		Amount of system memory cached for GPU devices.
+
 	  shmem
 		Amount of cached filesystem data that is swap-backed,
 		such as tmpfs, shm segments, shared anonymous mmap()s
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 873e510d6f8d..62c46c33f84f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1624,6 +1624,17 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
+bool mem_cgroup_charge_gpu_page(struct obj_cgroup *objcg, struct page *page,
+			   unsigned int nr_pages,
+			   gfp_t gfp_mask, bool reclaim);
+void mem_cgroup_uncharge_gpu_page(struct page *page,
+				  unsigned int nr_pages,
+				  bool reclaim);
+bool mem_cgroup_move_gpu_page_reclaim(struct obj_cgroup *objcg,
+				      struct page *page,
+				      unsigned int order,
+				      bool to_reclaim);
+
 #ifdef CONFIG_MEMCG
 extern struct static_key_false memcg_sockets_enabled_key;
 #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4deda33625f4..ece340f3e391 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -330,6 +330,8 @@ static const unsigned int memcg_node_stat_items[] = {
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
+	NR_GPU_ACTIVE,
+	NR_GPU_RECLAIM,
 };
 
 static const unsigned int memcg_stat_items[] = {
@@ -1341,6 +1343,8 @@ static const struct memory_stat memory_stats[] = {
 	{ "percpu",			MEMCG_PERCPU_B			},
 	{ "sock",			MEMCG_SOCK			},
 	{ "vmalloc",			MEMCG_VMALLOC			},
+	{ "gpu_active",			NR_GPU_ACTIVE			},
+	{ "gpu_reclaim",		NR_GPU_RECLAIM	                },
 	{ "shmem",			NR_SHMEM			},
 #ifdef CONFIG_ZSWAP
 	{ "zswap",			MEMCG_ZSWAP_B			},
@@ -5085,6 +5089,109 @@ void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages)
 	refill_stock(memcg, nr_pages);
 }
 
+/**
+ * mem_cgroup_charge_gpu_page - charge a page to GPU memory tracking
+ * @objcg: objcg to charge, NULL charges root memcg
+ * @page: page to charge
+ * @order: page allocation order
+ * @gfp_mask: gfp mode
+ * @reclaim: charge the reclaim counter instead of the active one.
+ *
+ * Charge the order sized @page to the objcg. Returns %true if the charge fit within
+ * @objcg's configured limit, %false if it doesn't.
+ */
+bool mem_cgroup_charge_gpu_page(struct obj_cgroup *objcg, struct page *page,
+				unsigned int order, gfp_t gfp_mask, bool reclaim)
+{
+	unsigned int nr_pages = 1 << order;
+	struct mem_cgroup *memcg = NULL;
+	struct lruvec *lruvec;
+	int ret;
+
+	if (objcg) {
+		memcg = get_mem_cgroup_from_objcg(objcg);
+
+		ret = try_charge_memcg(memcg, gfp_mask, nr_pages);
+		if (ret) {
+			mem_cgroup_put(memcg);
+			return false;
+		}
+
+		obj_cgroup_get(objcg);
+		page_set_objcg(page, objcg);
+	}
+
+	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
+	mod_lruvec_state(lruvec, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE, nr_pages);
+
+	mem_cgroup_put(memcg);
+	return true;
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_charge_gpu_page);
+
+/**
+ * mem_cgroup_uncharge_gpu_page - uncharge a page from GPU memory tracking
+ * @page: page to uncharge
+ * @order: order of the page allocation
+ * @reclaim: uncharge the reclaim counter instead of the active.
+ */
+void mem_cgroup_uncharge_gpu_page(struct page *page,
+				  unsigned int order, bool reclaim)
+{
+	struct obj_cgroup *objcg = page_objcg(page);
+	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
+	int nr_pages = 1 << order;
+
+	memcg = objcg ? get_mem_cgroup_from_objcg(objcg) : NULL;
+
+	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
+	mod_lruvec_state(lruvec, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE, -nr_pages);
+
+	if (memcg && !mem_cgroup_is_root(memcg))
+		refill_stock(memcg, nr_pages);
+	page->memcg_data = 0;
+	obj_cgroup_put(objcg);
+	mem_cgroup_put(memcg);
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_uncharge_gpu_page);
+
+/**
+ * mem_cgroup_move_gpu_page_reclaim - move a page between gpu active and gpu reclaim
+ * @new_objcg: objcg to move the page to, NULL if just a stats update.
+ * @order: allocation order of @page
+ * @to_reclaim: true moves pages into reclaim, false moves them back
+ */
+bool mem_cgroup_move_gpu_page_reclaim(struct obj_cgroup *new_objcg,
+				      struct page *page,
+				      unsigned int order,
+				      bool to_reclaim)
+{
+	struct obj_cgroup *objcg = page_objcg(page);
+
+	if (!objcg)
+		return false;
+
+	if (!new_objcg || objcg == new_objcg) {
+		struct mem_cgroup *memcg = get_mem_cgroup_from_objcg(objcg);
+		struct lruvec *lruvec;
+		unsigned long flags;
+		int nr_pages = 1 << order;
+
+		lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
+		local_irq_save(flags);
+		__mod_lruvec_state(lruvec, to_reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE, nr_pages);
+		__mod_lruvec_state(lruvec, to_reclaim ? NR_GPU_ACTIVE : NR_GPU_RECLAIM, -nr_pages);
+		local_irq_restore(flags);
+		mem_cgroup_put(memcg);
+		return true;
+	} else {
+		mem_cgroup_uncharge_gpu_page(page, order, true);
+		return mem_cgroup_charge_gpu_page(new_objcg, page, order, 0, false);
+	}
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_move_gpu_page_reclaim);
+
 static int __init cgroup_memory(char *s)
 {
 	char *token;
-- 
2.51.0



* [PATCH 08/16] ttm: add a memcg accounting flag to the alloc/populate APIs
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (6 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 07/16] memcg: add support for GPU page counters. (v3) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 09/16] ttm/pool: initialise the shrinker earlier Dave Airlie
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This flag does nothing yet; this just changes the APIs to accept
it across all users.

This flag will eventually be used to decide when to account a tt
populate to a memcg.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  3 ++-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c          |  5 +++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c     |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c       |  4 ++--
 drivers/gpu/drm/loongson/lsdc_ttm.c              |  3 ++-
 drivers/gpu/drm/nouveau/nouveau_bo.c             |  6 ++++--
 drivers/gpu/drm/radeon/radeon_ttm.c              |  3 ++-
 drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c |  2 +-
 drivers/gpu/drm/ttm/tests/ttm_pool_test.c        | 16 ++++++++--------
 drivers/gpu/drm/ttm/tests/ttm_tt_test.c          | 12 ++++++------
 drivers/gpu/drm/ttm/ttm_bo.c                     |  7 ++++---
 drivers/gpu/drm/ttm/ttm_bo_util.c                |  6 +++---
 drivers/gpu/drm/ttm/ttm_bo_vm.c                  |  4 +++-
 drivers/gpu/drm/ttm/ttm_pool.c                   |  6 ++++--
 drivers/gpu/drm/ttm/ttm_tt.c                     |  8 +++++---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c             |  4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c       |  7 ++++---
 drivers/gpu/drm/xe/xe_bo.c                       |  5 +++--
 include/drm/ttm/ttm_bo.h                         |  1 +
 include/drm/ttm/ttm_device.h                     |  1 +
 include/drm/ttm/ttm_pool.h                       |  1 +
 include/drm/ttm/ttm_tt.h                         |  1 +
 22 files changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index aa9ee5dffa45..bcab4a83137b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1152,6 +1152,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct ttm_buffer_object *bo,
  */
 static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
 				  struct ttm_tt *ttm,
+				  bool memcg_account,
 				  struct ttm_operation_ctx *ctx)
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bdev);
@@ -1175,7 +1176,7 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
 		pool = &adev->mman.ttm_pools[gtt->pool_id];
 	else
 		pool = &adev->mman.bdev.pool;
-	ret = ttm_pool_alloc(pool, ttm, ctx);
+	ret = ttm_pool_alloc(pool, ttm, memcg_account, ctx);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 1f4814968868..6cdaf3696583 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -314,6 +314,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 
 static int i915_ttm_tt_populate(struct ttm_device *bdev,
 				struct ttm_tt *ttm,
+				bool memcg_account,
 				struct ttm_operation_ctx *ctx)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
@@ -321,7 +322,7 @@ static int i915_ttm_tt_populate(struct ttm_device *bdev,
 	if (i915_tt->is_shmem)
 		return i915_ttm_tt_shmem_populate(bdev, ttm, ctx);
 
-	return ttm_pool_alloc(&bdev->pool, ttm, ctx);
+	return ttm_pool_alloc(&bdev->pool, ttm, memcg_account, ctx);
 }
 
 static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
@@ -808,7 +809,7 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
 	}
 
 	if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
-		ret = ttm_bo_populate(bo, &ctx);
+		ret = ttm_bo_populate(bo, false, &ctx);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 2f6b33edb9c9..4ab1eb3e42bc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -624,7 +624,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 
 	/* Populate ttm with pages if needed. Typically system memory. */
 	if (ttm && (dst_man->use_tt || (ttm->page_flags & TTM_TT_FLAG_SWAPPED))) {
-		ret = ttm_bo_populate(bo, ctx);
+		ret = ttm_bo_populate(bo, false, ctx);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 61596cecce4d..0b555979d786 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -90,7 +90,7 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
 		goto out_no_lock;
 
 	backup_bo = i915_gem_to_ttm(backup);
-	err = ttm_bo_populate(backup_bo, &ctx);
+	err = ttm_bo_populate(backup_bo, false, &ctx);
 	if (err)
 		goto out_no_populate;
 
@@ -189,7 +189,7 @@ static int i915_ttm_restore(struct i915_gem_apply_to_region *apply,
 	if (!backup_bo->resource)
 		err = ttm_bo_validate(backup_bo, i915_ttm_sys_placement(), &ctx);
 	if (!err)
-		err = ttm_bo_populate(backup_bo, &ctx);
+		err = ttm_bo_populate(backup_bo, false, &ctx);
 	if (!err) {
 		err = i915_gem_obj_copy_ttm(obj, backup, pm_apply->allow_gpu,
 					    false);
diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c
index 2e42c6970c9f..6d8781506802 100644
--- a/drivers/gpu/drm/loongson/lsdc_ttm.c
+++ b/drivers/gpu/drm/loongson/lsdc_ttm.c
@@ -110,6 +110,7 @@ lsdc_ttm_tt_create(struct ttm_buffer_object *tbo, uint32_t page_flags)
 
 static int lsdc_ttm_tt_populate(struct ttm_device *bdev,
 				struct ttm_tt *ttm,
+				bool memcg_account,
 				struct ttm_operation_ctx *ctx)
 {
 	bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
@@ -122,7 +123,7 @@ static int lsdc_ttm_tt_populate(struct ttm_device *bdev,
 		return 0;
 	}
 
-	return ttm_pool_alloc(&bdev->pool, ttm, ctx);
+	return ttm_pool_alloc(&bdev->pool, ttm, memcg_account, ctx);
 }
 
 static void lsdc_ttm_tt_unpopulate(struct ttm_device *bdev,
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f26562eafffc..7427dd049b39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1417,7 +1417,9 @@ vm_fault_t nouveau_ttm_fault_reserve_notify(struct ttm_buffer_object *bo)
 
 static int
 nouveau_ttm_tt_populate(struct ttm_device *bdev,
-			struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
+			struct ttm_tt *ttm,
+			bool memcg_account,
+			struct ttm_operation_ctx *ctx)
 {
 	struct ttm_tt *ttm_dma = (void *)ttm;
 	struct nouveau_drm *drm;
@@ -1434,7 +1436,7 @@ nouveau_ttm_tt_populate(struct ttm_device *bdev,
 
 	drm = nouveau_bdev(bdev);
 
-	return ttm_pool_alloc(&drm->ttm.bdev.pool, ttm, ctx);
+	return ttm_pool_alloc(&drm->ttm.bdev.pool, ttm, memcg_account, ctx);
 }
 
 static void
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 616d25c8c2de..8c4273239d16 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -526,6 +526,7 @@ static struct radeon_ttm_tt *radeon_ttm_tt_to_gtt(struct radeon_device *rdev,
 
 static int radeon_ttm_tt_populate(struct ttm_device *bdev,
 				  struct ttm_tt *ttm,
+				  bool memcg_account,
 				  struct ttm_operation_ctx *ctx)
 {
 	struct radeon_device *rdev = radeon_get_rdev(bdev);
@@ -547,7 +548,7 @@ static int radeon_ttm_tt_populate(struct ttm_device *bdev,
 		return 0;
 	}
 
-	return ttm_pool_alloc(&rdev->mman.bdev.pool, ttm, ctx);
+	return ttm_pool_alloc(&rdev->mman.bdev.pool, ttm, memcg_account, ctx);
 }
 
 static void radeon_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 1bcc67977f48..9869586ee57e 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -538,7 +538,7 @@ static void ttm_bo_validate_no_placement_signaled(struct kunit *test)
 
 	if (params->with_ttm) {
 		old_tt = priv->ttm_dev->funcs->ttm_tt_create(bo, 0);
-		ttm_pool_alloc(&priv->ttm_dev->pool, old_tt, &ctx);
+		ttm_pool_alloc(&priv->ttm_dev->pool, old_tt, false, &ctx);
 		bo->ttm = old_tt;
 	}
 
diff --git a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
index 39234a3e98c4..aaf152c2383d 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
@@ -88,7 +88,7 @@ static struct ttm_pool *ttm_pool_pre_populated(struct kunit *test,
 
 	ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, true, false);
 
-	err = ttm_pool_alloc(pool, tt, &simple_ctx);
+	err = ttm_pool_alloc(pool, tt, false, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	ttm_pool_free(pool, tt);
@@ -157,7 +157,7 @@ static void ttm_pool_alloc_basic(struct kunit *test)
 	KUNIT_ASSERT_EQ(test, pool->nid, NUMA_NO_NODE);
 	KUNIT_ASSERT_EQ(test, pool->use_dma_alloc, params->use_dma_alloc);
 
-	err = ttm_pool_alloc(pool, tt, &simple_ctx);
+	err = ttm_pool_alloc(pool, tt, false, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	KUNIT_ASSERT_EQ(test, tt->num_pages, expected_num_pages);
 
@@ -220,7 +220,7 @@ static void ttm_pool_alloc_basic_dma_addr(struct kunit *test)
 
 	ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, true, false);
 
-	err = ttm_pool_alloc(pool, tt, &simple_ctx);
+	err = ttm_pool_alloc(pool, tt, false, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	KUNIT_ASSERT_EQ(test, tt->num_pages, expected_num_pages);
 
@@ -253,7 +253,7 @@ static void ttm_pool_alloc_order_caching_match(struct kunit *test)
 	tt = ttm_tt_kunit_init(test, 0, caching, size);
 	KUNIT_ASSERT_NOT_NULL(test, tt);
 
-	err = ttm_pool_alloc(pool, tt, &simple_ctx);
+	err = ttm_pool_alloc(pool, tt, false, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
@@ -285,7 +285,7 @@ static void ttm_pool_alloc_caching_mismatch(struct kunit *test)
 	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
 	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt_tt->pages));
 
-	err = ttm_pool_alloc(pool, tt, &simple_ctx);
+	err = ttm_pool_alloc(pool, tt, false, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	ttm_pool_free(pool, tt);
@@ -319,7 +319,7 @@ static void ttm_pool_alloc_order_mismatch(struct kunit *test)
 	KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
 	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt_tt->pages));
 
-	err = ttm_pool_alloc(pool, tt, &simple_ctx);
+	err = ttm_pool_alloc(pool, tt, false, &simple_ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	ttm_pool_free(pool, tt);
@@ -349,7 +349,7 @@ static void ttm_pool_free_dma_alloc(struct kunit *test)
 	KUNIT_ASSERT_NOT_NULL(test, pool);
 
 	ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, true, false);
-	ttm_pool_alloc(pool, tt, &simple_ctx);
+	ttm_pool_alloc(pool, tt, false, &simple_ctx);
 
 	pt = &pool->caching[caching].orders[order];
 	KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
@@ -380,7 +380,7 @@ static void ttm_pool_free_no_dma_alloc(struct kunit *test)
 	KUNIT_ASSERT_NOT_NULL(test, pool);
 
 	ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, false, false);
-	ttm_pool_alloc(pool, tt, &simple_ctx);
+	ttm_pool_alloc(pool, tt, false, &simple_ctx);
 
 	pt = &pool->caching[caching].orders[order];
 	KUNIT_ASSERT_TRUE(test, list_lru_count(&pt->pages) == 1);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_tt_test.c b/drivers/gpu/drm/ttm/tests/ttm_tt_test.c
index 61ec6f580b62..333c503e218b 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_tt_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_tt_test.c
@@ -262,7 +262,7 @@ static void ttm_tt_populate_null_ttm(struct kunit *test)
 	struct ttm_operation_ctx ctx = { };
 	int err;
 
-	err = ttm_tt_populate(devs->ttm_dev, NULL, &ctx);
+	err = ttm_tt_populate(devs->ttm_dev, NULL, false, &ctx);
 	KUNIT_ASSERT_EQ(test, err, -EINVAL);
 }
 
@@ -283,11 +283,11 @@ static void ttm_tt_populate_populated_ttm(struct kunit *test)
 	err = ttm_tt_init(tt, bo, 0, ttm_cached, 0);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
-	err = ttm_tt_populate(devs->ttm_dev, tt, &ctx);
+	err = ttm_tt_populate(devs->ttm_dev, tt, false, &ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	populated_page = *tt->pages;
 
-	err = ttm_tt_populate(devs->ttm_dev, tt, &ctx);
+	err = ttm_tt_populate(devs->ttm_dev, tt, false, &ctx);
 	KUNIT_ASSERT_PTR_EQ(test, populated_page, *tt->pages);
 }
 
@@ -307,7 +307,7 @@ static void ttm_tt_unpopulate_basic(struct kunit *test)
 	err = ttm_tt_init(tt, bo, 0, ttm_cached, 0);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
-	err = ttm_tt_populate(devs->ttm_dev, tt, &ctx);
+	err = ttm_tt_populate(devs->ttm_dev, tt, false, &ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	KUNIT_ASSERT_TRUE(test, ttm_tt_is_populated(tt));
 
@@ -351,7 +351,7 @@ static void ttm_tt_swapin_basic(struct kunit *test)
 	err = ttm_tt_init(tt, bo, 0, ttm_cached, 0);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
-	err = ttm_tt_populate(devs->ttm_dev, tt, &ctx);
+	err = ttm_tt_populate(devs->ttm_dev, tt, false, &ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	KUNIT_ASSERT_TRUE(test, ttm_tt_is_populated(tt));
 
@@ -361,7 +361,7 @@ static void ttm_tt_swapin_basic(struct kunit *test)
 	KUNIT_ASSERT_TRUE(test, tt->page_flags & TTM_TT_FLAG_SWAPPED);
 
 	/* Swapout depopulates TT, allocate pages and then swap them in */
-	err = ttm_pool_alloc(&devs->ttm_dev->pool, tt, &ctx);
+	err = ttm_pool_alloc(&devs->ttm_dev->pool, tt, false, &ctx);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	err = ttm_tt_swapin(tt);
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 29423ceeec5c..5d84af5e0d74 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -142,7 +142,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 			goto out_err;
 
 		if (mem->mem_type != TTM_PL_SYSTEM) {
-			ret = ttm_bo_populate(bo, ctx);
+			ret = ttm_bo_populate(bo, false, ctx);
 			if (ret)
 				goto out_err;
 		}
@@ -1256,6 +1256,7 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
  * is set to true.
  */
 int ttm_bo_populate(struct ttm_buffer_object *bo,
+		    bool memcg_account,
 		    struct ttm_operation_ctx *ctx)
 {
 	struct ttm_tt *tt = bo->ttm;
@@ -1268,7 +1269,7 @@ int ttm_bo_populate(struct ttm_buffer_object *bo,
 		return 0;
 
 	swapped = ttm_tt_is_swapped(tt);
-	ret = ttm_tt_populate(bo->bdev, tt, ctx);
+	ret = ttm_tt_populate(bo->bdev, tt, memcg_account, ctx);
 	if (ret)
 		return ret;
 
@@ -1293,7 +1294,7 @@ int ttm_bo_setup_export(struct ttm_buffer_object *bo,
 	if (ret != 0)
 		return ret;
 
-	ret = ttm_bo_populate(bo, ctx);
+	ret = ttm_bo_populate(bo, false, ctx);
 	ttm_bo_unreserve(bo);
 	return ret;
 }
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index acbbca9d5c92..13a9e9bba968 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -167,7 +167,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
 	src_man = ttm_manager_type(bdev, src_mem->mem_type);
 	if (ttm && ((ttm->page_flags & TTM_TT_FLAG_SWAPPED) ||
 		    dst_man->use_tt)) {
-		ret = ttm_bo_populate(bo, ctx);
+		ret = ttm_bo_populate(bo, false, ctx);
 		if (ret)
 			return ret;
 	}
@@ -355,7 +355,7 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 
 	BUG_ON(!ttm);
 
-	ret = ttm_bo_populate(bo, &ctx);
+	ret = ttm_bo_populate(bo, false, &ctx);
 	if (ret)
 		return ret;
 
@@ -538,7 +538,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map)
 		pgprot_t prot;
 		void *vaddr;
 
-		ret = ttm_bo_populate(bo, &ctx);
+		ret = ttm_bo_populate(bo, false, &ctx);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index b47020fca199..c5ad447debe3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -225,7 +225,9 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 		};
 
 		ttm = bo->ttm;
-		err = ttm_bo_populate(bo, &ctx);
+		err = ttm_bo_populate(bo,
+				      false,
+				      &ctx);
 		if (err) {
 			if (err == -EINTR || err == -ERESTARTSYS ||
 			    err == -EAGAIN)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index a6b055256150..b068b9715354 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -742,6 +742,7 @@ static unsigned int ttm_pool_alloc_find_order(unsigned int highest,
 }
 
 static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
+			    bool memcg_account,
 			    const struct ttm_operation_ctx *ctx,
 			    struct ttm_pool_alloc_state *alloc,
 			    struct ttm_pool_tt_restore *restore)
@@ -852,6 +853,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
  * Returns: 0 on successe, negative error code otherwise.
  */
 int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
+		   bool memcg_account,
 		   struct ttm_operation_ctx *ctx)
 {
 	struct ttm_pool_alloc_state alloc;
@@ -861,7 +863,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 
 	ttm_pool_alloc_state_init(tt, &alloc);
 
-	return __ttm_pool_alloc(pool, tt, ctx, &alloc, NULL);
+	return __ttm_pool_alloc(pool, tt, memcg_account, ctx, &alloc, NULL);
 }
 EXPORT_SYMBOL(ttm_pool_alloc);
 
@@ -914,7 +916,7 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 			return 0;
 	}
 
-	return __ttm_pool_alloc(pool, tt, ctx, &alloc, tt->restore);
+	return __ttm_pool_alloc(pool, tt, false, ctx, &alloc, tt->restore);
 }
 
 /**
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 506e257dfba8..8f38de3b2f1c 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -366,7 +366,9 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
 EXPORT_SYMBOL_FOR_TESTS_ONLY(ttm_tt_swapout);
 
 int ttm_tt_populate(struct ttm_device *bdev,
-		    struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
+		    struct ttm_tt *ttm,
+		    bool memcg_account,
+		    struct ttm_operation_ctx *ctx)
 {
 	int ret;
 
@@ -395,9 +397,9 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	}
 
 	if (bdev->funcs->ttm_tt_populate)
-		ret = bdev->funcs->ttm_tt_populate(bdev, ttm, ctx);
+		ret = bdev->funcs->ttm_tt_populate(bdev, ttm, memcg_account, ctx);
 	else
-		ret = ttm_pool_alloc(&bdev->pool, ttm, ctx);
+		ret = ttm_pool_alloc(&bdev->pool, ttm, memcg_account, ctx);
 	if (ret)
 		goto error;
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index fa5841fda659..a4d4ebf585fe 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -569,13 +569,13 @@ int vmw_bo_cpu_blit(struct vmw_bo *vmw_dst,
 		dma_resv_assert_held(src->base.resv);
 
 	if (!ttm_tt_is_populated(dst->ttm)) {
-		ret = dst->bdev->funcs->ttm_tt_populate(dst->bdev, dst->ttm, &ctx);
+		ret = dst->bdev->funcs->ttm_tt_populate(dst->bdev, dst->ttm, false, &ctx);
 		if (ret)
 			return ret;
 	}
 
 	if (!ttm_tt_is_populated(src->ttm)) {
-		ret = src->bdev->funcs->ttm_tt_populate(src->bdev, src->ttm, &ctx);
+		ret = src->bdev->funcs->ttm_tt_populate(src->bdev, src->ttm, false, &ctx);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index 5553892d7c3e..2351dafc1c68 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -360,7 +360,8 @@ static void vmw_ttm_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
 
 
 static int vmw_ttm_populate(struct ttm_device *bdev,
-			    struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
+			    struct ttm_tt *ttm, bool memcg_account,
+			    struct ttm_operation_ctx *ctx)
 {
 	bool external = (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) != 0;
 
@@ -372,7 +373,7 @@ static int vmw_ttm_populate(struct ttm_device *bdev,
 						       ttm->dma_address,
 						       ttm->num_pages);
 
-	return ttm_pool_alloc(&bdev->pool, ttm, ctx);
+	return ttm_pool_alloc(&bdev->pool, ttm, memcg_account, ctx);
 }
 
 static void vmw_ttm_unpopulate(struct ttm_device *bdev,
@@ -580,7 +581,7 @@ int vmw_bo_create_and_populate(struct vmw_private *dev_priv,
 	if (unlikely(ret != 0))
 		return ret;
 
-	ret = vmw_ttm_populate(vbo->tbo.bdev, vbo->tbo.ttm, &ctx);
+	ret = vmw_ttm_populate(vbo->tbo.bdev, vbo->tbo.ttm, false, &ctx);
 	if (likely(ret == 0)) {
 		struct vmw_ttm_tt *vmw_tt =
 			container_of(vbo->tbo.ttm, struct vmw_ttm_tt, dma_ttm);
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 4410e28dee54..8af0a5e5324d 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -522,6 +522,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 }
 
 static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt,
+			      bool memcg_account,
 			      struct ttm_operation_ctx *ctx)
 {
 	struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
@@ -539,7 +540,7 @@ static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt,
 		err = ttm_tt_restore(ttm_dev, tt, ctx);
 	} else {
 		ttm_tt_clear_backed_up(tt);
-		err = ttm_pool_alloc(&ttm_dev->pool, tt, ctx);
+		err = ttm_pool_alloc(&ttm_dev->pool, tt, memcg_account, ctx);
 	}
 	if (err)
 		return err;
@@ -1765,7 +1766,7 @@ static int xe_bo_fault_migrate(struct xe_bo *bo, struct ttm_operation_ctx *ctx,
 	if (ttm_manager_type(tbo->bdev, tbo->resource->mem_type)->use_tt) {
 		err = xe_bo_wait_usage_kernel(bo, ctx);
 		if (!err)
-			err = ttm_bo_populate(&bo->ttm, ctx);
+			err = ttm_bo_populate(&bo->ttm, false, ctx);
 	} else if (should_migrate_to_smem(bo)) {
 		xe_assert(xe_bo_device(bo), bo->flags & XE_BO_FLAG_SYSTEM);
 		err = xe_bo_migrate(bo, XE_PL_TT, ctx, exec);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index e664a96540eb..60676f2c1077 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -465,6 +465,7 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, struct ttm_resource *res,
 		     pgprot_t tmp);
 void ttm_bo_tt_destroy(struct ttm_buffer_object *bo);
 int ttm_bo_populate(struct ttm_buffer_object *bo,
+		    bool memcg_account,
 		    struct ttm_operation_ctx *ctx);
 int ttm_bo_setup_export(struct ttm_buffer_object *bo,
 			struct ttm_operation_ctx *ctx);
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 592b5f802859..dcecb06b67b3 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -84,6 +84,7 @@ struct ttm_device_funcs {
 	 */
 	int (*ttm_tt_populate)(struct ttm_device *bdev,
 			       struct ttm_tt *ttm,
+			       bool memcg_account,
 			       struct ttm_operation_ctx *ctx);
 
 	/**
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 82124cb5c9e4..b946ee4569c9 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -80,6 +80,7 @@ struct ttm_pool {
 };
 
 int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
+		   bool memcg_account,
 		   struct ttm_operation_ctx *ctx);
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt);
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 406437ad674b..15d4019685f6 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -250,6 +250,7 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
  * Calls the driver method to allocate pages for a ttm
  */
 int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm,
+		    bool memcg_account,
 		    struct ttm_operation_ctx *ctx);
 
 /**
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/16] ttm/pool: initialise the shrinker earlier
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (7 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 08/16] ttm: add a memcg accounting flag to the alloc/populate APIs Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 10/16] ttm: add objcg pointer to bo and tt (v2) Dave Airlie
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

Later memcg enablement needs the shrinker initialised before the list_lru,
so just move it for now.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index b068b9715354..c990d4084208 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -1381,6 +1381,17 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
+	mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");
+	if (!mm_shrinker)
+		return -ENOMEM;
+
+	mm_shrinker->count_objects = ttm_pool_shrinker_count;
+	mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
+	mm_shrinker->batch = TTM_SHRINKER_BATCH;
+	mm_shrinker->seeks = 1;
+
+	shrinker_register(mm_shrinker);
+
 	for (i = 0; i < NR_PAGE_ORDERS; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
@@ -1403,17 +1414,6 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 #endif
 #endif
 
-	mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");
-	if (!mm_shrinker)
-		return -ENOMEM;
-
-	mm_shrinker->count_objects = ttm_pool_shrinker_count;
-	mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
-	mm_shrinker->batch = TTM_SHRINKER_BATCH;
-	mm_shrinker->seeks = 1;
-
-	shrinker_register(mm_shrinker);
-
 	return 0;
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/16] ttm: add objcg pointer to bo and tt (v2)
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (8 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 09/16] ttm/pool: initialise the shrinker earlier Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 11/16] ttm/pool: enable memcg tracking and shrinker. (v2) Dave Airlie
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This just adds the obj cgroup pointer to the bo and tt structs,
and copies it from the bo into the tt when the tt is initialised.

Signed-off-by: Dave Airlie <airlied@redhat.com>

v2: add the put and a setter helper
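
As an illustrative sketch only (mirroring what the amdgpu patch later in
the series does; "tbo" here stands in for a driver's ttm_buffer_object),
a driver that wants its system memory charged grabs the current task's
obj_cgroup at object creation time and hands the reference over:

	struct obj_cgroup *objcg = get_obj_cgroup_from_current();

	/* the reference now belongs to the ttm bo; ttm_bo_release() drops it */
	ttm_bo_set_cgroup(tbo, objcg);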
---
 drivers/gpu/drm/ttm/ttm_bo.c |  2 ++
 drivers/gpu/drm/ttm/ttm_tt.c |  1 +
 include/drm/ttm/ttm_bo.h     | 20 ++++++++++++++++++++
 include/drm/ttm/ttm_tt.h     |  2 ++
 4 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 5d84af5e0d74..073a1840ed9d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -45,6 +45,7 @@
 #include <linux/atomic.h>
 #include <linux/cgroup_dmem.h>
 #include <linux/dma-resv.h>
+#include <linux/memcontrol.h>
 
 #include "ttm_module.h"
 #include "ttm_bo_internal.h"
@@ -314,6 +315,7 @@ static void ttm_bo_release(struct kref *kref)
 		dma_resv_unlock(bo->base.resv);
 	}
 
+	obj_cgroup_put(bo->objcg);
 	atomic_dec(&ttm_glob.bo_count);
 	bo->destroy(bo);
 }
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8f38de3b2f1c..0c54d5e2bfdd 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -162,6 +162,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 	ttm->caching = caching;
 	ttm->restore = NULL;
 	ttm->backup = NULL;
+	ttm->objcg = bo->objcg;
 }
 
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 60676f2c1077..154805627065 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -135,6 +135,12 @@ struct ttm_buffer_object {
 	 * reservation lock.
 	 */
 	struct sg_table *sg;
+
+	/**
+	 * @objcg: object cgroup to charge this to if it ends up using system memory.
+	 * NULL means don't charge.
+	 */
+	struct obj_cgroup *objcg;
 };
 
 #define TTM_BO_MAP_IOMEM_MASK 0x80
@@ -334,6 +340,20 @@ ttm_bo_move_to_lru_tail_unlocked(struct ttm_buffer_object *bo)
 	spin_unlock(&bo->bdev->lru_lock);
 }
 
+/**
+ * ttm_bo_set_cgroup - assign a cgroup to a buffer object.
+ * @bo: The bo to set the cgroup for
+ * @objcg: the cgroup to set.
+ *
+ * This transfers the cgroup reference to the bo. From this
+ * point on the cgroup reference is owned by the ttm bo.
+ */
+static inline void ttm_bo_set_cgroup(struct ttm_buffer_object *bo,
+				     struct obj_cgroup *objcg)
+{
+	bo->objcg = objcg;
+}
+
 static inline void ttm_bo_assign_mem(struct ttm_buffer_object *bo,
 				     struct ttm_resource *new_mem)
 {
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 15d4019685f6..c13fea4c2915 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -126,6 +126,8 @@ struct ttm_tt {
 	enum ttm_caching caching;
 	/** @restore: Partial restoration from backup state. TTM private */
 	struct ttm_pool_tt_restore *restore;
+	/** @objcg: Object cgroup for this TT allocation */
+	struct obj_cgroup *objcg;
 };
 
 /**
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 11/16] ttm/pool: enable memcg tracking and shrinker. (v2)
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (9 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 10/16] ttm: add objcg pointer to bo and tt (v2) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 12/16] ttm: hook up memcg placement flags Dave Airlie
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This enables all the backend code to use the list_lru in memcg mode,
and sets the shrinker to be memcg aware.

It adds a loop for the case where pooled pages end up being reparented
to a higher memcg group, so that a newer memcg can search for them
there and take them back.
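
The fallback walk can be summarised as (illustrative pseudocode only;
take_from_lru() is a stand-in for the real lookup done in
ttm_pool_type_take() below):

	memcg = objcg ? get_mem_cgroup_from_objcg(objcg) : NULL;
	while (!page) {
		page = take_from_lru(pt, nid, memcg);
		if (page || !memcg)
			break;
		/* nothing at this level, retry with the ancestor cgroup */
		memcg = parent_mem_cgroup(memcg);
	}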

Signed-off-by: Dave Airlie <airlied@redhat.com>

---
v2: just use the proper stats.
---
 drivers/gpu/drm/ttm/ttm_pool.c | 127 ++++++++++++++++++++++++++-------
 mm/list_lru.c                  |   1 +
 2 files changed, 104 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c990d4084208..1e6da2cc1f06 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -142,7 +142,9 @@ static int ttm_pool_nid(struct ttm_pool *pool) {
 }
 
 /* Allocate pages of size 1 << order with the given gfp_flags */
-static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
+static struct page *ttm_pool_alloc_page(struct ttm_pool *pool,
+					struct obj_cgroup *objcg,
+					gfp_t gfp_flags,
 					unsigned int order)
 {
 	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
@@ -162,7 +164,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 		p = alloc_pages_node(pool->nid, gfp_flags, order);
 		if (p) {
 			p->private = order;
-			mod_lruvec_page_state(p, NR_GPU_ACTIVE, 1 << order);
+			if (!mem_cgroup_charge_gpu_page(objcg, p, order, gfp_flags, false)) {
+				__free_pages(p, order);
+				return NULL;
+			}
 		}
 		return p;
 	}
@@ -213,8 +218,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 #endif
 
 	if (!pool || !pool->use_dma_alloc) {
-		mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
-				      -(1 << order));
+		mem_cgroup_uncharge_gpu_page(p, order, reclaim);
 		__free_pages(p, order);
 		return;
 	}
@@ -301,12 +305,11 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 
 	INIT_LIST_HEAD(&p->lru);
 	rcu_read_lock();
-	list_lru_add(&pt->pages, &p->lru, nid, NULL);
+	list_lru_add(&pt->pages, &p->lru, nid, page_memcg_check(p));
 	rcu_read_unlock();
 
-	atomic_long_add(num_pages, &allocated_pages[nid]);	
-	mod_lruvec_page_state(p, NR_GPU_ACTIVE, -num_pages);
-	mod_lruvec_page_state(p, NR_GPU_RECLAIM, num_pages);
+	atomic_long_add(num_pages, &allocated_pages[nid]);
+	mem_cgroup_move_gpu_page_reclaim(NULL, p, pt->order, true);
 }
 
 static enum lru_status take_one_from_lru(struct list_head *item,
@@ -321,20 +324,56 @@ static enum lru_status take_one_from_lru(struct list_head *item,
 	return LRU_REMOVED;
 }
 
-/* Take pages from a specific pool_type, return NULL when nothing available */
-static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid)
+static int pool_lru_get_page(struct ttm_pool_type *pt, int nid,
+			     struct page **page_out,
+			     struct obj_cgroup *objcg,
+			     struct mem_cgroup *memcg)
 {
 	int ret;
 	struct page *p = NULL;
 	unsigned long nr_to_walk = 1;
+	unsigned int num_pages = 1 << pt->order;
 
-	ret = list_lru_walk_node(&pt->pages, nid, take_one_from_lru, (void *)&p, &nr_to_walk);
+	ret = list_lru_walk_one(&pt->pages, nid, memcg, take_one_from_lru, (void *)&p, &nr_to_walk);
 	if (ret == 1 && p) {
-		atomic_long_sub(1 << pt->order, &allocated_pages[nid]);
-		mod_lruvec_page_state(p, NR_GPU_ACTIVE, (1 << pt->order));
-		mod_lruvec_page_state(p, NR_GPU_RECLAIM, -(1 << pt->order));
+		atomic_long_sub(num_pages, &allocated_pages[nid]);
+
+		if (!mem_cgroup_move_gpu_page_reclaim(objcg, p, pt->order, false)) {
+			__free_pages(p, pt->order);
+			p = NULL;
+		}
 	}
-	return p;
+	*page_out = p;
+	return ret;
+}
+
+/* Take pages from a specific pool_type, return NULL when nothing available */
+static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid,
+				       struct obj_cgroup *orig_objcg)
+{
+	struct page *page_out = NULL;
+	int ret;
+	struct mem_cgroup *orig_memcg = orig_objcg ? get_mem_cgroup_from_objcg(orig_objcg) : NULL;
+	struct mem_cgroup *memcg = orig_memcg;
+
+	/*
+	 * Attempt to get a page from the current memcg, but if it hasn't got any in its level,
+	 * go up to the parent and check there. This helps the scenario where multiple apps get
+	 * started into their own cgroup from a common parent and want to reuse the pools.
+	 */
+	while (!page_out) {
+		ret = pool_lru_get_page(pt, nid, &page_out, orig_objcg, memcg);
+		if (ret == 1)
+			break;
+		if (!memcg)
+			break;
+		memcg = parent_mem_cgroup(memcg);
+		if (!memcg)
+			break;
+	}
+
+	mem_cgroup_put(orig_memcg);
+	return page_out;
 }
 
 /* Initialize and add a pool type to the global shrinker list */
@@ -344,7 +383,7 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
 	pt->pool = pool;
 	pt->caching = caching;
 	pt->order = order;
-	list_lru_init(&pt->pages);
+	list_lru_init_memcg(&pt->pages, mm_shrinker);
 
 	spin_lock(&shrinker_lock);
 	list_add_tail(&pt->shrinker_list, &shrinker_list);
@@ -387,6 +426,30 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 	ttm_pool_dispose_list(pt, &dispose);
 }
 
+static int ttm_pool_check_objcg(struct obj_cgroup *objcg)
+{
+#ifdef CONFIG_MEMCG
+	int r = 0;
+	struct mem_cgroup *memcg;
+	if (!objcg)
+		return 0;
+
+	memcg = get_mem_cgroup_from_objcg(objcg);
+	for (unsigned i = 0; i < NR_PAGE_ORDERS; i++) {
+		r = memcg_list_lru_alloc(memcg, &global_write_combined[i].pages, GFP_KERNEL);
+		if (r) {
+			break;
+		}
+		r = memcg_list_lru_alloc(memcg, &global_uncached[i].pages, GFP_KERNEL);
+		if (r) {
+			break;
+		}
+	}
+	mem_cgroup_put(memcg);
+#endif
+	return 0;
+}
+
 /* Return the pool_type to use for the given caching and order */
 static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 						  enum ttm_caching caching,
@@ -416,7 +479,9 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 }
 
 /* Free pages using the per-node shrinker list */
-static unsigned int ttm_pool_shrink(int nid, unsigned long num_to_free)
+static unsigned int ttm_pool_shrink(int nid,
+				    struct mem_cgroup *memcg,
+				    unsigned long num_to_free)
 {
 	LIST_HEAD(dispose);
 	struct ttm_pool_type *pt;
@@ -428,7 +493,11 @@ static unsigned int ttm_pool_shrink(int nid, unsigned long num_to_free)
 	list_move_tail(&pt->shrinker_list, &shrinker_list);
 	spin_unlock(&shrinker_lock);
 
-	num_pages = list_lru_walk_node(&pt->pages, nid, pool_move_to_dispose_list, &dispose, &num_to_free);
+	if (!memcg) {
+		num_pages = list_lru_walk_node(&pt->pages, nid, pool_move_to_dispose_list, &dispose, &num_to_free);
+	} else {
+		num_pages = list_lru_walk_one(&pt->pages, nid, memcg, pool_move_to_dispose_list, &dispose, &num_to_free);
+	}
 	num_pages *= 1 << pt->order;
 
 	ttm_pool_dispose_list(pt, &dispose);
@@ -593,6 +662,7 @@ static int ttm_pool_restore_commit(struct ttm_pool_tt_restore *restore,
 			 */
 			ttm_pool_split_for_swap(restore->pool, p);
 			copy_highpage(restore->alloced_page + i, p);
+			p->memcg_data = 0;
 			__free_pages(p, 0);
 		}
 
@@ -754,6 +824,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	bool allow_pools;
 	struct page *p;
 	int r;
+	struct obj_cgroup *objcg = memcg_account ? tt->objcg : NULL;
 
 	WARN_ON(!alloc->remaining_pages || ttm_tt_is_populated(tt));
 	WARN_ON(alloc->dma_addr && !pool->dev);
@@ -771,6 +842,9 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 
 	page_caching = tt->caching;
 	allow_pools = true;
+
+	ttm_pool_check_objcg(objcg);
+
 	for (order = ttm_pool_alloc_find_order(MAX_PAGE_ORDER, alloc);
 	     alloc->remaining_pages;
 	     order = ttm_pool_alloc_find_order(order, alloc)) {
@@ -780,7 +854,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		p = NULL;
 		pt = ttm_pool_select_type(pool, page_caching, order);
 		if (pt && allow_pools)
-			p = ttm_pool_type_take(pt, ttm_pool_nid(pool));
+			p = ttm_pool_type_take(pt, ttm_pool_nid(pool), objcg);
 
 		/*
 		 * If that fails or previously failed, allocate from system.
@@ -791,7 +865,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		if (!p) {
 			page_caching = ttm_cached;
 			allow_pools = false;
-			p = ttm_pool_alloc_page(pool, gfp_flags, order);
+			p = ttm_pool_alloc_page(pool, objcg, gfp_flags, order);
 		}
 		/* If that fails, lower the order if possible and retry. */
 		if (!p) {
@@ -935,7 +1009,7 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 
 	while (atomic_long_read(&allocated_pages[nid]) > pool_node_limit[nid]) {
 		unsigned long diff = pool_node_limit[nid] - atomic_long_read(&allocated_pages[nid]);
-		ttm_pool_shrink(nid, diff);
+		ttm_pool_shrink(nid, NULL, diff);
 	}
 }
 EXPORT_SYMBOL(ttm_pool_free);
@@ -1055,6 +1129,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
 			if (flags->purge) {
 				shrunken += num_pages;
 				page->private = 0;
+				page->memcg_data = 0;
 				__free_pages(page, order);
 				memset(tt->pages + i, 0,
 				       num_pages * sizeof(*tt->pages));
@@ -1191,10 +1266,14 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
 					    struct shrink_control *sc)
 {
 	unsigned long num_freed = 0;
+	int num_pools;
+	spin_lock(&shrinker_lock);
+	num_pools = list_count_nodes(&shrinker_list);
+	spin_unlock(&shrinker_lock);
 
 	do
-		num_freed += ttm_pool_shrink(sc->nid, sc->nr_to_scan);
-	while (num_freed < sc->nr_to_scan &&
+		num_freed += ttm_pool_shrink(sc->nid, sc->memcg, sc->nr_to_scan);
+	while (num_pools-- >= 0 && num_freed < sc->nr_to_scan &&
 	       atomic_long_read(&allocated_pages[sc->nid]));
 
 	sc->nr_scanned = num_freed;
@@ -1381,7 +1460,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
-	mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");
+	mm_shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE | SHRINKER_NUMA_AWARE, "drm-ttm_pool");
 	if (!mm_shrinker)
 		return -ENOMEM;
 
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 627589d75320..6a277f479dc3 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -562,6 +562,7 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 
 	return xas_error(&xas);
 }
+EXPORT_SYMBOL_GPL(memcg_list_lru_alloc);
 #else
 static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
 {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 12/16] ttm: hook up memcg placement flags.
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (10 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 11/16] ttm/pool: enable memcg tracking and shrinker. (v2) Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 13/16] memcontrol: allow objcg api when memcg is config off Dave Airlie
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This adds a placement flag that requests that any bo with the flag set
gets charged to its memory cgroup when the backing allocation is
system memory.
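
For illustration only (this mirrors what the amdgpu and xe patches later
in the series do; "places" and "c" are placeholders for a driver's
placement array), a driver opts in per placement:

	/* sketch: charge GTT placements to the allocating cgroup */
	places[c] = (struct ttm_place) {
		.mem_type = TTM_PL_TT,
		.flags = TTM_PL_FLAG_MEMCG,
	};

Placements that are only used during eviction leave the flag clear.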

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c      | 4 ++--
 drivers/gpu/drm/ttm/ttm_bo_util.c | 6 +++---
 drivers/gpu/drm/ttm/ttm_bo_vm.c   | 2 +-
 drivers/gpu/drm/xe/xe_bo.c        | 2 +-
 include/drm/ttm/ttm_placement.h   | 3 +++
 5 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 073a1840ed9d..78c463c72817 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -143,7 +143,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 			goto out_err;
 
 		if (mem->mem_type != TTM_PL_SYSTEM) {
-			ret = ttm_bo_populate(bo, false, ctx);
+			ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, ctx);
 			if (ret)
 				goto out_err;
 		}
@@ -1296,7 +1296,7 @@ int ttm_bo_setup_export(struct ttm_buffer_object *bo,
 	if (ret != 0)
 		return ret;
 
-	ret = ttm_bo_populate(bo, false, ctx);
+	ret = ttm_bo_populate(bo, bo->resource->placement & TTM_PL_FLAG_MEMCG, ctx);
 	ttm_bo_unreserve(bo);
 	return ret;
 }
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 13a9e9bba968..dc43804658b4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -167,7 +167,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
 	src_man = ttm_manager_type(bdev, src_mem->mem_type);
 	if (ttm && ((ttm->page_flags & TTM_TT_FLAG_SWAPPED) ||
 		    dst_man->use_tt)) {
-		ret = ttm_bo_populate(bo, false, ctx);
+		ret = ttm_bo_populate(bo, dst_mem->placement & TTM_PL_FLAG_MEMCG, ctx);
 		if (ret)
 			return ret;
 	}
@@ -355,7 +355,7 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 
 	BUG_ON(!ttm);
 
-	ret = ttm_bo_populate(bo, false, &ctx);
+	ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, &ctx);
 	if (ret)
 		return ret;
 
@@ -538,7 +538,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map)
 		pgprot_t prot;
 		void *vaddr;
 
-		ret = ttm_bo_populate(bo, false, &ctx);
+		ret = ttm_bo_populate(bo, mem->placement & TTM_PL_FLAG_MEMCG, &ctx);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index c5ad447debe3..dddc904f8727 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -226,7 +226,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 
 		ttm = bo->ttm;
 		err = ttm_bo_populate(bo,
-				      false,
+				      bo->resource->placement & TTM_PL_FLAG_MEMCG,
 				      &ctx);
 		if (err) {
 			if (err == -EINTR || err == -ERESTARTSYS ||
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 8af0a5e5324d..95e607842474 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1766,7 +1766,7 @@ static int xe_bo_fault_migrate(struct xe_bo *bo, struct ttm_operation_ctx *ctx,
 	if (ttm_manager_type(tbo->bdev, tbo->resource->mem_type)->use_tt) {
 		err = xe_bo_wait_usage_kernel(bo, ctx);
 		if (!err)
-			err = ttm_bo_populate(&bo->ttm, false, ctx);
+			err = ttm_bo_populate(&bo->ttm, tbo->resource->placement & TTM_PL_FLAG_MEMCG, ctx);
 	} else if (should_migrate_to_smem(bo)) {
 		xe_assert(xe_bo_device(bo), bo->flags & XE_BO_FLAG_SYSTEM);
 		err = xe_bo_migrate(bo, XE_PL_TT, ctx, exec);
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index b510a4812609..4e9f07d70483 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -70,6 +70,9 @@
 /* Placement is only used during eviction */
 #define TTM_PL_FLAG_FALLBACK	(1 << 4)
 
+/* Placement should account mem cgroup */
+#define TTM_PL_FLAG_MEMCG	(1 << 5)
+
 /**
  * struct ttm_place
  *
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 13/16] memcontrol: allow objcg api when memcg is config off.
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (11 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 12/16] ttm: hook up memcg placement flags Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 14/16] amdgpu: add support for memory cgroups Dave Airlie
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

amdgpu wants to use the objcg API without having to wrap it in ifdefs,
so just add a dummy function for the config-off path.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 include/linux/memcontrol.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 62c46c33f84f..8401b272495e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1828,6 +1828,11 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
 }
 
+static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
+{
+	return NULL;
+}
+
 static inline struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 {
 	return NULL;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 14/16] amdgpu: add support for memory cgroups
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (12 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 13/16] memcontrol: allow objcg api when memcg is config off Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 15/16] ttm: add support for a module option to disable memcg integration Dave Airlie
  2025-10-16  2:31 ` [PATCH 16/16] xe: create a flag to enable memcg accounting for XE as well Dave Airlie
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This adds support for attaching an obj cgroup to a buffer object,
and for setting the placement flags to make sure the allocation is
accounted properly.
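
Note on reference ownership: the obj_cgroup reference taken with
get_obj_cgroup_from_current() is handed to the ttm bo via
ttm_bo_set_cgroup() and released in ttm_bo_release(); the explicit
obj_cgroup_put() calls added below only cover the early error paths
where no bo exists yet to take ownership of the reference.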

Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  2 ++
 mm/memcontrol.c                            |  1 +
 5 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index b7ebae289bea..85ff70a399bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -225,6 +225,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 	bp.domain = initial_domain;
 	bp.bo_ptr_size = sizeof(struct amdgpu_bo);
 	bp.xcp_id_plus1 = xcp_id_plus1;
+	bp.objcg = get_obj_cgroup_from_current();
 
 	r = amdgpu_bo_create_user(adev, &bp, &ubo);
 	if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e08f58de4b17..d6a6f7e17a2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -167,7 +167,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 		places[c].mem_type =
 			abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
 			AMDGPU_PL_PREEMPT : TTM_PL_TT;
-		places[c].flags = 0;
+		places[c].flags = TTM_PL_FLAG_MEMCG;
 		/*
 		 * When GTT is just an alternative to VRAM make sure that we
 		 * only use it as fallback and still try to fill up VRAM first.
@@ -182,7 +182,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 		places[c].fpfn = 0;
 		places[c].lpfn = 0;
 		places[c].mem_type = TTM_PL_SYSTEM;
-		places[c].flags = 0;
+		places[c].flags = TTM_PL_FLAG_MEMCG;
 		c++;
 	}
 
@@ -662,16 +662,21 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 		size = ALIGN(size, PAGE_SIZE);
 	}
 
-	if (!amdgpu_bo_validate_size(adev, size, bp->domain))
+	if (!amdgpu_bo_validate_size(adev, size, bp->domain)) {
+		obj_cgroup_put(bp->objcg);
 		return -ENOMEM;
+	}
 
 	BUG_ON(bp->bo_ptr_size < sizeof(struct amdgpu_bo));
 
 	*bo_ptr = NULL;
 	bo = kvzalloc(bp->bo_ptr_size, GFP_KERNEL);
-	if (bo == NULL)
+	if (bo == NULL) {
+		obj_cgroup_put(bp->objcg);
 		return -ENOMEM;
+	}
 	drm_gem_private_object_init(adev_to_drm(adev), &bo->tbo.base, size);
+	ttm_bo_set_cgroup(&bo->tbo, bp->objcg); /* hand the reference to the ttm bo */
 	bo->tbo.base.funcs = &amdgpu_gem_object_funcs;
 	bo->vm_bo = NULL;
 	bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 656b8a931dae..b07a168a6665 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -55,6 +55,7 @@ struct amdgpu_bo_param {
 	enum ttm_bo_type		type;
 	bool				no_wait_gpu;
 	struct dma_resv			*resv;
+	struct obj_cgroup               *objcg;
 	void				(*destroy)(struct ttm_buffer_object *bo);
 	/* xcp partition number plus 1, 0 means any partition */
 	int8_t				xcp_id_plus1;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index bcab4a83137b..7642c17ebda7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -152,11 +152,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 			amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT |
 							AMDGPU_GEM_DOMAIN_CPU);
 		}
+		abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
 		break;
 	case TTM_PL_TT:
 	case AMDGPU_PL_PREEMPT:
 	default:
 		amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
+		abo->placements[0].flags &= ~TTM_PL_FLAG_MEMCG;
 		break;
 	}
 	*placement = abo->placement;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ece340f3e391..b5782f6d21c2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2721,6 +2721,7 @@ __always_inline struct obj_cgroup *current_obj_cgroup(void)
 
 	return objcg;
 }
+EXPORT_SYMBOL_GPL(current_obj_cgroup);
 
 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 15/16] ttm: add support for a module option to disable memcg integration
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (13 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 14/16] amdgpu: add support for memory cgroups Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  2025-10-16  2:31 ` [PATCH 16/16] xe: create a flag to enable memcg accounting for XE as well Dave Airlie
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Dave Airlie <airlied@redhat.com>

This adds a kconfig and a module option to turn off ttm memcg
integration completely.

When this is used, no object will ever end up using memcg aware
paths.

There is an existing workload that cgroup support might regress: the
systems are set up to allocate 1GB of uncached pages at system startup
to prime the pool, and any further users then take pages from that
pool. The current cgroup code might handle that, but it may also
regress it, so add an option to ttm to avoid using memcg for the pool
pages.
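
As a usage sketch (assuming the parameter lands in ttm.ko as in this
patch): booting with

	ttm.ttm_memcg=0

on the kernel command line, or an "options ttm ttm_memcg=0" line in
modprobe.d, disables the memcg integration even on kernels built with
CONFIG_DRM_TTM_MEMCG=y; since the parameter is 0444 it can only be set
at module load time.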

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/Kconfig        |  7 +++++++
 drivers/gpu/drm/ttm/ttm_pool.c | 24 +++++++++++++++++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 7e6bc0b3a589..536789f8217e 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -239,6 +239,13 @@ config DRM_TTM_HELPER
 	help
 	  Helpers for ttm-based gem objects
 
+config DRM_TTM_MEMCG
+	bool "Enable TTM mem cgroup by default"
+	depends on DRM_TTM
+	depends on MEMCG
+	help
+	  Enable the memcg integration by default
+
 config DRM_GEM_DMA_HELPER
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1e6da2cc1f06..009e7016bd4c 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -118,6 +118,24 @@ static unsigned long page_pool_size;
 MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool per NUMA node");
 module_param(page_pool_size, ulong, 0644);
 
+/*
+ * Don't use the memcg aware lru for pooled pages.
+ *
+ * There are use-cases where for example one application in a cgroup will preallocate 1GB
+ * of uncached pages, and immediately release them into the pool, for other consumers
+ * to use. This use-case could be handled with a proper cgroup hierarchy, but to allow
+ * that use case to continue to operate as-is, add a module option.
+ *
+ * This still stores the pages in the list_lru, it just doesn't use the memcg when
+ * adding/removing them.
+ */
+#define DEFAULT_TTM_MEMCG IS_ENABLED(CONFIG_DRM_TTM_MEMCG)
+static bool ttm_memcg = DEFAULT_TTM_MEMCG;
+
+MODULE_PARM_DESC(ttm_memcg, "Allow using cgroups with TTM "
+		 "[default=" __stringify(DEFAULT_TTM_MEMCG) "])");
+module_param(ttm_memcg, bool, 0444);
+
 static unsigned long pool_node_limit[MAX_NUMNODES];
 static atomic_long_t allocated_pages[MAX_NUMNODES];
 
@@ -305,7 +323,7 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 
 	INIT_LIST_HEAD(&p->lru);
 	rcu_read_lock();
-	list_lru_add(&pt->pages, &p->lru, nid, page_memcg_check(p));
+	list_lru_add(&pt->pages, &p->lru, nid, ttm_memcg ? page_memcg_check(p) : NULL);
 	rcu_read_unlock();
 
 	atomic_long_add(num_pages, &allocated_pages[nid]);
@@ -354,7 +372,7 @@ static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid,
 	struct page *page_out = NULL;
 	int ret;
 	struct mem_cgroup *orig_memcg = orig_objcg ? get_mem_cgroup_from_objcg(orig_objcg) : NULL;
-	struct mem_cgroup *memcg = orig_memcg;
+	struct mem_cgroup *memcg = ttm_memcg ? orig_memcg : NULL;
 
 	/*
 	 * Attempt to get a page from the current memcg, but if it hasn't got any in it's level,
@@ -824,7 +842,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	bool allow_pools;
 	struct page *p;
 	int r;
-	struct obj_cgroup *objcg = memcg_account ? tt->objcg : NULL;
+	struct obj_cgroup *objcg = (ttm_memcg && memcg_account) ? tt->objcg : NULL;
 
 	WARN_ON(!alloc->remaining_pages || ttm_tt_is_populated(tt));
 	WARN_ON(alloc->dma_addr && !pool->dev);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 16/16] xe: create a flag to enable memcg accounting for XE as well.
  2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
                   ` (14 preceding siblings ...)
  2025-10-16  2:31 ` [PATCH 15/16] ttm: add support for a module option to disable memcg integration Dave Airlie
@ 2025-10-16  2:31 ` Dave Airlie
  15 siblings, 0 replies; 18+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC (permalink / raw)
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

From: Maarten Lankhorst <dev@lankhorst.se>

This adds support for memcg accounting to the ttm objects used by the xe driver.

Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/xe/xe_bo.c  | 16 +++++++++++-----
 drivers/gpu/drm/xe/xe_bo.h  |  1 +
 drivers/gpu/drm/xe/xe_lrc.c |  3 ++-
 drivers/gpu/drm/xe/xe_oa.c  |  3 ++-
 drivers/gpu/drm/xe/xe_pt.c  |  3 ++-
 5 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 95e607842474..8a511077708d 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -54,6 +54,7 @@ static const struct ttm_place sys_placement_flags = {
 	.flags = 0,
 };
 
+/* TTM_PL_FLAG_MEMCG is not set, those placements are used for eviction */
 static struct ttm_placement sys_placement = {
 	.num_placement = 1,
 	.placement = &sys_placement_flags,
@@ -188,8 +189,8 @@ static void try_add_system(struct xe_device *xe, struct xe_bo *bo,
 
 		bo->placements[*c] = (struct ttm_place) {
 			.mem_type = XE_PL_TT,
-			.flags = (bo_flags & XE_BO_FLAG_VRAM_MASK) ?
-			TTM_PL_FLAG_FALLBACK : 0,
+			.flags = TTM_PL_FLAG_MEMCG | ((bo_flags & XE_BO_FLAG_VRAM_MASK) ?
+			TTM_PL_FLAG_FALLBACK : 0),
 		};
 		*c += 1;
 	}
@@ -1695,6 +1696,8 @@ static void xe_ttm_bo_destroy(struct ttm_buffer_object *ttm_bo)
 
 static void xe_gem_object_free(struct drm_gem_object *obj)
 {
+	struct xe_bo *bo = gem_to_xe_bo(obj);
+
 	/* Our BO reference counting scheme works as follows:
 	 *
 	 * The gem object kref is typically used throughout the driver,
@@ -1708,8 +1711,8 @@ static void xe_gem_object_free(struct drm_gem_object *obj)
 	 * driver ttm callbacks is allowed to use the ttm_buffer_object
 	 * refcount directly if needed.
 	 */
-	__xe_bo_vunmap(gem_to_xe_bo(obj));
-	ttm_bo_put(container_of(obj, struct ttm_buffer_object, base));
+	__xe_bo_vunmap(bo);
+	ttm_bo_put(&bo->ttm);
 }
 
 static void xe_gem_object_close(struct drm_gem_object *obj,
@@ -2176,6 +2179,9 @@ struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
 	placement = (type == ttm_bo_type_sg ||
 		     bo->flags & XE_BO_FLAG_DEFER_BACKING) ? &sys_placement :
 		&bo->placement;
+
+	if (bo->flags & XE_BO_FLAG_ACCOUNTED)
+		ttm_bo_set_cgroup(&bo->ttm, get_obj_cgroup_from_current());
 	err = ttm_bo_init_reserved(&xe->ttm, &bo->ttm, type,
 				   placement, alignment,
 				   &ctx, NULL, resv, xe_ttm_bo_destroy);
@@ -3149,7 +3155,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK))
 		return -EINVAL;
 
-	bo_flags = 0;
+	bo_flags = XE_BO_FLAG_ACCOUNTED;
 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
 		bo_flags |= XE_BO_FLAG_DEFER_BACKING;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index a77af42b5f9e..fc1e7d0ebf1c 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -49,6 +49,7 @@
 #define XE_BO_FLAG_GGTT2		BIT(22)
 #define XE_BO_FLAG_GGTT3		BIT(23)
 #define XE_BO_FLAG_CPU_ADDR_MIRROR	BIT(24)
+#define XE_BO_FLAG_ACCOUNTED		BIT(25)
 
 /* this one is trigger internally only */
 #define XE_BO_FLAG_INTERNAL_TEST	BIT(30)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 47e9df775072..db7707256039 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -1413,7 +1413,8 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 	bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile) | XE_BO_FLAG_GGTT |
 		   XE_BO_FLAG_GGTT_INVALIDATE;
 	if (vm && vm->xef) /* userspace */
-		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
+		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE |
+			    XE_BO_FLAG_ACCOUNTED;
 
 	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
 					    bo_size,
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index a4894eb0d7f3..5b455397e1d8 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -885,7 +885,8 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size)
 
 	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
 				       size, ttm_bo_type_kernel,
-				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);
+				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT |
+				       XE_BO_FLAG_ACCOUNTED, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index a1c88f9a6c76..3fc11019ffa2 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -122,7 +122,8 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
 		   XE_BO_FLAG_IGNORE_MIN_PAGE_SIZE |
 		   XE_BO_FLAG_NO_RESV_EVICT | XE_BO_FLAG_PAGETABLE;
 	if (vm->xef) /* userspace */
-		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
+		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE |
+			    XE_BO_FLAG_ACCOUNTED;
 
 	pt->level = level;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
@ 2025-10-16  7:48   ` Christian König
  0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2025-10-16  7:48 UTC (permalink / raw)
  To: Dave Airlie, dri-devel, tj, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

On 16.10.25 04:31, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
> 
> While discussing memcg integration with gpu memory allocations,
> it was pointed out that there was no numa/system counters for
> GPU memory allocations.
> 
> With more integrated memory GPU server systems turning up, and
> more requirements for memory tracking it seems we should start
> closing the gap.
> 
> Add two counters to track GPU per-node system memory allocations.
> 
> The first tracks memory currently allocated to GPU objects, and the
> second tracks memory stored in GPU page pools that can be reclaimed
> by the shrinker.
> 
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Andrew Morton <akpm@linux-foundation.org>
> Acked-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> 
> ---
> 
> v2: add more info to the documentation on this memory.
> 
> I'd like to get acks to merge this via the drm tree, if possible,


Today is the last day before my vacation, so please bear with me; I will only have time to look into it when I'm back.

Christian.

> 
> Dave.
> ---
>  Documentation/filesystems/proc.rst | 8 ++++++++
>  drivers/base/node.c                | 5 +++++
>  fs/proc/meminfo.c                  | 6 ++++++
>  include/linux/mmzone.h             | 2 ++
>  mm/show_mem.c                      | 8 ++++++--
>  mm/vmstat.c                        | 2 ++
>  6 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 0b86a8022fa1..76e358274692 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1088,6 +1088,8 @@ Example output. You may not have all of these fields.
>      CmaFree:               0 kB
>      Unaccepted:            0 kB
>      Balloon:               0 kB
> +    GPUActive:             0 kB
> +    GPUReclaim:            0 kB
>      HugePages_Total:       0
>      HugePages_Free:        0
>      HugePages_Rsvd:        0
> @@ -1268,6 +1270,12 @@ Unaccepted
>                Memory that has not been accepted by the guest
>  Balloon
>                Memory returned to Host by VM Balloon Drivers
> +GPUActive
> +              System memory allocated to active GPU objects
> +GPUReclaim
> +              System memory stored in GPU pools for reuse. This memory is not
> +              counted in GPUActive. It is shrinker reclaimable memory kept in a reuse
> +              pool because it has non-standard page table attributes, like WC or UC.
>  HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
>                See Documentation/admin-guide/mm/hugetlbpage.rst.
>  DirectMap4k, DirectMap2M, DirectMap1G
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 83aeb0518e1d..c606b637f3f2 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -523,6 +523,8 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #ifdef CONFIG_UNACCEPTED_MEMORY
>  			     "Node %d Unaccepted:     %8lu kB\n"
>  #endif
> +			     "Node %d GPUActive:      %8lu kB\n"
> +			     "Node %d GPUReclaim:     %8lu kB\n"
>  			     ,
>  			     nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
>  			     nid, K(node_page_state(pgdat, NR_WRITEBACK)),
> @@ -556,6 +558,9 @@ static ssize_t node_read_meminfo(struct device *dev,
>  			     ,
>  			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
>  #endif
> +			     ,
> +			     nid, K(node_page_state(pgdat, NR_GPU_ACTIVE)),
> +			     nid, K(node_page_state(pgdat, NR_GPU_RECLAIM))
>  			    );
>  	len += hugetlb_report_node_meminfo(buf, len, nid);
>  	return len;
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index a458f1e112fd..65ba49ec3a63 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -163,6 +163,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  	show_val_kb(m, "Balloon:        ",
>  		    global_node_page_state(NR_BALLOON_PAGES));
>  
> +	show_val_kb(m, "GPUActive:      ",
> +		    global_node_page_state(NR_GPU_ACTIVE));
> +
> +	show_val_kb(m, "GPUReclaim:     ",
> +		    global_node_page_state(NR_GPU_RECLAIM));
> +
>  	hugetlb_report_meminfo(m);
>  
>  	arch_report_meminfo(m);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7fb7331c5725..8455551b93f6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -260,6 +260,8 @@ enum node_stat_item {
>  #endif
>  	NR_BALLOON_PAGES,
>  	NR_KERNEL_FILE_PAGES,
> +	NR_GPU_ACTIVE,          /* Pages assigned to GPU objects */
> +	NR_GPU_RECLAIM,         /* Pages in shrinkable GPU pools */
>  	NR_VM_NODE_STAT_ITEMS
>  };
>  
> diff --git a/mm/show_mem.c b/mm/show_mem.c
> index 3a4b5207635d..fb99465616cf 100644
> --- a/mm/show_mem.c
> +++ b/mm/show_mem.c
> @@ -254,7 +254,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
>  			" sec_pagetables:%lukB"
>  			" all_unreclaimable? %s"
>  			" Balloon:%lukB"
> -			"\n",
> +		        " gpu_active:%lukB"
> +		        " gpu_reclaim:%lukB"
> +		        "\n",
>  			pgdat->node_id,
>  			K(node_page_state(pgdat, NR_ACTIVE_ANON)),
>  			K(node_page_state(pgdat, NR_INACTIVE_ANON)),
> @@ -280,7 +282,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
>  			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
>  			str_yes_no(atomic_read(&pgdat->kswapd_failures) >=
>  				   MAX_RECLAIM_RETRIES),
> -			K(node_page_state(pgdat, NR_BALLOON_PAGES)));
> +		        K(node_page_state(pgdat, NR_BALLOON_PAGES)),
> +		        K(node_page_state(pgdat, NR_GPU_ACTIVE)),
> +			K(node_page_state(pgdat, NR_GPU_RECLAIM)));
>  	}
>  
>  	for_each_populated_zone(zone) {
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index bb09c032eecf..b4df2b85739f 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1291,6 +1291,8 @@ const char * const vmstat_text[] = {
>  #endif
>  	[I(NR_BALLOON_PAGES)]			= "nr_balloon_pages",
>  	[I(NR_KERNEL_FILE_PAGES)]		= "nr_kernel_file_pages",
> +	[I(NR_GPU_ACTIVE)]			= "nr_gpu_active",
> +	[I(NR_GPU_RECLAIM)]			= "nr_gpu_reclaim",
>  #undef I
>  
>  	/* system-wide enum vm_stat_item counters */


^ permalink raw reply	[flat|nested] 18+ messages in thread
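
For readers who want to see how these counters get exercised, a minimal
driver-side sketch follows (the function names are made up for illustration;
in the series itself the accounting is done in the TTM layer, see patch 02):

        /*
         * Illustrative only: charge pages to NR_GPU_ACTIVE while they back a
         * live GPU object, and shift them to NR_GPU_RECLAIM when they are
         * parked in a shrinker-visible reuse pool instead of being freed.
         */
        static void gpu_stat_pages_active(struct page *p, unsigned int order)
        {
                mod_node_page_state(page_pgdat(p), NR_GPU_ACTIVE, 1 << order);
        }

        static void gpu_stat_pages_pooled(struct page *p, unsigned int order)
        {
                mod_node_page_state(page_pgdat(p), NR_GPU_ACTIVE, -(1L << order));
                mod_node_page_state(page_pgdat(p), NR_GPU_RECLAIM, 1 << order);
        }
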

end of thread, other threads:[~2025-10-16  7:48 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
2025-10-16  7:48   ` Christian König
2025-10-16  2:31 ` [PATCH 02/16] drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) Dave Airlie
2025-10-16  2:31 ` [PATCH 03/16] ttm/pool: port to list_lru. (v2) Dave Airlie
2025-10-16  2:31 ` [PATCH 04/16] ttm/pool: drop numa specific pools Dave Airlie
2025-10-16  2:31 ` [PATCH 05/16] ttm/pool: make pool shrinker NUMA aware Dave Airlie
2025-10-16  2:31 ` [PATCH 06/16] ttm/pool: track allocated_pages per numa node Dave Airlie
2025-10-16  2:31 ` [PATCH 07/16] memcg: add support for GPU page counters. (v3) Dave Airlie
2025-10-16  2:31 ` [PATCH 08/16] ttm: add a memcg accounting flag to the alloc/populate APIs Dave Airlie
2025-10-16  2:31 ` [PATCH 09/16] ttm/pool: initialise the shrinker earlier Dave Airlie
2025-10-16  2:31 ` [PATCH 10/16] ttm: add objcg pointer to bo and tt (v2) Dave Airlie
2025-10-16  2:31 ` [PATCH 11/16] ttm/pool: enable memcg tracking and shrinker. (v2) Dave Airlie
2025-10-16  2:31 ` [PATCH 12/16] ttm: hook up memcg placement flags Dave Airlie
2025-10-16  2:31 ` [PATCH 13/16] memcontrol: allow objcg api when memcg is config off Dave Airlie
2025-10-16  2:31 ` [PATCH 14/16] amdgpu: add support for memory cgroups Dave Airlie
2025-10-16  2:31 ` [PATCH 15/16] ttm: add support for a module option to disable memcg integration Dave Airlie
2025-10-16  2:31 ` [PATCH 16/16] xe: create a flag to enable memcg accounting for XE as well Dave Airlie
