* [PATCH 2/5] ttm/pool: port to list_lru.
2025-06-05 6:19 drm/ttm: port ttm pools to NUMA aware lru_list Dave Airlie
2025-06-05 6:19 ` [PATCH 1/5] mm/list_lru: export list_lru_add Dave Airlie
@ 2025-06-05 6:19 ` Dave Airlie
2025-06-05 22:40 ` Dave Chinner
2025-06-05 6:19 ` [PATCH 3/5] ttm/pool: drop numa specific pools Dave Airlie
` (3 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: Dave Airlie @ 2025-06-05 6:19 UTC (permalink / raw)
To: dri-devel
Cc: Christian Koenig, Matthew Brost, Dave Airlie, Johannes Weiner,
Dave Chinner
From: Dave Airlie <airlied@redhat.com>
This is an initial port of the TTM pools for write-combined and
uncached pages over to list_lru.
This makes the pools NUMA aware and avoids the need for separate
NUMA-specific pools (a later commit in this series drops them).
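As background for reviewers less familiar with the API: list_lru keeps one
internal list per NUMA node and hands items back through an isolate
callback, which is the give/take pattern the pool code switches to below.
A minimal sketch of that pattern follows, assuming the three-argument walk
callback used later in this patch; the demo_* names are purely illustrative
and not part of the series.

#include <linux/list_lru.h>
#include <linux/mm.h>

/* One list_lru that internally keeps a separate list per NUMA node. */
static struct list_lru demo_lru;

/* Walk callback: unlink one item and tell the walker it was removed. */
static enum lru_status demo_isolate(struct list_head *item,
				    struct list_lru_one *list,
				    void *cb_arg)
{
	struct page **out = cb_arg;

	list_lru_isolate(list, item);
	*out = container_of(item, struct page, lru);
	return LRU_REMOVED;
}

static struct page *demo_give_and_take(struct page *p)
{
	struct page *out = NULL;
	unsigned long nr_to_walk = 1;
	int nid = page_to_nid(p);

	if (list_lru_init(&demo_lru))
		return NULL;

	/* File the page on the list belonging to its own node. */
	list_lru_add(&demo_lru, &p->lru, nid, NULL);

	/* Take at most one item back off that node's list. */
	list_lru_walk_node(&demo_lru, nid, demo_isolate, &out, &nr_to_walk);

	list_lru_destroy(&demo_lru);
	return out;
}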
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/ttm/tests/ttm_device_test.c | 2 +-
drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 32 ++++----
drivers/gpu/drm/ttm/ttm_pool.c | 84 +++++++++++++++------
include/drm/ttm/ttm_pool.h | 2 +-
4 files changed, 81 insertions(+), 39 deletions(-)
diff --git a/drivers/gpu/drm/ttm/tests/ttm_device_test.c b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
index 1621903818e5..1f207fd222bc 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_device_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
@@ -183,7 +183,7 @@ static void ttm_device_init_pools(struct kunit *test)
if (params->use_dma_alloc)
KUNIT_ASSERT_FALSE(test,
- list_empty(&pt.pages));
+ !list_lru_count(&pt.pages));
}
}
}
diff --git a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
index 8ade53371f72..39234a3e98c4 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
@@ -248,7 +248,7 @@ static void ttm_pool_alloc_order_caching_match(struct kunit *test)
pool = ttm_pool_pre_populated(test, size, caching);
pt = &pool->caching[caching].orders[order];
- KUNIT_ASSERT_FALSE(test, list_empty(&pt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt->pages));
tt = ttm_tt_kunit_init(test, 0, caching, size);
KUNIT_ASSERT_NOT_NULL(test, tt);
@@ -256,7 +256,7 @@ static void ttm_pool_alloc_order_caching_match(struct kunit *test)
err = ttm_pool_alloc(pool, tt, &simple_ctx);
KUNIT_ASSERT_EQ(test, err, 0);
- KUNIT_ASSERT_TRUE(test, list_empty(&pt->pages));
+ KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
ttm_pool_free(pool, tt);
ttm_tt_fini(tt);
@@ -282,8 +282,8 @@ static void ttm_pool_alloc_caching_mismatch(struct kunit *test)
tt = ttm_tt_kunit_init(test, 0, tt_caching, size);
KUNIT_ASSERT_NOT_NULL(test, tt);
- KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
- KUNIT_ASSERT_TRUE(test, list_empty(&pt_tt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+ KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt_tt->pages));
err = ttm_pool_alloc(pool, tt, &simple_ctx);
KUNIT_ASSERT_EQ(test, err, 0);
@@ -291,8 +291,8 @@ static void ttm_pool_alloc_caching_mismatch(struct kunit *test)
ttm_pool_free(pool, tt);
ttm_tt_fini(tt);
- KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
- KUNIT_ASSERT_FALSE(test, list_empty(&pt_tt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_tt->pages));
ttm_pool_fini(pool);
}
@@ -316,8 +316,8 @@ static void ttm_pool_alloc_order_mismatch(struct kunit *test)
tt = ttm_tt_kunit_init(test, 0, caching, snd_size);
KUNIT_ASSERT_NOT_NULL(test, tt);
- KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
- KUNIT_ASSERT_TRUE(test, list_empty(&pt_tt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+ KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt_tt->pages));
err = ttm_pool_alloc(pool, tt, &simple_ctx);
KUNIT_ASSERT_EQ(test, err, 0);
@@ -325,8 +325,8 @@ static void ttm_pool_alloc_order_mismatch(struct kunit *test)
ttm_pool_free(pool, tt);
ttm_tt_fini(tt);
- KUNIT_ASSERT_FALSE(test, list_empty(&pt_pool->pages));
- KUNIT_ASSERT_FALSE(test, list_empty(&pt_tt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_pool->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt_tt->pages));
ttm_pool_fini(pool);
}
@@ -352,12 +352,12 @@ static void ttm_pool_free_dma_alloc(struct kunit *test)
ttm_pool_alloc(pool, tt, &simple_ctx);
pt = &pool->caching[caching].orders[order];
- KUNIT_ASSERT_TRUE(test, list_empty(&pt->pages));
+ KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
ttm_pool_free(pool, tt);
ttm_tt_fini(tt);
- KUNIT_ASSERT_FALSE(test, list_empty(&pt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt->pages));
ttm_pool_fini(pool);
}
@@ -383,12 +383,12 @@ static void ttm_pool_free_no_dma_alloc(struct kunit *test)
ttm_pool_alloc(pool, tt, &simple_ctx);
pt = &pool->caching[caching].orders[order];
- KUNIT_ASSERT_TRUE(test, list_is_singular(&pt->pages));
+ KUNIT_ASSERT_TRUE(test, list_lru_count(&pt->pages) == 1);
ttm_pool_free(pool, tt);
ttm_tt_fini(tt);
- KUNIT_ASSERT_TRUE(test, list_is_singular(&pt->pages));
+ KUNIT_ASSERT_TRUE(test, list_lru_count(&pt->pages) == 1);
ttm_pool_fini(pool);
}
@@ -404,11 +404,11 @@ static void ttm_pool_fini_basic(struct kunit *test)
pool = ttm_pool_pre_populated(test, size, caching);
pt = &pool->caching[caching].orders[order];
- KUNIT_ASSERT_FALSE(test, list_empty(&pt->pages));
+ KUNIT_ASSERT_FALSE(test, !list_lru_count(&pt->pages));
ttm_pool_fini(pool);
- KUNIT_ASSERT_TRUE(test, list_empty(&pt->pages));
+ KUNIT_ASSERT_TRUE(test, !list_lru_count(&pt->pages));
}
static struct kunit_case ttm_pool_test_cases[] = {
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 99197aac09a1..785b141d18df 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -130,6 +130,16 @@ static struct list_head shrinker_list;
static struct shrinker *mm_shrinker;
static DECLARE_RWSEM(pool_shrink_rwsem);
+/* helper to get a current valid node id from a pool */
+static int ttm_pool_nid(struct ttm_pool *pool) {
+ int nid = NUMA_NO_NODE;
+ if (pool)
+ nid = pool->nid;
+ if (nid == NUMA_NO_NODE)
+ nid = numa_node_id();
+ return nid;
+}
+
/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
unsigned int order)
@@ -272,7 +282,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
}
/* Give pages into a specific pool_type */
-static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
+static void ttm_pool_type_give(struct ttm_pool_type *pt, int nid, struct page *p)
{
unsigned int i, num_pages = 1 << pt->order;
@@ -284,25 +294,46 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
}
spin_lock(&pt->lock);
- list_add(&p->lru, &pt->pages);
+ INIT_LIST_HEAD(&p->lru);
+ rcu_read_lock();
+ list_lru_add(&pt->pages, &p->lru, nid, NULL);
+ rcu_read_unlock();
spin_unlock(&pt->lock);
atomic_long_add(1 << pt->order, &allocated_pages);
}
+struct take_one_info {
+ struct ttm_pool_type *pt;
+ struct page *out;
+};
+
+static enum lru_status take_one_from_lru(struct list_head *item,
+ struct list_lru_one *list,
+ void *cb_arg)
+{
+ struct take_one_info *info = cb_arg;
+ struct ttm_pool_type *pt = info->pt;
+ struct page *p = container_of(item, struct page, lru);
+ list_lru_isolate(list, item);
+ atomic_long_sub(1 << pt->order, &allocated_pages);
+ info->out = p;
+ return LRU_REMOVED;
+}
+
/* Take pages from a specific pool_type, return NULL when nothing available */
-static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
+static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid)
{
- struct page *p;
+ struct take_one_info info = {
+ .pt = pt,
+ .out = NULL,
+ };
+ unsigned long nr_to_walk = 1;
spin_lock(&pt->lock);
- p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
- if (p) {
- atomic_long_sub(1 << pt->order, &allocated_pages);
- list_del(&p->lru);
- }
+ list_lru_walk_node(&pt->pages, nid, take_one_from_lru, (void *)&info, &nr_to_walk);
spin_unlock(&pt->lock);
- return p;
+ return info.out;
}
/* Initialize and add a pool type to the global shrinker list */
@@ -313,24 +344,37 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
pt->caching = caching;
pt->order = order;
spin_lock_init(&pt->lock);
- INIT_LIST_HEAD(&pt->pages);
+ list_lru_init(&pt->pages);
spin_lock(&shrinker_lock);
list_add_tail(&pt->shrinker_list, &shrinker_list);
spin_unlock(&shrinker_lock);
}
+static enum lru_status pool_free_page(struct list_head *item,
+ struct list_lru_one *list,
+ void *cb_arg)
+{
+ struct ttm_pool_type *pt = cb_arg;
+ struct page *p = container_of(item, struct page, lru);
+
+ list_lru_isolate(list, item);
+
+ atomic_long_sub(1 << pt->order, &allocated_pages);
+ ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+ return LRU_REMOVED;
+}
+
/* Remove a pool_type from the global shrinker list and free all pages */
static void ttm_pool_type_fini(struct ttm_pool_type *pt)
{
- struct page *p;
-
spin_lock(&shrinker_lock);
list_del(&pt->shrinker_list);
spin_unlock(&shrinker_lock);
- while ((p = ttm_pool_type_take(pt)))
- ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+ spin_lock(&pt->lock);
+ list_lru_walk(&pt->pages, pool_free_page, pt, LONG_MAX);
+ spin_unlock(&pt->lock);
}
/* Return the pool_type to use for the given caching and order */
@@ -380,7 +424,7 @@ static unsigned int ttm_pool_shrink(void)
list_move_tail(&pt->shrinker_list, &shrinker_list);
spin_unlock(&shrinker_lock);
- p = ttm_pool_type_take(pt);
+ p = ttm_pool_type_take(pt, ttm_pool_nid(pt->pool));
if (p) {
ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
num_pages = 1 << pt->order;
@@ -472,7 +516,7 @@ static pgoff_t ttm_pool_unmap_and_free(struct ttm_pool *pool, struct page *page,
}
if (pt)
- ttm_pool_type_give(pt, page);
+ ttm_pool_type_give(pt, page_to_nid(page), page);
else
ttm_pool_free_page(pool, caching, order, page);
@@ -734,7 +778,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
p = NULL;
pt = ttm_pool_select_type(pool, page_caching, order);
if (pt && allow_pools)
- p = ttm_pool_type_take(pt);
+ p = ttm_pool_type_take(pt, ttm_pool_nid(pool));
/*
* If that fails or previously failed, allocate from system.
* Note that this also disallows additional pool allocations using
@@ -1164,12 +1208,10 @@ static unsigned long ttm_pool_shrinker_count(struct shrinker *shrink,
static unsigned int ttm_pool_type_count(struct ttm_pool_type *pt)
{
unsigned int count = 0;
- struct page *p;
spin_lock(&pt->lock);
/* Only used for debugfs, the overhead doesn't matter */
- list_for_each_entry(p, &pt->pages, lru)
- ++count;
+ count = list_lru_count(&pt->pages);
spin_unlock(&pt->lock);
return count;
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 54cd34a6e4c0..d1c574f2c58a 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -56,7 +56,7 @@ struct ttm_pool_type {
struct list_head shrinker_list;
spinlock_t lock;
- struct list_head pages;
+ struct list_lru pages;
};
/**
--
2.49.0
* [PATCH 4/5] ttm/pool: make pool shrinker NUMA aware
2025-06-05 6:19 drm/ttm: port ttm pools to NUMA aware lru_list Dave Airlie
` (2 preceding siblings ...)
2025-06-05 6:19 ` [PATCH 3/5] ttm/pool: drop numa specific pools Dave Airlie
@ 2025-06-05 6:19 ` Dave Airlie
2025-06-05 6:19 ` [PATCH 5/5] ttm/pool: track allocated_pages per numa node Dave Airlie
2025-06-05 8:34 ` drm/ttm: port ttm pools to NUMA aware lru_list Christian König
5 siblings, 0 replies; 8+ messages in thread
From: Dave Airlie @ 2025-06-05 6:19 UTC (permalink / raw)
To: dri-devel; +Cc: Christian Koenig, Matthew Brost, Dave Airlie, Dave Chinner
From: Dave Airlie <airlied@redhat.com>
This enables NUMA awareness for the shrinker on the TTM pools.
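For context, a shrinker registered with SHRINKER_NUMA_AWARE is asked to
count and scan one node at a time via sc->nid, which is what lets the
per-node list_lru introduced earlier be reclaimed node by node. A rough
sketch of that shape, using hypothetical demo_* names rather than the
actual TTM code:

#include <linux/shrinker.h>
#include <linux/list_lru.h>

static struct list_lru demo_lru;
static struct shrinker *demo_shrinker;

static enum lru_status demo_free_one(struct list_head *item,
				     struct list_lru_one *list,
				     void *cb_arg)
{
	list_lru_isolate(list, item);
	/* free the backing object here */
	return LRU_REMOVED;
}

/* With SHRINKER_NUMA_AWARE, sc->nid selects the node to report on. */
static unsigned long demo_count(struct shrinker *shrink,
				struct shrink_control *sc)
{
	unsigned long n = list_lru_count_node(&demo_lru, sc->nid);

	return n ? n : SHRINK_EMPTY;
}

static unsigned long demo_scan(struct shrinker *shrink,
			       struct shrink_control *sc)
{
	unsigned long nr = sc->nr_to_scan;

	/* Only walk the node the core MM asked us to shrink. */
	return list_lru_walk_node(&demo_lru, sc->nid, demo_free_one, NULL, &nr);
}

static int demo_shrinker_init(void)
{
	demo_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "demo-lru");
	if (!demo_shrinker)
		return -ENOMEM;

	demo_shrinker->count_objects = demo_count;
	demo_shrinker->scan_objects = demo_scan;
	shrinker_register(demo_shrinker);
	return 0;
}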
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/ttm/ttm_pool.c | 35 +++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index ad06f2f8fd2d..902dd682afc0 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -405,12 +405,11 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
return NULL;
}
-/* Free pages using the global shrinker list */
-static unsigned int ttm_pool_shrink(void)
+/* Free pages using the per-node shrinker list */
+static unsigned int ttm_pool_shrink(int nid, unsigned long num_to_free)
{
struct ttm_pool_type *pt;
unsigned int num_pages;
- struct page *p;
down_read(&pool_shrink_rwsem);
spin_lock(&shrinker_lock);
@@ -418,13 +417,8 @@ static unsigned int ttm_pool_shrink(void)
list_move_tail(&pt->shrinker_list, &shrinker_list);
spin_unlock(&shrinker_lock);
- p = ttm_pool_type_take(pt, ttm_pool_nid(pt->pool));
- if (p) {
- ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
- num_pages = 1 << pt->order;
- } else {
- num_pages = 0;
- }
+ num_pages = list_lru_walk_node(&pt->pages, nid, pool_free_page, pt, &num_to_free);
+ num_pages *= 1 << pt->order;
up_read(&pool_shrink_rwsem);
return num_pages;
@@ -773,6 +767,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
pt = ttm_pool_select_type(pool, page_caching, order);
if (pt && allow_pools)
p = ttm_pool_type_take(pt, ttm_pool_nid(pool));
+
/*
* If that fails or previously failed, allocate from system.
* Note that this also disallows additional pool allocations using
@@ -921,8 +916,10 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
{
ttm_pool_free_range(pool, tt, tt->caching, 0, tt->num_pages);
- while (atomic_long_read(&allocated_pages) > page_pool_size)
- ttm_pool_shrink();
+ while (atomic_long_read(&allocated_pages) > page_pool_size) {
+ unsigned long diff = atomic_long_read(&allocated_pages) - page_pool_size;
+ ttm_pool_shrink(ttm_pool_nid(pool), diff);
+ }
}
EXPORT_SYMBOL(ttm_pool_free);
@@ -1179,7 +1176,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
unsigned long num_freed = 0;
do
- num_freed += ttm_pool_shrink();
+ num_freed += ttm_pool_shrink(sc->nid, sc->nr_to_scan);
while (num_freed < sc->nr_to_scan &&
atomic_long_read(&allocated_pages));
@@ -1319,11 +1316,15 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
.nr_to_scan = TTM_SHRINKER_BATCH,
};
unsigned long count;
+ int nid;
fs_reclaim_acquire(GFP_KERNEL);
- count = ttm_pool_shrinker_count(mm_shrinker, &sc);
- seq_printf(m, "%lu/%lu\n", count,
- ttm_pool_shrinker_scan(mm_shrinker, &sc));
+ for_each_node(nid) {
+ sc.nid = nid;
+ count = ttm_pool_shrinker_count(mm_shrinker, &sc);
+ seq_printf(m, "%d: %lu/%lu\n", nid, count,
+ ttm_pool_shrinker_scan(mm_shrinker, &sc));
+ }
fs_reclaim_release(GFP_KERNEL);
return 0;
@@ -1371,7 +1372,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
#endif
#endif
- mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+ mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");
if (!mm_shrinker)
return -ENOMEM;
--
2.49.0
* [PATCH 5/5] ttm/pool: track allocated_pages per numa node.
2025-06-05 6:19 drm/ttm: port ttm pools to NUMA aware lru_list Dave Airlie
` (3 preceding siblings ...)
2025-06-05 6:19 ` [PATCH 4/5] ttm/pool: make pool shrinker NUMA aware Dave Airlie
@ 2025-06-05 6:19 ` Dave Airlie
2025-06-05 8:34 ` drm/ttm: port ttm pools to NUMA aware lru_list Christian König
5 siblings, 0 replies; 8+ messages in thread
From: Dave Airlie @ 2025-06-05 6:19 UTC (permalink / raw)
To: dri-devel; +Cc: Christian Koenig, Matthew Brost, Dave Airlie
From: Dave Airlie <airlied@redhat.com>
This gets the memory size of each node and stores the per-node limit
as 50% of it. I think we should eventually drop the limits once we
have memcg-aware shrinking, but this should be more NUMA friendly and
seems like what people would prefer to happen on NUMA-aware systems.
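For reference, the per-node size behind the 50% cap can be derived from the
managed pages of each zone on that node; a small sketch of that calculation
with hypothetical demo_* names (the patch itself adds
ttm_get_node_memory_size for this):

#include <linux/mm.h>
#include <linux/nodemask.h>

static unsigned long demo_node_limit[MAX_NUMNODES];

/* Cap each node's pool at half of that node's managed memory, in pages. */
static void demo_set_node_limits(void)
{
	int nid;

	for_each_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		unsigned long managed = 0;
		int zid;

		for (zid = 0; zid < MAX_NR_ZONES; zid++)
			managed += zone_managed_pages(&pgdat->node_zones[zid]);

		demo_node_limit[nid] = managed / 2;
	}
}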
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/ttm/ttm_pool.c | 57 +++++++++++++++++++++++++---------
1 file changed, 43 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 902dd682afc0..508b50f6901b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -114,10 +114,11 @@ struct ttm_pool_tt_restore {
static unsigned long page_pool_size;
-MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
+MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool per NUMA node");
module_param(page_pool_size, ulong, 0644);
-static atomic_long_t allocated_pages;
+static unsigned long pool_node_limit[MAX_NUMNODES];
+static atomic_long_t allocated_pages[MAX_NUMNODES];
static struct ttm_pool_type global_write_combined[NR_PAGE_ORDERS];
static struct ttm_pool_type global_uncached[NR_PAGE_ORDERS];
@@ -299,7 +300,7 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, int nid, struct page *p
list_lru_add(&pt->pages, &p->lru, nid, NULL);
rcu_read_unlock();
spin_unlock(&pt->lock);
- atomic_long_add(1 << pt->order, &allocated_pages);
+ atomic_long_add(1 << pt->order, &allocated_pages[nid]);
}
struct take_one_info {
@@ -315,7 +316,7 @@ static enum lru_status take_one_from_lru(struct list_head *item,
struct ttm_pool_type *pt = info->pt;
struct page *p = container_of(item, struct page, lru);
list_lru_isolate(list, item);
- atomic_long_sub(1 << pt->order, &allocated_pages);
+ atomic_long_sub(1 << pt->order, &allocated_pages[page_to_nid(p)]);
info->out = p;
return LRU_REMOVED;
}
@@ -360,7 +361,7 @@ static enum lru_status pool_free_page(struct list_head *item,
list_lru_isolate(list, item);
- atomic_long_sub(1 << pt->order, &allocated_pages);
+ atomic_long_sub(1 << pt->order, &allocated_pages[page_to_nid(p)]);
ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
return LRU_REMOVED;
}
@@ -914,11 +915,13 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
*/
void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
{
+ int nid = ttm_pool_nid(pool);
+
ttm_pool_free_range(pool, tt, tt->caching, 0, tt->num_pages);
- while (atomic_long_read(&allocated_pages) > page_pool_size) {
- unsigned long diff = atomic_long_read(&allocated_pages) - page_pool_size;
- ttm_pool_shrink(ttm_pool_nid(pool), diff);
+ while (atomic_long_read(&allocated_pages[nid]) > pool_node_limit[nid]) {
+ unsigned long diff = atomic_long_read(&allocated_pages[nid]) - pool_node_limit[nid];
+ ttm_pool_shrink(nid, diff);
}
}
EXPORT_SYMBOL(ttm_pool_free);
@@ -1178,7 +1181,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
do
num_freed += ttm_pool_shrink(sc->nid, sc->nr_to_scan);
while (num_freed < sc->nr_to_scan &&
- atomic_long_read(&allocated_pages));
+ atomic_long_read(&allocated_pages[sc->nid]));
sc->nr_scanned = num_freed;
@@ -1189,7 +1192,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
static unsigned long ttm_pool_shrinker_count(struct shrinker *shrink,
struct shrink_control *sc)
{
- unsigned long num_pages = atomic_long_read(&allocated_pages);
+ unsigned long num_pages = atomic_long_read(&allocated_pages[sc->nid]);
return num_pages ? num_pages : SHRINK_EMPTY;
}
@@ -1233,8 +1236,12 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
/* Dump the total amount of allocated pages */
static void ttm_pool_debugfs_footer(struct seq_file *m)
{
- seq_printf(m, "\ntotal\t: %8lu of %8lu\n",
- atomic_long_read(&allocated_pages), page_pool_size);
+ int nid;
+
+ for_each_node(nid) {
+ seq_printf(m, "\ntotal node%d\t: %8lu of %8lu\n", nid,
+ atomic_long_read(&allocated_pages[nid]), pool_node_limit[nid]);
+ }
}
/* Dump the information for the global pools */
@@ -1333,6 +1340,22 @@ DEFINE_SHOW_ATTRIBUTE(ttm_pool_debugfs_shrink);
#endif
+static inline uint64_t ttm_get_node_memory_size(int nid)
+{
+ /* This is directly using si_meminfo_node implementation as the
+ * function is not exported.
+ */
+ int zone_type;
+ uint64_t managed_pages = 0;
+
+ pg_data_t *pgdat = NODE_DATA(nid);
+
+ for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
+ managed_pages +=
+ zone_managed_pages(&pgdat->node_zones[zone_type]);
+ return managed_pages * PAGE_SIZE;
+}
+
/**
* ttm_pool_mgr_init - Initialize globals
*
@@ -1344,8 +1367,14 @@ int ttm_pool_mgr_init(unsigned long num_pages)
{
unsigned int i;
- if (!page_pool_size)
- page_pool_size = num_pages;
+ int nid;
+ for_each_node(nid) {
+ if (!page_pool_size) {
+ uint64_t node_size = ttm_get_node_memory_size(nid);
+ pool_node_limit[nid] = (node_size >> PAGE_SHIFT) / 2;
+ } else
+ pool_node_limit[nid] = page_pool_size;
+ }
spin_lock_init(&shrinker_lock);
INIT_LIST_HEAD(&shrinker_list);
--
2.49.0