* [RFC v2 0/5] Improving the worst case TTM large allocation latency
@ 2025-10-03 13:58 Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 1/5] drm/ttm: Add getter for some pool properties Tvrtko Ursulin
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-03 13:58 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Alex Deucher, Christian König,
Danilo Krummrich, Dave Airlie, Gerd Hoffmann, Joonas Lahtinen,
Lucas De Marchi, Lyude Paul, Maarten Lankhorst, Maxime Ripard,
Rodrigo Vivi, Sui Jingfeng, Thadeu Lima de Souza Cascardo,
Thomas Hellström, Thomas Zimmermann, Zack Rusin
Disclaimer:
Please note that, as this series includes a patch which touches a good number of
drivers, I will only copy everyone on the cover letter and the respective patch.
The assumption is that people are subscribed to dri-devel and so can look at the
whole series there. I know someone is bound to complain either way, whether
everyone is copied on everything and gets too much email, or the opposite. So
please be flexible.
Description:
All drivers which use the TTM pool allocator end up requesting large order
allocations when allocating large buffers. Those can be slow under memory
pressure and so add latency to buffer creation. But there is often also a size
limit above which contiguous blocks do not bring any performance benefit. This
series allows drivers to say when it is okay for TTM to try a bit less hard.
We do this by allowing drivers to specify this cut-off point when creating the
TTM device and pools. Allocations above this size will skip direct reclaim, so
under memory pressure the worst case latency improves. Background reclaim is
still kicked off, and both before and after memory pressure all the TTM pool
buckets remain in use as they are today.
This is especially interesting if someone has configured MAX_PAGE_ORDER higher
than the default. And even with the default, the last patch in the series makes
amdgpu use the new feature by telling TTM that above 2MiB we do not expect
performance benefits, which makes TTM not try direct reclaim for the top bucket
(4MiB).
The end result is that TTM drivers become slightly nicer mm citizens and users
benefit from better worst case buffer creation latencies. As a side benefit we
get rid of two instances of those often very unreadable function signatures
with multiple nameless booleans.
If this sounds interesting and gets merged, the individual drivers can follow up
with patches configuring their thresholds.
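The policy described above can be sketched outside the kernel. The flag values
and function names below are hypothetical stand-ins for the real GFP machinery
and the series' actual implementation, kept minimal to show only the decision:

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's GFP reclaim flags. */
#define GFP_BASE            0x1u
#define GFP_DIRECT_RECLAIM  0x2u /* caller may stall to reclaim memory */
#define GFP_KSWAPD_RECLAIM  0x4u /* background reclaim is kicked off */

/*
 * Allocations up to the driver's "max beneficial" order behave as
 * today; larger orders drop direct reclaim so they fail fast under
 * memory pressure and TTM falls back to a smaller bucket instead of
 * stalling. Background reclaim is still requested either way.
 */
static unsigned int pool_gfp_flags(unsigned int order,
				   unsigned int max_beneficial_order)
{
	unsigned int gfp = GFP_BASE | GFP_DIRECT_RECLAIM | GFP_KSWAPD_RECLAIM;

	if (order > max_beneficial_order)
		gfp &= ~GFP_DIRECT_RECLAIM;

	return gfp;
}
```

With amdgpu's 2MiB threshold (order 9 on 4KiB pages), only the top 4MiB bucket
(order 10) would lose direct reclaim in this sketch.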
v2:
* Christian suggested passing in the new data by changing the function signatures.
v1 thread:
https://lore.kernel.org/dri-devel/20250919131127.90932-1-tvrtko.ursulin@igalia.com/
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sui Jingfeng <suijingfeng@loongson.cn>
Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Tvrtko Ursulin (5):
drm/ttm: Add getter for some pool properties
drm/ttm: Replace multiple booleans with flags in pool init
drm/ttm: Replace multiple booleans with flags in device init
drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
drm/amdgpu: Configure max beneficial TTM pool allocation order
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +--
drivers/gpu/drm/drm_gem_vram_helper.c | 2 +-
drivers/gpu/drm/i915/intel_region_ttm.c | 2 +-
drivers/gpu/drm/loongson/lsdc_ttm.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_ttm.c | 4 +-
drivers/gpu/drm/qxl/qxl_ttm.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 16 +++----
.../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 2 +-
drivers/gpu/drm/ttm/tests/ttm_device_test.c | 31 +++++--------
drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 22 ++++------
drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h | 7 +--
drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 23 +++++-----
drivers/gpu/drm/ttm/ttm_device.c | 7 ++-
drivers/gpu/drm/ttm/ttm_pool.c | 44 +++++++++++--------
drivers/gpu/drm/ttm/ttm_tt.c | 9 ++--
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 +-
drivers/gpu/drm/xe/xe_device.c | 2 +-
include/drm/ttm/ttm_device.h | 2 +-
include/drm/ttm/ttm_pool.h | 27 +++++++++---
20 files changed, 113 insertions(+), 106 deletions(-)
--
2.48.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-03 13:58 [RFC v2 0/5] Improving the worst case TTM large allocation latency Tvrtko Ursulin
@ 2025-10-03 13:58 ` Tvrtko Ursulin
2025-10-06 8:38 ` Christian König
2025-10-03 13:58 ` [RFC v2 2/5] drm/ttm: Replace multiple booleans with flags in pool init Tvrtko Ursulin
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-03 13:58 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Thomas Hellström
No functional change; this just allows easier refactoring in the future.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/ttm/ttm_pool.c | 28 ++++++++++++++--------------
drivers/gpu/drm/ttm/ttm_tt.c | 9 +++++----
include/drm/ttm/ttm_pool.h | 10 ++++++++++
3 files changed, 29 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index baf27c70a419..a9430b516fc3 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -148,7 +148,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
__GFP_THISNODE;
- if (!pool->use_dma_alloc) {
+ if (!ttm_pool_uses_dma_alloc(pool)) {
p = alloc_pages_node(pool->nid, gfp_flags, order);
if (p)
p->private = order;
@@ -200,7 +200,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
set_pages_wb(p, 1 << order);
#endif
- if (!pool || !pool->use_dma_alloc) {
+ if (!pool || !ttm_pool_uses_dma_alloc(pool)) {
__free_pages(p, order);
return;
}
@@ -243,7 +243,7 @@ static int ttm_pool_map(struct ttm_pool *pool, unsigned int order,
{
dma_addr_t addr;
- if (pool->use_dma_alloc) {
+ if (ttm_pool_uses_dma_alloc(pool)) {
struct ttm_pool_dma *dma = (void *)p->private;
addr = dma->addr;
@@ -265,7 +265,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
unsigned int num_pages)
{
/* Unmapped while freeing the page */
- if (pool->use_dma_alloc)
+ if (ttm_pool_uses_dma_alloc(pool))
return;
dma_unmap_page(pool->dev, dma_addr, (long)num_pages << PAGE_SHIFT,
@@ -339,7 +339,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
enum ttm_caching caching,
unsigned int order)
{
- if (pool->use_dma_alloc)
+ if (ttm_pool_uses_dma_alloc(pool))
return &pool->caching[caching].orders[order];
#ifdef CONFIG_X86
@@ -348,7 +348,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
if (pool->nid != NUMA_NO_NODE)
return &pool->caching[caching].orders[order];
- if (pool->use_dma32)
+ if (ttm_pool_uses_dma32(pool))
return &global_dma32_write_combined[order];
return &global_write_combined[order];
@@ -356,7 +356,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
if (pool->nid != NUMA_NO_NODE)
return &pool->caching[caching].orders[order];
- if (pool->use_dma32)
+ if (ttm_pool_uses_dma32(pool))
return &global_dma32_uncached[order];
return &global_uncached[order];
@@ -396,7 +396,7 @@ static unsigned int ttm_pool_shrink(void)
/* Return the allocation order based for a page */
static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p)
{
- if (pool->use_dma_alloc) {
+ if (ttm_pool_uses_dma_alloc(pool)) {
struct ttm_pool_dma *dma = (void *)p->private;
return dma->vaddr & ~PAGE_MASK;
@@ -719,7 +719,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
if (ctx->gfp_retry_mayfail)
gfp_flags |= __GFP_RETRY_MAYFAIL;
- if (pool->use_dma32)
+ if (ttm_pool_uses_dma32(pool))
gfp_flags |= GFP_DMA32;
else
gfp_flags |= GFP_HIGHUSER;
@@ -977,7 +977,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
return -EINVAL;
if ((!ttm_backup_bytes_avail() && !flags->purge) ||
- pool->use_dma_alloc || ttm_tt_is_backed_up(tt))
+ ttm_pool_uses_dma_alloc(pool) || ttm_tt_is_backed_up(tt))
return -EBUSY;
#ifdef CONFIG_X86
@@ -1014,7 +1014,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
if (flags->purge)
return shrunken;
- if (pool->use_dma32)
+ if (ttm_pool_uses_dma32(pool))
gfp = GFP_DMA32;
else
gfp = GFP_HIGHUSER;
@@ -1068,7 +1068,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
{
unsigned int i, j;
- WARN_ON(!dev && use_dma_alloc);
+ WARN_ON(!dev && ttm_pool_uses_dma_alloc(pool));
pool->dev = dev;
pool->nid = nid;
@@ -1239,7 +1239,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
{
unsigned int i;
- if (!pool->use_dma_alloc && pool->nid == NUMA_NO_NODE) {
+ if (!ttm_pool_uses_dma_alloc(pool) && pool->nid == NUMA_NO_NODE) {
seq_puts(m, "unused\n");
return 0;
}
@@ -1250,7 +1250,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
if (!ttm_pool_select_type(pool, i, 0))
continue;
- if (pool->use_dma_alloc)
+ if (ttm_pool_uses_dma_alloc(pool))
seq_puts(m, "DMA ");
else
seq_printf(m, "N%d ", pool->nid);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 506e257dfba8..3b21ec33c877 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -93,7 +93,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc)
* mapped TT pages need to be decrypted or otherwise the drivers
* will end up sending encrypted mem to the gpu.
*/
- if (bdev->pool.use_dma_alloc && cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
+ if (ttm_pool_uses_dma_alloc(&bdev->pool) &&
+ cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
page_flags |= TTM_TT_FLAG_DECRYPTED;
drm_info_once(ddev, "TT memory decryption enabled.");
}
@@ -378,7 +379,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
- if (bdev->pool.use_dma32)
+ if (ttm_pool_uses_dma32(&bdev->pool))
atomic_long_add(ttm->num_pages,
&ttm_dma32_pages_allocated);
}
@@ -416,7 +417,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
error:
if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
- if (bdev->pool.use_dma32)
+ if (ttm_pool_uses_dma32(&bdev->pool))
atomic_long_sub(ttm->num_pages,
&ttm_dma32_pages_allocated);
}
@@ -439,7 +440,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
- if (bdev->pool.use_dma32)
+ if (ttm_pool_uses_dma32(&bdev->pool))
atomic_long_sub(ttm->num_pages,
&ttm_dma32_pages_allocated);
}
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 54cd34a6e4c0..22154d84fef9 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -100,4 +100,14 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
int ttm_pool_mgr_init(unsigned long num_pages);
void ttm_pool_mgr_fini(void);
+static inline bool ttm_pool_uses_dma_alloc(struct ttm_pool *pool)
+{
+ return pool->use_dma_alloc;
+}
+
+static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
+{
+ return pool->use_dma32;
+}
+
#endif
--
2.48.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC v2 2/5] drm/ttm: Replace multiple booleans with flags in pool init
2025-10-03 13:58 [RFC v2 0/5] Improving the worst case TTM large allocation latency Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 1/5] drm/ttm: Add getter for some pool properties Tvrtko Ursulin
@ 2025-10-03 13:58 ` Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 3/5] drm/ttm: Replace multiple booleans with flags in device init Tvrtko Ursulin
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-03 13:58 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Alex Deucher, Christian König,
Thomas Hellström
Multiple consecutive boolean function arguments are usually not very
readable.
Replace the ones in ttm_pool_init() with flags, which has the additional
benefit that more data can soon be passed in for just this one-off code
base churn.
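The readability problem, and the flags fix, can be illustrated with a minimal
stand-alone sketch (the names below are illustrative, not the actual TTM API):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits, mirroring the shape of the TTM_POOL_ flags. */
#define POOL_USE_DMA_ALLOC  (1u << 0)
#define POOL_USE_DMA32      (1u << 1)

/*
 * Before: call sites read as pool_init(pool, dev, nid, true, false),
 * where nobody remembers which boolean is which. After: a single
 * self-documenting flags word, with room for more bits later.
 */
static unsigned int pool_flags_from_bools(bool use_dma_alloc, bool use_dma32)
{
	return (use_dma_alloc ? POOL_USE_DMA_ALLOC : 0) |
	       (use_dma32 ? POOL_USE_DMA32 : 0);
}
```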
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/ttm/tests/ttm_device_test.c | 23 +++++++--------------
drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 23 ++++++++++-----------
drivers/gpu/drm/ttm/ttm_device.c | 4 +++-
drivers/gpu/drm/ttm/ttm_pool.c | 8 +++----
include/drm/ttm/ttm_pool.h | 15 +++++++-------
6 files changed, 34 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index aa9ee5dffa45..8f6d331e1ea2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1837,7 +1837,7 @@ static int amdgpu_ttm_pools_init(struct amdgpu_device *adev)
for (i = 0; i < adev->gmc.num_mem_partitions; i++) {
ttm_pool_init(&adev->mman.ttm_pools[i], adev->dev,
adev->gmc.mem_partitions[i].numa.node,
- false, false);
+ 0);
}
return 0;
}
diff --git a/drivers/gpu/drm/ttm/tests/ttm_device_test.c b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
index 1621903818e5..18c58b0fc18c 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_device_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
@@ -10,8 +10,7 @@
struct ttm_device_test_case {
const char *description;
- bool use_dma_alloc;
- bool use_dma32;
+ unsigned int flags;
bool pools_init_expected;
};
@@ -119,26 +118,21 @@ static void ttm_device_init_no_vma_man(struct kunit *test)
static const struct ttm_device_test_case ttm_device_cases[] = {
{
.description = "No DMA allocations, no DMA32 required",
- .use_dma_alloc = false,
- .use_dma32 = false,
.pools_init_expected = false,
},
{
.description = "DMA allocations, DMA32 required",
- .use_dma_alloc = true,
- .use_dma32 = true,
+ .flags = TTM_POOL_USE_DMA_ALLOC | TTM_POOL_USE_DMA32,
.pools_init_expected = true,
},
{
.description = "No DMA allocations, DMA32 required",
- .use_dma_alloc = false,
- .use_dma32 = true,
+ .flags = TTM_POOL_USE_DMA32,
.pools_init_expected = false,
},
{
.description = "DMA allocations, no DMA32 required",
- .use_dma_alloc = true,
- .use_dma32 = false,
+ .flags = TTM_POOL_USE_DMA_ALLOC,
.pools_init_expected = true,
},
};
@@ -163,15 +157,14 @@ static void ttm_device_init_pools(struct kunit *test)
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
err = ttm_device_kunit_init(priv, ttm_dev,
- params->use_dma_alloc,
- params->use_dma32);
+ params->flags & TTM_POOL_USE_DMA_ALLOC,
+ params->flags & TTM_POOL_USE_DMA32);
KUNIT_ASSERT_EQ(test, err, 0);
pool = &ttm_dev->pool;
KUNIT_ASSERT_NOT_NULL(test, pool);
KUNIT_EXPECT_PTR_EQ(test, pool->dev, priv->dev);
- KUNIT_EXPECT_EQ(test, pool->use_dma_alloc, params->use_dma_alloc);
- KUNIT_EXPECT_EQ(test, pool->use_dma32, params->use_dma32);
+ KUNIT_EXPECT_EQ(test, pool->flags, params->flags);
if (params->pools_init_expected) {
for (int i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
@@ -181,7 +174,7 @@ static void ttm_device_init_pools(struct kunit *test)
KUNIT_EXPECT_EQ(test, pt.caching, i);
KUNIT_EXPECT_EQ(test, pt.order, j);
- if (params->use_dma_alloc)
+ if (ttm_pool_uses_dma_alloc(pool))
KUNIT_ASSERT_FALSE(test,
list_empty(&pt.pages));
}
diff --git a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
index 8ade53371f72..8caf4a587bac 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
@@ -12,7 +12,7 @@
struct ttm_pool_test_case {
const char *description;
unsigned int order;
- bool use_dma_alloc;
+ unsigned int flags;
};
struct ttm_pool_test_priv {
@@ -86,7 +86,7 @@ static struct ttm_pool *ttm_pool_pre_populated(struct kunit *test,
pool = kunit_kzalloc(test, sizeof(*pool), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, pool);
- ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, true, false);
+ ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, TTM_POOL_USE_DMA_ALLOC);
err = ttm_pool_alloc(pool, tt, &simple_ctx);
KUNIT_ASSERT_EQ(test, err, 0);
@@ -113,12 +113,12 @@ static const struct ttm_pool_test_case ttm_pool_basic_cases[] = {
{
.description = "One page, with coherent DMA mappings enabled",
.order = 0,
- .use_dma_alloc = true,
+ .flags = TTM_POOL_USE_DMA_ALLOC,
},
{
.description = "Above the allocation limit, with coherent DMA mappings enabled",
.order = MAX_PAGE_ORDER + 1,
- .use_dma_alloc = true,
+ .flags = TTM_POOL_USE_DMA_ALLOC,
},
};
@@ -150,12 +150,11 @@ static void ttm_pool_alloc_basic(struct kunit *test)
pool = kunit_kzalloc(test, sizeof(*pool), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, pool);
- ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, params->use_dma_alloc,
- false);
+ ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, params->flags);
KUNIT_ASSERT_PTR_EQ(test, pool->dev, devs->dev);
KUNIT_ASSERT_EQ(test, pool->nid, NUMA_NO_NODE);
- KUNIT_ASSERT_EQ(test, pool->use_dma_alloc, params->use_dma_alloc);
+ KUNIT_ASSERT_EQ(test, pool->flags, params->flags);
err = ttm_pool_alloc(pool, tt, &simple_ctx);
KUNIT_ASSERT_EQ(test, err, 0);
@@ -165,14 +164,14 @@ static void ttm_pool_alloc_basic(struct kunit *test)
last_page = tt->pages[tt->num_pages - 1];
if (params->order <= MAX_PAGE_ORDER) {
- if (params->use_dma_alloc) {
+ if (ttm_pool_uses_dma_alloc(pool)) {
KUNIT_ASSERT_NOT_NULL(test, (void *)fst_page->private);
KUNIT_ASSERT_NOT_NULL(test, (void *)last_page->private);
} else {
KUNIT_ASSERT_EQ(test, fst_page->private, params->order);
}
} else {
- if (params->use_dma_alloc) {
+ if (ttm_pool_uses_dma_alloc(pool)) {
KUNIT_ASSERT_NOT_NULL(test, (void *)fst_page->private);
KUNIT_ASSERT_NULL(test, (void *)last_page->private);
} else {
@@ -218,7 +217,7 @@ static void ttm_pool_alloc_basic_dma_addr(struct kunit *test)
pool = kunit_kzalloc(test, sizeof(*pool), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, pool);
- ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, true, false);
+ ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, TTM_POOL_USE_DMA_ALLOC);
err = ttm_pool_alloc(pool, tt, &simple_ctx);
KUNIT_ASSERT_EQ(test, err, 0);
@@ -348,7 +347,7 @@ static void ttm_pool_free_dma_alloc(struct kunit *test)
pool = kunit_kzalloc(test, sizeof(*pool), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, pool);
- ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, true, false);
+ ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, TTM_POOL_USE_DMA_ALLOC);
ttm_pool_alloc(pool, tt, &simple_ctx);
pt = &pool->caching[caching].orders[order];
@@ -379,7 +378,7 @@ static void ttm_pool_free_no_dma_alloc(struct kunit *test)
pool = kunit_kzalloc(test, sizeof(*pool), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, pool);
- ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, false, false);
+ ttm_pool_init(pool, devs->dev, NUMA_NO_NODE, 0);
ttm_pool_alloc(pool, tt, &simple_ctx);
pt = &pool->caching[caching].orders[order];
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index c3e2fcbdd2cc..250e9f10145d 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -236,7 +236,9 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
else
nid = NUMA_NO_NODE;
- ttm_pool_init(&bdev->pool, dev, nid, use_dma_alloc, use_dma32);
+ ttm_pool_init(&bdev->pool, dev, nid,
+ (use_dma_alloc ? TTM_POOL_USE_DMA_ALLOC : 0) |
+ (use_dma32 ? TTM_POOL_USE_DMA32 : 0));
bdev->vma_manager = vma_manager;
spin_lock_init(&bdev->lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index a9430b516fc3..0323531d216a 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -1058,13 +1058,12 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
* @pool: the pool to initialize
* @dev: device for DMA allocations and mappings
* @nid: NUMA node to use for allocations
- * @use_dma_alloc: true if coherent DMA alloc should be used
- * @use_dma32: true if GFP_DMA32 should be used
+ * @flags: TTM_POOL_ flags
*
* Initialize the pool and its pool types.
*/
void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
- int nid, bool use_dma_alloc, bool use_dma32)
+ int nid, unsigned int flags)
{
unsigned int i, j;
@@ -1072,8 +1071,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
pool->dev = dev;
pool->nid = nid;
- pool->use_dma_alloc = use_dma_alloc;
- pool->use_dma32 = use_dma32;
+ pool->flags = flags;
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
for (j = 0; j < NR_PAGE_ORDERS; ++j) {
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 22154d84fef9..d898186765f1 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -59,21 +59,22 @@ struct ttm_pool_type {
struct list_head pages;
};
+#define TTM_POOL_USE_DMA_ALLOC BIT(0) /* Use coherent DMA allocations. */
+#define TTM_POOL_USE_DMA32 BIT(1) /* Use GFP_DMA32 allocations. */
+
/**
* struct ttm_pool - Pool for all caching and orders
*
* @dev: the device we allocate pages for
* @nid: which numa node to use
- * @use_dma_alloc: if coherent DMA allocations should be used
- * @use_dma32: if GFP_DMA32 should be used
+ * @flags: TTM_POOL_ flags
* @caching: pools for each caching/order
*/
struct ttm_pool {
struct device *dev;
int nid;
- bool use_dma_alloc;
- bool use_dma32;
+ unsigned int flags;
struct {
struct ttm_pool_type orders[NR_PAGE_ORDERS];
@@ -85,7 +86,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt);
void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
- int nid, bool use_dma_alloc, bool use_dma32);
+ int nid, unsigned int flags);
void ttm_pool_fini(struct ttm_pool *pool);
int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m);
@@ -102,12 +103,12 @@ void ttm_pool_mgr_fini(void);
static inline bool ttm_pool_uses_dma_alloc(struct ttm_pool *pool)
{
- return pool->use_dma_alloc;
+ return pool->flags & TTM_POOL_USE_DMA_ALLOC;
}
static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
{
- return pool->use_dma32;
+ return pool->flags & TTM_POOL_USE_DMA32;
}
#endif
--
2.48.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC v2 3/5] drm/ttm: Replace multiple booleans with flags in device init
2025-10-03 13:58 [RFC v2 0/5] Improving the worst case TTM large allocation latency Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 1/5] drm/ttm: Add getter for some pool properties Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 2/5] drm/ttm: Replace multiple booleans with flags in pool init Tvrtko Ursulin
@ 2025-10-03 13:58 ` Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 5/5] drm/amdgpu: Configure max beneficial TTM pool allocation order Tvrtko Ursulin
4 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-03 13:58 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Alex Deucher, Christian König,
Danilo Krummrich, Dave Airlie, Gerd Hoffmann, Joonas Lahtinen,
Lucas De Marchi, Lyude Paul, Maarten Lankhorst, Maxime Ripard,
Rodrigo Vivi, Sui Jingfeng, Thomas Hellström,
Thomas Zimmermann, Zack Rusin
Multiple consecutive boolean function arguments are usually not very
readable.
Replace the ones in ttm_device_init() with flags, which has the additional
benefit that more data can soon be passed in for just a one-off code base
churn.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sui Jingfeng <suijingfeng@loongson.cn>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Zack Rusin <zack.rusin@broadcom.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++--
drivers/gpu/drm/drm_gem_vram_helper.c | 2 +-
drivers/gpu/drm/i915/intel_region_ttm.c | 2 +-
drivers/gpu/drm/loongson/lsdc_ttm.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_ttm.c | 4 ++--
drivers/gpu/drm/qxl/qxl_ttm.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++--
drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 16 +++++++-------
.../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 2 +-
drivers/gpu/drm/ttm/tests/ttm_device_test.c | 12 +++++-----
drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 22 ++++++++-----------
drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h | 7 ++----
drivers/gpu/drm/ttm/ttm_device.c | 9 +++-----
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
drivers/gpu/drm/xe/xe_device.c | 2 +-
include/drm/ttm/ttm_device.h | 2 +-
16 files changed, 42 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8f6d331e1ea2..948c6d0a422b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1930,8 +1930,8 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
adev_to_drm(adev)->anon_inode->i_mapping,
adev_to_drm(adev)->vma_offset_manager,
- adev->need_swiotlb,
- dma_addressing_limited(adev->dev));
+ (adev->need_swiotlb ? TTM_POOL_USE_DMA_ALLOC : 0) |
+ (dma_addressing_limited(adev->dev) ? TTM_POOL_USE_DMA32 : 0));
if (r) {
dev_err(adev->dev,
"failed initializing buffer object driver(%d).\n", r);
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index 90760d0ca071..386d15f630f5 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -859,7 +859,7 @@ static int drm_vram_mm_init(struct drm_vram_mm *vmm, struct drm_device *dev,
ret = ttm_device_init(&vmm->bdev, &bo_driver, dev->dev,
dev->anon_inode->i_mapping,
dev->vma_offset_manager,
- false, true);
+ TTM_POOL_USE_DMA32);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 04525d92bec5..47a69aad5c3f 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -34,7 +34,7 @@ int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
drm->dev, drm->anon_inode->i_mapping,
- drm->vma_offset_manager, false, false);
+ drm->vma_offset_manager, 0);
}
/**
diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c
index 2e42c6970c9f..b74dadd1a9c6 100644
--- a/drivers/gpu/drm/loongson/lsdc_ttm.c
+++ b/drivers/gpu/drm/loongson/lsdc_ttm.c
@@ -544,7 +544,7 @@ int lsdc_ttm_init(struct lsdc_device *ldev)
ret = ttm_device_init(&ldev->bdev, &lsdc_bo_driver, ddev->dev,
ddev->anon_inode->i_mapping,
- ddev->vma_offset_manager, false, true);
+ ddev->vma_offset_manager, TTM_POOL_USE_DMA32);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 7d2436e5d50d..34649eb2b7c5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -302,8 +302,8 @@ nouveau_ttm_init(struct nouveau_drm *drm)
ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,
dev->anon_inode->i_mapping,
dev->vma_offset_manager,
- drm_need_swiotlb(drm->client.mmu.dmabits),
- drm->client.mmu.dmabits <= 32);
+ (drm_need_swiotlb(drm->client.mmu.dmabits) ? TTM_POOL_USE_DMA_ALLOC : 0 ) |
+ (drm->client.mmu.dmabits <= 32 ? TTM_POOL_USE_DMA32 : 0));
if (ret) {
NV_ERROR(drm, "error initialising bo driver, %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 765a144cea14..85d9df48affa 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -196,7 +196,7 @@ int qxl_ttm_init(struct qxl_device *qdev)
r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL,
qdev->ddev.anon_inode->i_mapping,
qdev->ddev.vma_offset_manager,
- false, false);
+ 0);
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 616d25c8c2de..95758054db65 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -683,8 +683,8 @@ int radeon_ttm_init(struct radeon_device *rdev)
r = ttm_device_init(&rdev->mman.bdev, &radeon_bo_driver, rdev->dev,
rdev_to_drm(rdev)->anon_inode->i_mapping,
rdev_to_drm(rdev)->vma_offset_manager,
- rdev->need_swiotlb,
- dma_addressing_limited(&rdev->pdev->dev));
+ (rdev->need_swiotlb ? TTM_POOL_USE_DMA_ALLOC : 0 ) |
+ (dma_addressing_limited(&rdev->pdev->dev) ? TTM_POOL_USE_DMA32 : 0));
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
index 5426b435f702..d468f8322072 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
@@ -251,7 +251,7 @@ static void ttm_bo_unreserve_basic(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -290,7 +290,7 @@ static void ttm_bo_unreserve_pinned(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -342,7 +342,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
resv = kunit_kzalloc(test, sizeof(*resv), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, resv);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -394,7 +394,7 @@ static void ttm_bo_fini_basic(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -437,7 +437,7 @@ static void ttm_bo_fini_shared_resv(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -477,7 +477,7 @@ static void ttm_bo_pin_basic(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -512,7 +512,7 @@ static void ttm_bo_pin_unpin_resource(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
@@ -563,7 +563,7 @@ static void ttm_bo_multiple_pin_one_unpin(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
priv->ttm_dev = ttm_dev;
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 3a1eef83190c..17a570af296c 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -995,7 +995,7 @@ static void ttm_bo_validate_busy_domain_evict(struct kunit *test)
*/
ttm_device_fini(priv->ttm_dev);
- err = ttm_device_kunit_init_bad_evict(test->priv, priv->ttm_dev, false, false);
+ err = ttm_device_kunit_init_bad_evict(test->priv, priv->ttm_dev);
KUNIT_ASSERT_EQ(test, err, 0);
ttm_mock_manager_init(priv->ttm_dev, mem_type, MANAGER_SIZE);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_device_test.c b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
index 18c58b0fc18c..0e0359f1cc0e 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_device_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_device_test.c
@@ -24,7 +24,7 @@ static void ttm_device_init_basic(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
KUNIT_EXPECT_PTR_EQ(test, ttm_dev->funcs, &ttm_dev_funcs);
@@ -54,7 +54,7 @@ static void ttm_device_init_multiple(struct kunit *test)
KUNIT_ASSERT_NOT_NULL(test, ttm_devs);
for (i = 0; i < num_dev; i++) {
- err = ttm_device_kunit_init(priv, &ttm_devs[i], false, false);
+ err = ttm_device_kunit_init(priv, &ttm_devs[i], 0);
KUNIT_ASSERT_EQ(test, err, 0);
KUNIT_EXPECT_PTR_EQ(test, ttm_devs[i].dev_mapping,
@@ -80,7 +80,7 @@ static void ttm_device_fini_basic(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
man = ttm_manager_type(ttm_dev, TTM_PL_SYSTEM);
@@ -108,7 +108,7 @@ static void ttm_device_init_no_vma_man(struct kunit *test)
vma_man = drm->vma_offset_manager;
drm->vma_offset_manager = NULL;
- err = ttm_device_kunit_init(priv, ttm_dev, false, false);
+ err = ttm_device_kunit_init(priv, ttm_dev, 0);
KUNIT_EXPECT_EQ(test, err, -EINVAL);
/* Bring the manager back for a graceful cleanup */
@@ -156,9 +156,7 @@ static void ttm_device_init_pools(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(priv, ttm_dev,
- params->flags & TTM_POOL_USE_DMA_ALLOC,
- params->flags & TTM_POOL_USE_DMA32);
+ err = ttm_device_kunit_init(priv, ttm_dev, params->flags);
KUNIT_ASSERT_EQ(test, err, 0);
pool = &ttm_dev->pool;
diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
index 7aaf0d1395ff..4b0fc4f5fcbf 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
@@ -117,8 +117,7 @@ static void bad_evict_flags(struct ttm_buffer_object *bo,
static int ttm_device_kunit_init_with_funcs(struct ttm_test_devices *priv,
struct ttm_device *ttm,
- bool use_dma_alloc,
- bool use_dma32,
+ unsigned int pool_flags,
struct ttm_device_funcs *funcs)
{
struct drm_device *drm = priv->drm;
@@ -127,7 +126,7 @@ static int ttm_device_kunit_init_with_funcs(struct ttm_test_devices *priv,
err = ttm_device_init(ttm, funcs, drm->dev,
drm->anon_inode->i_mapping,
drm->vma_offset_manager,
- use_dma_alloc, use_dma32);
+ pool_flags);
return err;
}
@@ -143,11 +142,10 @@ EXPORT_SYMBOL_GPL(ttm_dev_funcs);
int ttm_device_kunit_init(struct ttm_test_devices *priv,
struct ttm_device *ttm,
- bool use_dma_alloc,
- bool use_dma32)
+ unsigned int pool_flags)
{
- return ttm_device_kunit_init_with_funcs(priv, ttm, use_dma_alloc,
- use_dma32, &ttm_dev_funcs);
+ return ttm_device_kunit_init_with_funcs(priv, ttm, pool_flags,
+ &ttm_dev_funcs);
}
EXPORT_SYMBOL_GPL(ttm_device_kunit_init);
@@ -161,12 +159,10 @@ struct ttm_device_funcs ttm_dev_funcs_bad_evict = {
EXPORT_SYMBOL_GPL(ttm_dev_funcs_bad_evict);
int ttm_device_kunit_init_bad_evict(struct ttm_test_devices *priv,
- struct ttm_device *ttm,
- bool use_dma_alloc,
- bool use_dma32)
+ struct ttm_device *ttm)
{
- return ttm_device_kunit_init_with_funcs(priv, ttm, use_dma_alloc,
- use_dma32, &ttm_dev_funcs_bad_evict);
+ return ttm_device_kunit_init_with_funcs(priv, ttm, 0,
+ &ttm_dev_funcs_bad_evict);
}
EXPORT_SYMBOL_GPL(ttm_device_kunit_init_bad_evict);
@@ -252,7 +248,7 @@ struct ttm_test_devices *ttm_test_devices_all(struct kunit *test)
ttm_dev = kunit_kzalloc(test, sizeof(*ttm_dev), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, ttm_dev);
- err = ttm_device_kunit_init(devs, ttm_dev, false, false);
+ err = ttm_device_kunit_init(devs, ttm_dev, 0);
KUNIT_ASSERT_EQ(test, err, 0);
devs->ttm_dev = ttm_dev;
diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h
index c7da23232ffa..8112892ba290 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h
+++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h
@@ -28,12 +28,9 @@ struct ttm_test_devices {
/* Building blocks for test-specific init functions */
int ttm_device_kunit_init(struct ttm_test_devices *priv,
struct ttm_device *ttm,
- bool use_dma_alloc,
- bool use_dma32);
+ unsigned int pool_flags);
int ttm_device_kunit_init_bad_evict(struct ttm_test_devices *priv,
- struct ttm_device *ttm,
- bool use_dma_alloc,
- bool use_dma32);
+ struct ttm_device *ttm);
struct ttm_buffer_object *ttm_bo_kunit_init(struct kunit *test,
struct ttm_test_devices *devs,
size_t size,
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 250e9f10145d..43f9cab186b1 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -198,8 +198,7 @@ EXPORT_SYMBOL(ttm_device_swapout);
* @dev: The core kernel device pointer for DMA mappings and allocations.
* @mapping: The address space to use for this bo.
* @vma_manager: A pointer to a vma manager.
- * @use_dma_alloc: If coherent DMA allocation API should be used.
- * @use_dma32: If we should use GFP_DMA32 for device memory allocations.
+ * @pool_flags: TTM_POOL_ flags.
*
* Initializes a struct ttm_device:
* Returns:
@@ -208,7 +207,7 @@ EXPORT_SYMBOL(ttm_device_swapout);
int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
struct device *dev, struct address_space *mapping,
struct drm_vma_offset_manager *vma_manager,
- bool use_dma_alloc, bool use_dma32)
+ unsigned int pool_flags)
{
struct ttm_global *glob = &ttm_glob;
int ret, nid;
@@ -236,9 +235,7 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
else
nid = NUMA_NO_NODE;
- ttm_pool_init(&bdev->pool, dev, nid,
- (use_dma_alloc ? TTM_POOL_USE_DMA_ALLOC : 0) |
- (use_dma32 ? TTM_POOL_USE_DMA32 : 0));
+ ttm_pool_init(&bdev->pool, dev, nid, pool_flags);
bdev->vma_manager = vma_manager;
spin_lock_init(&bdev->lru_lock);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 8ff958d119be..b5e0ba3c02b9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1023,8 +1023,8 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id)
dev_priv->drm.dev,
dev_priv->drm.anon_inode->i_mapping,
dev_priv->drm.vma_offset_manager,
- dev_priv->map_mode == vmw_dma_alloc_coherent,
- false);
+ (dev_priv->map_mode == vmw_dma_alloc_coherent) ?
+ TTM_POOL_USE_DMA_ALLOC : 0);
if (unlikely(ret != 0)) {
drm_err(&dev_priv->drm,
"Failed initializing TTM buffer object driver.\n");
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 386940323630..0096d71c1759 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -437,7 +437,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
xe->drm.anon_inode->i_mapping,
- xe->drm.vma_offset_manager, false, false);
+ xe->drm.vma_offset_manager, 0);
if (WARN_ON(err))
goto err;
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 592b5f802859..baa1c0c3ed86 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -292,7 +292,7 @@ static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
struct device *dev, struct address_space *mapping,
struct drm_vma_offset_manager *vma_manager,
- bool use_dma_alloc, bool use_dma32);
+ unsigned int pool_flags);
void ttm_device_fini(struct ttm_device *bdev);
void ttm_device_clear_dma_mappings(struct ttm_device *bdev);
--
2.48.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
2025-10-03 13:58 [RFC v2 0/5] Improving the worst case TTM large allocation latency Tvrtko Ursulin
` (2 preceding siblings ...)
2025-10-03 13:58 ` [RFC v2 3/5] drm/ttm: Replace multiple booleans with flags in device init Tvrtko Ursulin
@ 2025-10-03 13:58 ` Tvrtko Ursulin
2025-10-06 9:49 ` Christian König
2025-10-03 13:58 ` [RFC v2 5/5] drm/amdgpu: Configure max beneficial TTM pool allocation order Tvrtko Ursulin
4 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-03 13:58 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Thadeu Lima de Souza Cascardo
GPUs typically benefit from contiguous memory via reduced TLB pressure and
improved caching performance, with the maximum contiguous block size that
adds a performance benefit being determined by the hardware design.
The TTM pool allocator by default tries (hard) to allocate blocks up to the
system MAX_PAGE_ORDER. That limit varies by CPU platform and can also be
configured via Kconfig.
If that limit is set higher than the GPU can make use of, let's allow
individual drivers to tell TTM the allocation order above which the pool
allocator can afford to make a little less effort.
We implement this by disabling direct reclaim for those allocations, which
reduces the allocation latency and lowers the demands on the page
allocator, in cases where expending this effort is not critical for the
GPU in question.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
drivers/gpu/drm/ttm/ttm_pool.c | 8 ++++++++
include/drm/ttm/ttm_pool.h | 10 ++++++++--
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 0323531d216a..c0bd92259394 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -135,6 +135,7 @@ static DECLARE_RWSEM(pool_shrink_rwsem);
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
unsigned int order)
{
+ const unsigned int beneficial_order = ttm_pool_beneficial_order(pool);
unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
struct ttm_pool_dma *dma;
struct page *p;
@@ -148,6 +149,13 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
__GFP_THISNODE;
+ /*
+ * Do not add latency to the allocation path for allocation orders the
+ * device told us do not bring additional performance gains.
+ */
+ if (beneficial_order && order > beneficial_order)
+ gfp_flags &= ~__GFP_DIRECT_RECLAIM;
+
if (!ttm_pool_uses_dma_alloc(pool)) {
p = alloc_pages_node(pool->nid, gfp_flags, order);
if (p)
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index d898186765f1..b2114e2d0695 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -59,8 +59,9 @@ struct ttm_pool_type {
struct list_head pages;
};
-#define TTM_POOL_USE_DMA_ALLOC BIT(0) /* Use coherent DMA allocations. */
-#define TTM_POOL_USE_DMA32 BIT(1) /* Use GFP_DMA32 allocations. */
+#define TTM_POOL_BENEFICIAL_ORDER(n) ((n) & 0xff) /* Max order which caller can benefit from */
+#define TTM_POOL_USE_DMA_ALLOC BIT(8) /* Use coherent DMA allocations. */
+#define TTM_POOL_USE_DMA32 BIT(9) /* Use GFP_DMA32 allocations. */
/**
* struct ttm_pool - Pool for all caching and orders
@@ -111,4 +112,9 @@ static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
return pool->flags & TTM_POOL_USE_DMA32;
}
+static inline unsigned int ttm_pool_beneficial_order(struct ttm_pool *pool)
+{
+ return TTM_POOL_BENEFICIAL_ORDER(pool->flags);
+}
+
#endif
--
2.48.0
* [RFC v2 5/5] drm/amdgpu: Configure max beneficial TTM pool allocation order
2025-10-03 13:58 [RFC v2 0/5] Improving the worst case TTM large allocation latency Tvrtko Ursulin
` (3 preceding siblings ...)
2025-10-03 13:58 ` [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size Tvrtko Ursulin
@ 2025-10-03 13:58 ` Tvrtko Ursulin
4 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-03 13:58 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Alex Deucher, Christian König,
Thadeu Lima de Souza Cascardo
Let the TTM pool allocator know that it can afford to expend less effort
satisfying contiguous allocations larger than 2 MiB. That is the maximum
relevant PTE size, and the driver and hardware are happy to take larger
blocks only opportunistically.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 948c6d0a422b..723b885210a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1837,7 +1837,7 @@ static int amdgpu_ttm_pools_init(struct amdgpu_device *adev)
for (i = 0; i < adev->gmc.num_mem_partitions; i++) {
ttm_pool_init(&adev->mman.ttm_pools[i], adev->dev,
adev->gmc.mem_partitions[i].numa.node,
- 0);
+ TTM_POOL_BENEFICIAL_ORDER(get_order(2 * SZ_1M)));
}
return 0;
}
@@ -1931,7 +1931,8 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
adev_to_drm(adev)->anon_inode->i_mapping,
adev_to_drm(adev)->vma_offset_manager,
(adev->need_swiotlb ? TTM_POOL_USE_DMA_ALLOC : 0) |
- (dma_addressing_limited(adev->dev) ? TTM_POOL_USE_DMA32 : 0));
+ (dma_addressing_limited(adev->dev) ? TTM_POOL_USE_DMA32 : 0) |
+ TTM_POOL_BENEFICIAL_ORDER(get_order(2 * SZ_1M)));
if (r) {
dev_err(adev->dev,
"failed initializing buffer object driver(%d).\n", r);
--
2.48.0
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-03 13:58 ` [RFC v2 1/5] drm/ttm: Add getter for some pool properties Tvrtko Ursulin
@ 2025-10-06 8:38 ` Christian König
2025-10-07 14:00 ` Tvrtko Ursulin
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-10-06 8:38 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel; +Cc: kernel-dev, Thomas Hellström
On 03.10.25 15:58, Tvrtko Ursulin wrote:
> No functional change but to allow easier refactoring in the future.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/ttm/ttm_pool.c | 28 ++++++++++++++--------------
> drivers/gpu/drm/ttm/ttm_tt.c | 9 +++++----
> include/drm/ttm/ttm_pool.h | 10 ++++++++++
> 3 files changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index baf27c70a419..a9430b516fc3 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -148,7 +148,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
> __GFP_THISNODE;
>
> - if (!pool->use_dma_alloc) {
> + if (!ttm_pool_uses_dma_alloc(pool)) {
> p = alloc_pages_node(pool->nid, gfp_flags, order);
> if (p)
> p->private = order;
> @@ -200,7 +200,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> set_pages_wb(p, 1 << order);
> #endif
>
> - if (!pool || !pool->use_dma_alloc) {
> + if (!pool || !ttm_pool_uses_dma_alloc(pool)) {
> __free_pages(p, order);
> return;
> }
> @@ -243,7 +243,7 @@ static int ttm_pool_map(struct ttm_pool *pool, unsigned int order,
> {
> dma_addr_t addr;
>
> - if (pool->use_dma_alloc) {
> + if (ttm_pool_uses_dma_alloc(pool)) {
> struct ttm_pool_dma *dma = (void *)p->private;
>
> addr = dma->addr;
> @@ -265,7 +265,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
> unsigned int num_pages)
> {
> /* Unmapped while freeing the page */
> - if (pool->use_dma_alloc)
> + if (ttm_pool_uses_dma_alloc(pool))
> return;
>
> dma_unmap_page(pool->dev, dma_addr, (long)num_pages << PAGE_SHIFT,
> @@ -339,7 +339,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
> enum ttm_caching caching,
> unsigned int order)
> {
> - if (pool->use_dma_alloc)
> + if (ttm_pool_uses_dma_alloc(pool))
> return &pool->caching[caching].orders[order];
>
> #ifdef CONFIG_X86
> @@ -348,7 +348,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
> if (pool->nid != NUMA_NO_NODE)
> return &pool->caching[caching].orders[order];
>
> - if (pool->use_dma32)
> + if (ttm_pool_uses_dma32(pool))
> return &global_dma32_write_combined[order];
>
> return &global_write_combined[order];
> @@ -356,7 +356,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
> if (pool->nid != NUMA_NO_NODE)
> return &pool->caching[caching].orders[order];
>
> - if (pool->use_dma32)
> + if (ttm_pool_uses_dma32(pool))
> return &global_dma32_uncached[order];
>
> return &global_uncached[order];
> @@ -396,7 +396,7 @@ static unsigned int ttm_pool_shrink(void)
> /* Return the allocation order based for a page */
> static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p)
> {
> - if (pool->use_dma_alloc) {
> + if (ttm_pool_uses_dma_alloc(pool)) {
> struct ttm_pool_dma *dma = (void *)p->private;
>
> return dma->vaddr & ~PAGE_MASK;
> @@ -719,7 +719,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
> if (ctx->gfp_retry_mayfail)
> gfp_flags |= __GFP_RETRY_MAYFAIL;
>
> - if (pool->use_dma32)
> + if (ttm_pool_uses_dma32(pool))
> gfp_flags |= GFP_DMA32;
> else
> gfp_flags |= GFP_HIGHUSER;
> @@ -977,7 +977,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
> return -EINVAL;
>
> if ((!ttm_backup_bytes_avail() && !flags->purge) ||
> - pool->use_dma_alloc || ttm_tt_is_backed_up(tt))
> + ttm_pool_uses_dma_alloc(pool) || ttm_tt_is_backed_up(tt))
> return -EBUSY;
>
> #ifdef CONFIG_X86
> @@ -1014,7 +1014,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
> if (flags->purge)
> return shrunken;
>
> - if (pool->use_dma32)
> + if (ttm_pool_uses_dma32(pool))
> gfp = GFP_DMA32;
> else
> gfp = GFP_HIGHUSER;
> @@ -1068,7 +1068,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
> {
> unsigned int i, j;
>
> - WARN_ON(!dev && use_dma_alloc);
> + WARN_ON(!dev && ttm_pool_uses_dma_alloc(pool));
>
> pool->dev = dev;
> pool->nid = nid;
> @@ -1239,7 +1239,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
> {
> unsigned int i;
>
> - if (!pool->use_dma_alloc && pool->nid == NUMA_NO_NODE) {
> + if (!ttm_pool_uses_dma_alloc(pool) && pool->nid == NUMA_NO_NODE) {
> seq_puts(m, "unused\n");
> return 0;
> }
> @@ -1250,7 +1250,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
> for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
> if (!ttm_pool_select_type(pool, i, 0))
> continue;
> - if (pool->use_dma_alloc)
> + if (ttm_pool_uses_dma_alloc(pool))
> seq_puts(m, "DMA ");
> else
> seq_printf(m, "N%d ", pool->nid);
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 506e257dfba8..3b21ec33c877 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -93,7 +93,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc)
> * mapped TT pages need to be decrypted or otherwise the drivers
> * will end up sending encrypted mem to the gpu.
> */
> - if (bdev->pool.use_dma_alloc && cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> + if (ttm_pool_uses_dma_alloc(&bdev->pool) &&
> + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> page_flags |= TTM_TT_FLAG_DECRYPTED;
> drm_info_once(ddev, "TT memory decryption enabled.");
> }
> @@ -378,7 +379,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
>
> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
> atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
> - if (bdev->pool.use_dma32)
> + if (ttm_pool_uses_dma32(&bdev->pool))
> atomic_long_add(ttm->num_pages,
> &ttm_dma32_pages_allocated);
> }
> @@ -416,7 +417,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
> error:
> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
> atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
> - if (bdev->pool.use_dma32)
> + if (ttm_pool_uses_dma32(&bdev->pool))
> atomic_long_sub(ttm->num_pages,
> &ttm_dma32_pages_allocated);
> }
> @@ -439,7 +440,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
>
> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
> atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
> - if (bdev->pool.use_dma32)
> + if (ttm_pool_uses_dma32(&bdev->pool))
> atomic_long_sub(ttm->num_pages,
> &ttm_dma32_pages_allocated);
> }
> diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
> index 54cd34a6e4c0..22154d84fef9 100644
> --- a/include/drm/ttm/ttm_pool.h
> +++ b/include/drm/ttm/ttm_pool.h
> @@ -100,4 +100,14 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
> int ttm_pool_mgr_init(unsigned long num_pages);
> void ttm_pool_mgr_fini(void);
>
> +static inline bool ttm_pool_uses_dma_alloc(struct ttm_pool *pool)
> +{
> + return pool->use_dma_alloc;
> +}
> +
> +static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
> +{
> + return pool->use_dma32;
> +}
> +
Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
That is all internal to the pool.
Regards,
Christian.
> #endif
* Re: [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
2025-10-03 13:58 ` [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size Tvrtko Ursulin
@ 2025-10-06 9:49 ` Christian König
2025-10-07 13:57 ` Tvrtko Ursulin
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-10-06 9:49 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Thadeu Lima de Souza Cascardo
On 03.10.25 15:58, Tvrtko Ursulin wrote:
> GPUs typically benefit from contiguous memory via reduced TLB pressure and
> improved caching performance, where the maximum size of contiguous block
> which adds a performance benefit is related to hardware design.
>
> TTM pool allocator by default tries (hard) to allocate up to the system
> MAX_PAGE_ORDER blocks. This varies by the CPU platform and can also be
> configured via Kconfig.
>
> If that limit was set to be higher than the GPU can make an extra use of,
> lets allow the individual drivers to let TTM know over which allocation
> order can the pool allocator afford to make a little bit less effort with.
>
> We implement this by disabling direct reclaim for those allocations, which
> reduces the allocation latency and lowers the demands on the page
> allocator, in cases where expending this effort is not critical for the
> GPU in question.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
> ---
> drivers/gpu/drm/ttm/ttm_pool.c | 8 ++++++++
> include/drm/ttm/ttm_pool.h | 10 ++++++++--
> 2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 0323531d216a..c0bd92259394 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -135,6 +135,7 @@ static DECLARE_RWSEM(pool_shrink_rwsem);
> static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> unsigned int order)
> {
> + const unsigned int beneficial_order = ttm_pool_beneficial_order(pool);
> unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> struct ttm_pool_dma *dma;
> struct page *p;
> @@ -148,6 +149,13 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
> __GFP_THISNODE;
>
> + /*
> + * Do not add latency to the allocation path for allocations orders
> + * device tolds us do not bring them additional performance gains.
> + */
> + if (beneficial_order && order > beneficial_order)
> + gfp_flags &= ~__GFP_DIRECT_RECLAIM;
> +
> if (!ttm_pool_uses_dma_alloc(pool)) {
> p = alloc_pages_node(pool->nid, gfp_flags, order);
> if (p)
> diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
> index d898186765f1..b2114e2d0695 100644
> --- a/include/drm/ttm/ttm_pool.h
> +++ b/include/drm/ttm/ttm_pool.h
> @@ -59,8 +59,9 @@ struct ttm_pool_type {
> struct list_head pages;
> };
>
> -#define TTM_POOL_USE_DMA_ALLOC BIT(0) /* Use coherent DMA allocations. */
> -#define TTM_POOL_USE_DMA32 BIT(1) /* Use GFP_DMA32 allocations. */
> +#define TTM_POOL_BENEFICIAL_ORDER(n) ((n) & 0xff) /* Max order which caller can benefit from */
Looks good in general, but I'm not 100% convinced that we want to mix this value into the flags.
On the one hand it makes your life easier because you don't need to change all drivers using it, on the other hand changing all drivers using it would potentially be cleaner and document the value better.
Christian.
> +#define TTM_POOL_USE_DMA_ALLOC BIT(8) /* Use coherent DMA allocations. */
> +#define TTM_POOL_USE_DMA32 BIT(9) /* Use GFP_DMA32 allocations. */
>
> /**
> * struct ttm_pool - Pool for all caching and orders
> @@ -111,4 +112,9 @@ static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
> return pool->flags & TTM_POOL_USE_DMA32;
> }
>
> +static inline bool ttm_pool_beneficial_order(struct ttm_pool *pool)
> +{
> + return pool->flags & 0xff;
> +}
> +
> #endif
* Re: [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
2025-10-06 9:49 ` Christian König
@ 2025-10-07 13:57 ` Tvrtko Ursulin
2025-10-07 15:36 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-07 13:57 UTC (permalink / raw)
To: Christian König, amd-gfx, dri-devel
Cc: kernel-dev, Thadeu Lima de Souza Cascardo
On 06/10/2025 10:49, Christian König wrote:
> On 03.10.25 15:58, Tvrtko Ursulin wrote:
>> GPUs typically benefit from contiguous memory via reduced TLB pressure and
>> improved caching performance, where the maximum size of contiguous block
>> which adds a performance benefit is related to hardware design.
>>
>> TTM pool allocator by default tries (hard) to allocate up to the system
>> MAX_PAGE_ORDER blocks. This varies by the CPU platform and can also be
>> configured via Kconfig.
>>
>> If that limit was set to be higher than the GPU can make an extra use of,
>> lets allow the individual drivers to let TTM know over which allocation
>> order can the pool allocator afford to make a little bit less effort with.
>>
>> We implement this by disabling direct reclaim for those allocations, which
>> reduces the allocation latency and lowers the demands on the page
>> allocator, in cases where expending this effort is not critical for the
>> GPU in question.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
>> ---
>> drivers/gpu/drm/ttm/ttm_pool.c | 8 ++++++++
>> include/drm/ttm/ttm_pool.h | 10 ++++++++--
>> 2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 0323531d216a..c0bd92259394 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -135,6 +135,7 @@ static DECLARE_RWSEM(pool_shrink_rwsem);
>> static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>> unsigned int order)
>> {
>> + const unsigned int beneficial_order = ttm_pool_beneficial_order(pool);
>> unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
>> struct ttm_pool_dma *dma;
>> struct page *p;
>> @@ -148,6 +149,13 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>> gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
>> __GFP_THISNODE;
>>
>> + /*
>> + * Do not add latency to the allocation path for allocations orders
>> + * device tolds us do not bring them additional performance gains.
>> + */
>> + if (beneficial_order && order > beneficial_order)
>> + gfp_flags &= ~__GFP_DIRECT_RECLAIM;
>> +
>> if (!ttm_pool_uses_dma_alloc(pool)) {
>> p = alloc_pages_node(pool->nid, gfp_flags, order);
>> if (p)
>> diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
>> index d898186765f1..b2114e2d0695 100644
>> --- a/include/drm/ttm/ttm_pool.h
>> +++ b/include/drm/ttm/ttm_pool.h
>> @@ -59,8 +59,9 @@ struct ttm_pool_type {
>> struct list_head pages;
>> };
>>
>> -#define TTM_POOL_USE_DMA_ALLOC BIT(0) /* Use coherent DMA allocations. */
>> -#define TTM_POOL_USE_DMA32 BIT(1) /* Use GFP_DMA32 allocations. */
>> +#define TTM_POOL_BENEFICIAL_ORDER(n) ((n) & 0xff) /* Max order which caller can benefit from */
>
> Looks good in general, but I'm not 100% convinced that we want to mix this value into the flags.
>
> On the one hand it makes your live easier because you don't need to change all drivers using it, on the other hand changing all drivers using it would potentially be cleaner and document the value better.
I was not 100% convinced either, but it looked like a reasonable compromise.
My thinking was to not simply add an int after the existing two booleans
but to try and clean it up at the same time. Once I replaced them with
flags then the options were to either add a new int argument or add some
flags like TTM_POOL_BENEFICIAL_SIZE_2M, TTM_POOL_BENEFICIAL_SIZE_64K,
with the thinking there probably isn't a full range of page sizes. But
then I thought why not just put the order in some bits. Advantages being
it adds names to anonymous booleans and is extensible with no further churn.
But I don't know, I am happy to change it to something else if you are
sure this isn't the way.
If we add a new int then it has to have some "stick with default"
semantics. Like -1 or whatnot. Which is also meh. I wouldn't do a zero
because it feels conflated.
Regards,
Tvrtko
>> +#define TTM_POOL_USE_DMA_ALLOC BIT(8) /* Use coherent DMA allocations. */
>> +#define TTM_POOL_USE_DMA32 BIT(9) /* Use GFP_DMA32 allocations. */
>>
>> /**
>> * struct ttm_pool - Pool for all caching and orders
>> @@ -111,4 +112,9 @@ static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
>> return pool->flags & TTM_POOL_USE_DMA32;
>> }
>>
>> +static inline unsigned int ttm_pool_beneficial_order(struct ttm_pool *pool)
>> +{
>> + return pool->flags & 0xff;
>> +}
>> +
>> #endif
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-06 8:38 ` Christian König
@ 2025-10-07 14:00 ` Tvrtko Ursulin
2025-10-07 14:03 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-07 14:00 UTC (permalink / raw)
To: Christian König, amd-gfx, dri-devel
Cc: kernel-dev, Thomas Hellström
On 06/10/2025 09:38, Christian König wrote:
> On 03.10.25 15:58, Tvrtko Ursulin wrote:
>> No functional change but to allow easier refactoring in the future.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>> drivers/gpu/drm/ttm/ttm_pool.c | 28 ++++++++++++++--------------
>> drivers/gpu/drm/ttm/ttm_tt.c | 9 +++++----
>> include/drm/ttm/ttm_pool.h | 10 ++++++++++
>> 3 files changed, 29 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>> index baf27c70a419..a9430b516fc3 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -148,7 +148,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>> gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
>> __GFP_THISNODE;
>>
>> - if (!pool->use_dma_alloc) {
>> + if (!ttm_pool_uses_dma_alloc(pool)) {
>> p = alloc_pages_node(pool->nid, gfp_flags, order);
>> if (p)
>> p->private = order;
>> @@ -200,7 +200,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>> set_pages_wb(p, 1 << order);
>> #endif
>>
>> - if (!pool || !pool->use_dma_alloc) {
>> + if (!pool || !ttm_pool_uses_dma_alloc(pool)) {
>> __free_pages(p, order);
>> return;
>> }
>> @@ -243,7 +243,7 @@ static int ttm_pool_map(struct ttm_pool *pool, unsigned int order,
>> {
>> dma_addr_t addr;
>>
>> - if (pool->use_dma_alloc) {
>> + if (ttm_pool_uses_dma_alloc(pool)) {
>> struct ttm_pool_dma *dma = (void *)p->private;
>>
>> addr = dma->addr;
>> @@ -265,7 +265,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
>> unsigned int num_pages)
>> {
>> /* Unmapped while freeing the page */
>> - if (pool->use_dma_alloc)
>> + if (ttm_pool_uses_dma_alloc(pool))
>> return;
>>
>> dma_unmap_page(pool->dev, dma_addr, (long)num_pages << PAGE_SHIFT,
>> @@ -339,7 +339,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>> enum ttm_caching caching,
>> unsigned int order)
>> {
>> - if (pool->use_dma_alloc)
>> + if (ttm_pool_uses_dma_alloc(pool))
>> return &pool->caching[caching].orders[order];
>>
>> #ifdef CONFIG_X86
>> @@ -348,7 +348,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>> if (pool->nid != NUMA_NO_NODE)
>> return &pool->caching[caching].orders[order];
>>
>> - if (pool->use_dma32)
>> + if (ttm_pool_uses_dma32(pool))
>> return &global_dma32_write_combined[order];
>>
>> return &global_write_combined[order];
>> @@ -356,7 +356,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>> if (pool->nid != NUMA_NO_NODE)
>> return &pool->caching[caching].orders[order];
>>
>> - if (pool->use_dma32)
>> + if (ttm_pool_uses_dma32(pool))
>> return &global_dma32_uncached[order];
>>
>> return &global_uncached[order];
>> @@ -396,7 +396,7 @@ static unsigned int ttm_pool_shrink(void)
>> /* Return the allocation order based for a page */
>> static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p)
>> {
>> - if (pool->use_dma_alloc) {
>> + if (ttm_pool_uses_dma_alloc(pool)) {
>> struct ttm_pool_dma *dma = (void *)p->private;
>>
>> return dma->vaddr & ~PAGE_MASK;
>> @@ -719,7 +719,7 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>> if (ctx->gfp_retry_mayfail)
>> gfp_flags |= __GFP_RETRY_MAYFAIL;
>>
>> - if (pool->use_dma32)
>> + if (ttm_pool_uses_dma32(pool))
>> gfp_flags |= GFP_DMA32;
>> else
>> gfp_flags |= GFP_HIGHUSER;
>> @@ -977,7 +977,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
>> return -EINVAL;
>>
>> if ((!ttm_backup_bytes_avail() && !flags->purge) ||
>> - pool->use_dma_alloc || ttm_tt_is_backed_up(tt))
>> + ttm_pool_uses_dma_alloc(pool) || ttm_tt_is_backed_up(tt))
>> return -EBUSY;
>>
>> #ifdef CONFIG_X86
>> @@ -1014,7 +1014,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
>> if (flags->purge)
>> return shrunken;
>>
>> - if (pool->use_dma32)
>> + if (ttm_pool_uses_dma32(pool))
>> gfp = GFP_DMA32;
>> else
>> gfp = GFP_HIGHUSER;
>> @@ -1068,7 +1068,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
>> {
>> unsigned int i, j;
>>
>> - WARN_ON(!dev && use_dma_alloc);
>> + WARN_ON(!dev && ttm_pool_uses_dma_alloc(pool));
>>
>> pool->dev = dev;
>> pool->nid = nid;
>> @@ -1239,7 +1239,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
>> {
>> unsigned int i;
>>
>> - if (!pool->use_dma_alloc && pool->nid == NUMA_NO_NODE) {
>> + if (!ttm_pool_uses_dma_alloc(pool) && pool->nid == NUMA_NO_NODE) {
>> seq_puts(m, "unused\n");
>> return 0;
>> }
>> @@ -1250,7 +1250,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
>> for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
>> if (!ttm_pool_select_type(pool, i, 0))
>> continue;
>> - if (pool->use_dma_alloc)
>> + if (ttm_pool_uses_dma_alloc(pool))
>> seq_puts(m, "DMA ");
>> else
>> seq_printf(m, "N%d ", pool->nid);
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 506e257dfba8..3b21ec33c877 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -93,7 +93,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc)
>> * mapped TT pages need to be decrypted or otherwise the drivers
>> * will end up sending encrypted mem to the gpu.
>> */
>> - if (bdev->pool.use_dma_alloc && cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
>> + if (ttm_pool_uses_dma_alloc(&bdev->pool) &&
>> + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
>> page_flags |= TTM_TT_FLAG_DECRYPTED;
>> drm_info_once(ddev, "TT memory decryption enabled.");
>> }
>> @@ -378,7 +379,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
>>
>> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
>> atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
>> - if (bdev->pool.use_dma32)
>> + if (ttm_pool_uses_dma32(&bdev->pool))
>> atomic_long_add(ttm->num_pages,
>> &ttm_dma32_pages_allocated);
>> }
>> @@ -416,7 +417,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
>> error:
>> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
>> atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
>> - if (bdev->pool.use_dma32)
>> + if (ttm_pool_uses_dma32(&bdev->pool))
>> atomic_long_sub(ttm->num_pages,
>> &ttm_dma32_pages_allocated);
>> }
>> @@ -439,7 +440,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
>>
>> if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
>> atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
>> - if (bdev->pool.use_dma32)
>> + if (ttm_pool_uses_dma32(&bdev->pool))
>> atomic_long_sub(ttm->num_pages,
>> &ttm_dma32_pages_allocated);
>> }
>> diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
>> index 54cd34a6e4c0..22154d84fef9 100644
>> --- a/include/drm/ttm/ttm_pool.h
>> +++ b/include/drm/ttm/ttm_pool.h
>> @@ -100,4 +100,14 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>> int ttm_pool_mgr_init(unsigned long num_pages);
>> void ttm_pool_mgr_fini(void);
>>
>> +static inline bool ttm_pool_uses_dma_alloc(struct ttm_pool *pool)
>> +{
>> + return pool->use_dma_alloc;
>> +}
>> +
>> +static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
>> +{
>> + return pool->use_dma32;
>> +}
>> +
>
> Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
>
> That is all internal to the pool.
Hmm, IMHO it is not that bad, especially since ttm_pool.c and ttm_tt.c
need to have access to them. The alternative is a new header for internal
helpers, which sounds a bit too much. But if you insist I can create it.
Regards,
Tvrtko
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-07 14:00 ` Tvrtko Ursulin
@ 2025-10-07 14:03 ` Christian König
2025-10-07 14:27 ` Tvrtko Ursulin
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-10-07 14:03 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel; +Cc: kernel-dev, Thomas Hellström
On 07.10.25 16:00, Tvrtko Ursulin wrote:
>>
>> Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
>>
>> That is all internal to the pool.
>
> Hmm, IMHO it is not that bad, especially since ttm_pool.c and ttm_tt.c need to have access to them. The alternative is a new header for internal helpers, which sounds a bit too much. But if you insist I can create it.
Wait a second why is ttm_tt.c still needing this? For the DMA32 eviction?
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-07 14:03 ` Christian König
@ 2025-10-07 14:27 ` Tvrtko Ursulin
2025-10-07 15:04 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-07 14:27 UTC (permalink / raw)
To: Christian König, amd-gfx, dri-devel
Cc: kernel-dev, Thomas Hellström
On 07/10/2025 15:03, Christian König wrote:
> On 07.10.25 16:00, Tvrtko Ursulin wrote:
>>>
>>> Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
>>>
>>> That is all internal to the pool.
>>
>> Hmm, IMHO it is not that bad, especially since ttm_pool.c and ttm_tt.c need to have access to them. The alternative is a new header for internal helpers, which sounds a bit too much. But if you insist I can create it.
>
> Wait a second why is ttm_tt.c still needing this? For the DMA32 eviction?
Apparently so, goes back to:
680dcede2762 ("drm/ttm: switch back to static allocation limits for now")
Then there is the newer usage for ttm->use_dma_alloc from:
71ce046327cf ("drm/ttm: Make sure the mapped tt pages are decrypted when
needed")
Regards,
Tvrtko
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-07 14:27 ` Tvrtko Ursulin
@ 2025-10-07 15:04 ` Christian König
2025-10-08 7:34 ` Tvrtko Ursulin
0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-10-07 15:04 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel; +Cc: kernel-dev, Thomas Hellström
On 07.10.25 16:27, Tvrtko Ursulin wrote:
>
>
> On 07/10/2025 15:03, Christian König wrote:
>> On 07.10.25 16:00, Tvrtko Ursulin wrote:
>>>>
>>>> Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
>>>>
>>>> That is all internal to the pool.
>>>
>>> Hmm, IMHO it is not that bad, especially since ttm_pool.c and ttm_tt.c need to have access to them. The alternative is a new header for internal helpers, which sounds a bit too much. But if you insist I can create it.
>>
>> Wait a second why is ttm_tt.c still needing this? For the DMA32 eviction?
>
> Apparently so, goes back to:
>
> 680dcede2762 ("drm/ttm: switch back to static allocation limits for now")
>
> Then there is the newer usage for ttm->use_dma_alloc from:
>
> 71ce046327cf ("drm/ttm: Make sure the mapped tt pages are decrypted when needed")
Ah yes that was the DMA layer hack for encryption Zack came up with.
In this case please put the functions into ttm_bo_internal.h. It should already be in the ttm directory for TTM internal stuff.
Going to comment on the other patch as well.
Thanks,
Christian.
>
> Regards,
>
> Tvrtko
>
* Re: [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
2025-10-07 13:57 ` Tvrtko Ursulin
@ 2025-10-07 15:36 ` Christian König
0 siblings, 0 replies; 16+ messages in thread
From: Christian König @ 2025-10-07 15:36 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Thadeu Lima de Souza Cascardo
On 07.10.25 15:57, Tvrtko Ursulin wrote:
>>> -#define TTM_POOL_USE_DMA_ALLOC BIT(0) /* Use coherent DMA allocations. */
>>> -#define TTM_POOL_USE_DMA32 BIT(1) /* Use GFP_DMA32 allocations. */
>>> +#define TTM_POOL_BENEFICIAL_ORDER(n) ((n) & 0xff) /* Max order which caller can benefit from */
>>
>> Looks good in general, but I'm not 100% convinced that we want to mix this value into the flags.
>>
>> On the one hand it makes your life easier because you don't need to change all drivers using it, on the other hand changing all drivers using it would potentially be cleaner and document the value better.
>
> I was not 100% convinced either but it looked a reasonable compromise.
>
> My thinking was to not simply add an int after the existing two booleans but to try and clean it up at the same time. Once I replaced them with flags then the options were to either add a new int argument or add some flags like TTM_POOL_BENEFICIAL_SIZE_2M, TTM_POOL_BENEFICIAL_SIZE_64K, with the thinking there probably isn't a full range of page sizes. But then I thought why not just put the order in some bits. Advantages being it adds names to anonymous booleans and is extensible with no further churn.
>
> But I don't know, I am happy to change it to something else if you are sure this isn't the way.
>
> If we add a new int then it has to have some "stick with default" semantics, like -1 or whatnot, which is also meh. I wouldn't do a zero because it feels conflated.
Ideally drivers would use MAX_PAGE_ORDER, but yeah that is quite some work to do.
Feel free to go ahead with the flags approach, it should solve the problem and is not much work for now.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>> +#define TTM_POOL_USE_DMA_ALLOC BIT(8) /* Use coherent DMA allocations. */
>>> +#define TTM_POOL_USE_DMA32 BIT(9) /* Use GFP_DMA32 allocations. */
>>> /**
>>> * struct ttm_pool - Pool for all caching and orders
>>> @@ -111,4 +112,9 @@ static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
>>> return pool->flags & TTM_POOL_USE_DMA32;
>>> }
>>> +static inline unsigned int ttm_pool_beneficial_order(struct ttm_pool *pool)
>>> +{
>>> + return pool->flags & 0xff;
>>> +}
>>> +
>>> #endif
>>
>
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-07 15:04 ` Christian König
@ 2025-10-08 7:34 ` Tvrtko Ursulin
2025-10-08 10:01 ` Christian König
0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2025-10-08 7:34 UTC (permalink / raw)
To: Christian König, amd-gfx, dri-devel
Cc: kernel-dev, Thomas Hellström
On 07/10/2025 16:04, Christian König wrote:
> On 07.10.25 16:27, Tvrtko Ursulin wrote:
>>
>>
>> On 07/10/2025 15:03, Christian König wrote:
>>> On 07.10.25 16:00, Tvrtko Ursulin wrote:
>>>>>
>>>>> Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
>>>>>
>>>>> That is all internal to the pool.
>>>>
>>>> Hmm, IMHO it is not that bad, especially since ttm_pool.c and ttm_tt.c need to have access to them. The alternative is a new header for internal helpers, which sounds a bit too much. But if you insist I can create it.
>>>
>>> Wait a second why is ttm_tt.c still needing this? For the DMA32 eviction?
>>
>> Apparently so, goes back to:
>>
>> 680dcede2762 ("drm/ttm: switch back to static allocation limits for now")
>>
>> Then there is the newer usage for ttm->use_dma_alloc from:
>>
>> 71ce046327cf ("drm/ttm: Make sure the mapped tt pages are decrypted when needed")
>
> Ah yes that was the DMA layer hack for encryption Zack came up with.
>
> In this case please put the functions into ttm_bo_internal.h. It should already be in the ttm directory for TTM internal stuff.
Then ttm_pool.c will have to start including ttm_bo_internal.h, which
feels like it makes the component separation worse. Add
ttm_pool_internal.h? Keep it in ttm_pool.h?
Regards,
Tvrtko
>
> Going to comment on the other patch as well.
>
> Thanks,
> Christian.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>
* Re: [RFC v2 1/5] drm/ttm: Add getter for some pool properties
2025-10-08 7:34 ` Tvrtko Ursulin
@ 2025-10-08 10:01 ` Christian König
0 siblings, 0 replies; 16+ messages in thread
From: Christian König @ 2025-10-08 10:01 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel; +Cc: kernel-dev, Thomas Hellström
On 08.10.25 09:34, Tvrtko Ursulin wrote:
>
> On 07/10/2025 16:04, Christian König wrote:
>> On 07.10.25 16:27, Tvrtko Ursulin wrote:
>>>
>>>
>>> On 07/10/2025 15:03, Christian König wrote:
>>>> On 07.10.25 16:00, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> Please not in the header. Neither drivers nor other TTM modules should mess with such properties.
>>>>>>
>>>>>> That is all internal to the pool.
>>>>>
>>>>> Hmm, IMHO it is not that bad, especially since ttm_pool.c and ttm_tt.c need to have access to them. The alternative is a new header for internal helpers, which sounds a bit too much. But if you insist I can create it.
>>>>
>>>> Wait a second why is ttm_tt.c still needing this? For the DMA32 eviction?
>>>
>>> Apparently so, goes back to:
>>>
>>> 680dcede2762 ("drm/ttm: switch back to static allocation limits for now")
>>>
>>> Then there is the newer usage for ttm->use_dma_alloc from:
>>>
>>> 71ce046327cf ("drm/ttm: Make sure the mapped tt pages are decrypted when needed")
>>
>> Ah yes that was the DMA layer hack for encryption Zack came up with.
>>
>> In this case please put the functions into ttm_bo_internal.h. It should already be in the ttm directory for TTM internal stuff.
>
> Then ttm_pool.c will have to start including ttm_bo_internal.h, which feels like it makes the component separation worse. Add ttm_pool_internal.h? Keep it in ttm_pool.h?
In that case just add ttm_pool_internal.h.
Thanks,
Christian.
>
> Regards,
>
> Tvrtko
>
>>
>> Going to comment on the other patch as well.
>>
>> Thanks,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>
>
end of thread, other threads:[~2025-10-08 10:02 UTC | newest]
Thread overview: 16+ messages
-- links below jump to the message on this page --
2025-10-03 13:58 [RFC v2 0/5] Improving the worst case TTM large allocation latency Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 1/5] drm/ttm: Add getter for some pool properties Tvrtko Ursulin
2025-10-06 8:38 ` Christian König
2025-10-07 14:00 ` Tvrtko Ursulin
2025-10-07 14:03 ` Christian König
2025-10-07 14:27 ` Tvrtko Ursulin
2025-10-07 15:04 ` Christian König
2025-10-08 7:34 ` Tvrtko Ursulin
2025-10-08 10:01 ` Christian König
2025-10-03 13:58 ` [RFC v2 2/5] drm/ttm: Replace multiple booleans with flags in pool init Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 3/5] drm/ttm: Replace multiple booleans with flags in device init Tvrtko Ursulin
2025-10-03 13:58 ` [RFC v2 4/5] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size Tvrtko Ursulin
2025-10-06 9:49 ` Christian König
2025-10-07 13:57 ` Tvrtko Ursulin
2025-10-07 15:36 ` Christian König
2025-10-03 13:58 ` [RFC v2 5/5] drm/amdgpu: Configure max beneficial TTM pool allocation order Tvrtko Ursulin