dri-devel Archive on lore.kernel.org
* [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
@ 2026-05-01 22:30 Matthew Brost
  2026-05-01 22:57 ` Kenneth Crudup
  2026-05-02  0:05 ` Kenneth Crudup
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Brost @ 2026-05-01 22:30 UTC
  To: intel-xe, dri-devel
  Cc: Kenneth Crudup, Christian Koenig, Huang Rui, Matthew Auld,
	David Airlie

TTM pool shrinking frees pages by calling __free_pages() directly,
which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
out of sync.

Introduce a helper, __free_pages_gpu_account(), and use it for all page
frees in ttm_pool.c so GPU MM statistics are updated consistently.

Reported-by: Kenneth Crudup <kenny@panix.com>
Fixes: ae80122f3896 ("drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)")
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 26a3689e5fd9..95bbd9328072 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -206,6 +206,14 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 	return NULL;
 }
 
+static void __free_pages_gpu_account(struct page *p, unsigned int order,
+				     bool reclaim)
+{
+	mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
+			      -(1 << order));
+	__free_pages(p, order);
+}
+
 /* Reset the caching and pages of size 1 << order */
 static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 			       unsigned int order, struct page *p, bool reclaim)
@@ -223,9 +231,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 #endif
 
 	if (!pool || !ttm_pool_uses_dma_alloc(pool)) {
-		mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
-				      -(1 << order));
-		__free_pages(p, order);
+		__free_pages_gpu_account(p, order, reclaim);
 		return;
 	}
 
@@ -606,7 +612,7 @@ static int ttm_pool_restore_commit(struct ttm_pool_tt_restore *restore,
 			 */
 			ttm_pool_split_for_swap(restore->pool, p);
 			copy_highpage(restore->alloced_page + i, p);
-			__free_pages(p, 0);
+			__free_pages_gpu_account(p, 0, false);
 		}
 
 		restore->restored_pages++;
@@ -1068,7 +1074,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
 			if (flags->purge) {
 				shrunken += num_pages;
 				page->private = 0;
-				__free_pages(page, order);
+				__free_pages_gpu_account(page, order, false);
 				memset(tt->pages + i, 0,
 				       num_pages * sizeof(*tt->pages));
 			}
-- 
2.34.1
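
The accounting problem described in the commit message can be illustrated with a small userspace model. This is only a sketch: struct gpu_stats, model_alloc(), model_free_accounted(), model_free_unaccounted() and model_leak() are hypothetical names invented here, and the counters are plain integers standing in for NR_GPU_ACTIVE and NR_GPU_RECLAIM. It assumes the allocation side adds 1 << order pages to the active counter, which mirrors the -(1 << order) adjustment visible in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NR_GPU_ACTIVE / NR_GPU_RECLAIM, in pages. */
struct gpu_stats {
	long active;
	long reclaim;
};

/* Assumed allocation-side accounting: +(1 << order) pages on "active". */
static void model_alloc(struct gpu_stats *s, unsigned int order)
{
	assert(order < 31);
	s->active += 1L << order;
}

/* Same shape as __free_pages_gpu_account(): adjust the counter, then free. */
static void model_free_accounted(struct gpu_stats *s, unsigned int order,
				 bool reclaim)
{
	long *counter = reclaim ? &s->reclaim : &s->active;

	assert(order < 31);
	*counter -= 1L << order;
	/* The kernel helper would call __free_pages(p, order) here. */
}

/* Models a bare __free_pages(): the memory is gone, the counter is not touched. */
static void model_free_unaccounted(struct gpu_stats *s, unsigned int order)
{
	(void)s;
	(void)order;
}

/*
 * Allocate n blocks of the given order, free them all, and return how many
 * pages the "active" counter still claims afterwards.
 */
long model_leak(unsigned int order, int n, bool accounted)
{
	struct gpu_stats s = { 0, 0 };
	int i;

	for (i = 0; i < n; i++)
		model_alloc(&s, order);
	for (i = 0; i < n; i++) {
		if (accounted)
			model_free_accounted(&s, order, false);
		else
			model_free_unaccounted(&s, order);
	}
	return s.active;
}
```

In this model, model_leak(order, n, true) always returns 0, while model_leak(order, n, false) returns n * (1 << order): every free that bypasses the helper leaves the counter permanently inflated by the size of the freed block.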



* Re: [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
  2026-05-01 22:30 [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking Matthew Brost
@ 2026-05-01 22:57 ` Kenneth Crudup
  2026-05-02  0:05 ` Kenneth Crudup
  1 sibling, 0 replies; 5+ messages in thread
From: Kenneth Crudup @ 2026-05-01 22:57 UTC
  To: Matthew Brost, intel-xe, dri-devel
  Cc: Christian Koenig, Huang Rui, Matthew Auld, David Airlie


On 5/1/26 15:30, Matthew Brost wrote:

> TTM pool shrinking frees pages by calling __free_pages() directly,
> which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
> out of sync.
> 
> Introduce a helper, __free_pages_gpu_account(), and use it for all page
> frees in ttm_pool.c so GPU MM statistics are updated consistently.
> 
> Reported-by: Kenneth Crudup <kenny@panix.com>
> Fixes: ae80122f3896 ("drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)")
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: David Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)

Tested-by: Kenneth R. Crudup <kenny@panix.com>

> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 26a3689e5fd9..95bbd9328072 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -206,6 +206,14 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>   	return NULL;
>   }
>   
> +static void __free_pages_gpu_account(struct page *p, unsigned int order,
> +				     bool reclaim)
> +{
> +	mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
> +			      -(1 << order));
> +	__free_pages(p, order);
> +}
> +
>   /* Reset the caching and pages of size 1 << order */
>   static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>   			       unsigned int order, struct page *p, bool reclaim)
> @@ -223,9 +231,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>   #endif
>   
>   	if (!pool || !ttm_pool_uses_dma_alloc(pool)) {
> -		mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
> -				      -(1 << order));
> -		__free_pages(p, order);
> +		__free_pages_gpu_account(p, order, reclaim);
>   		return;
>   	}
>   
> @@ -606,7 +612,7 @@ static int ttm_pool_restore_commit(struct ttm_pool_tt_restore *restore,
>   			 */
>   			ttm_pool_split_for_swap(restore->pool, p);
>   			copy_highpage(restore->alloced_page + i, p);
> -			__free_pages(p, 0);
> +			__free_pages_gpu_account(p, 0, false);
>   		}
>   
>   		restore->restored_pages++;
> @@ -1068,7 +1074,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
>   			if (flags->purge) {
>   				shrunken += num_pages;
>   				page->private = 0;
> -				__free_pages(page, order);
> +				__free_pages_gpu_account(page, order, false);
>   				memset(tt->pages + i, 0,
>   				       num_pages * sizeof(*tt->pages));
>   			}

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA



* Re: [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
  2026-05-01 22:30 [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking Matthew Brost
  2026-05-01 22:57 ` Kenneth Crudup
@ 2026-05-02  0:05 ` Kenneth Crudup
  2026-05-02  4:12   ` Matthew Brost
  1 sibling, 1 reply; 5+ messages in thread
From: Kenneth Crudup @ 2026-05-02  0:05 UTC
  To: Matthew Brost, intel-xe, dri-devel
  Cc: Christian Koenig, Huang Rui, Matthew Auld, David Airlie


On 5/1/26 15:30, Matthew Brost wrote:

> TTM pool shrinking frees pages by calling __free_pages() directly,
> which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
> out of sync.
> 
> Introduce a helper, __free_pages_gpu_account(), and use it for all page
> frees in ttm_pool.c so GPU MM statistics are updated consistently.

OK, so why/how does "bonnie++" increase the GPU Memory size?

----
SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUReclaim:       621568 kB
GPUActive:        453592 kB

SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUActive:       5554976 kB
GPUReclaim:        12716 kB

SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUActive:      18407272 kB
GPUReclaim:          884 kB

SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUActive:      24022916 kB
GPUReclaim:          716 kB

SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUActive:      25258248 kB
GPUReclaim:        16032 kB

SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUActive:      28207188 kB
GPUReclaim:         3684 kB
----

... and I'm now not so sure the patch is working ... this after a 2nd 
bonnie run:

----
GPUActive:      44357100 kB
SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUReclaim:        94864 kB

GPUActive:      44373904 kB
SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUReclaim:        94996 kB

GPUActive:      44354940 kB
SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUReclaim:        98048 kB

GPUActive:      44769340 kB
SwapTotal:      33554428 kB
MemTotal:       32345672 kB
GPUReclaim:       122996 kB
----

-Kenny

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA
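
One way to read the samples above: in the second run GPUActive reaches 44769340 kB while MemTotal is 32345672 kB, so the counter claims more memory than the machine has. Below is a minimal sketch of an automated version of that sanity check. The helper names meminfo_kb() and gpu_stats_implausible() are hypothetical, the field names are taken from the output quoted in this thread, and the check assumes that GPUActive and GPUReclaim count resident system pages and so should never add up to more than MemTotal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "<key>: <value> kB" at the start of a line in a /proc/meminfo-style
 * buffer. Returns the value in kB, or -1 if the key is missing or malformed.
 */
static long meminfo_kb(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long kb;

			if (sscanf(line + klen + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Returns 1 if the GPU counters exceed MemTotal (implausible under the
 * assumption stated above), 0 if they fit, -1 if a field is missing.
 */
int gpu_stats_implausible(const char *buf)
{
	long total = meminfo_kb(buf, "MemTotal");
	long active = meminfo_kb(buf, "GPUActive");
	long reclaim = meminfo_kb(buf, "GPUReclaim");

	if (total < 0 || active < 0 || reclaim < 0)
		return -1;
	return active + reclaim > total;
}
```

To use it on a live system, read /proc/meminfo into a NUL-terminated buffer and pass that buffer to gpu_stats_implausible(). A return value of 0 does not prove the accounting is right: the last sample of the first run (28207188 kB active) still fits in MemTotal even though the counter was already climbing.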



* Re: [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
  2026-05-02  0:05 ` Kenneth Crudup
@ 2026-05-02  4:12   ` Matthew Brost
  2026-05-02  4:13     ` Matthew Brost
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Brost @ 2026-05-02  4:12 UTC
  To: Kenneth Crudup
  Cc: intel-xe, dri-devel, Christian Koenig, Huang Rui, Matthew Auld,
	David Airlie

On Fri, May 01, 2026 at 05:05:14PM -0700, Kenneth Crudup wrote:
> 
> On 5/1/26 15:30, Matthew Brost wrote:
> 
> > TTM pool shrinking frees pages by calling __free_pages() directly,
> > which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
> > out of sync.
> > 
> > Introduce a helper, __free_pages_gpu_account(), and use it for all page
> > frees in ttm_pool.c so GPU MM statistics are updated consistently.
> 
> OK, so why/how does "bonnie++" increase the GPU Memory size?
> 

Well, it shouldn’t. What bonnie++ does is basically consume system
memory, triggering reclaim, which in turn will evict GPU BOs that exist
when the display is open. Thanks for pointing me to bonnie++ - this,
plus running something like the WebGL Aquarium in Chrome, is a very nice
test case for Xe/DRM shrinkers.

> ----
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUReclaim:       621568 kB
> GPUActive:        453592 kB
> 
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUActive:       5554976 kB
> GPUReclaim:        12716 kB
> 
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUActive:      18407272 kB
> GPUReclaim:          884 kB
> 
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUActive:      24022916 kB
> GPUReclaim:          716 kB
> 
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUActive:      25258248 kB
> GPUReclaim:        16032 kB
> 
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUActive:      28207188 kB
> GPUReclaim:         3684 kB
> ----
> 
> ... and I'm now not so sure the patch is working ... this after a 2nd bonnie
> run:

It doesn’t appear to be. I can reproduce what you’re seeing with
this patch alone on drm-tip. I originally wrote this patch on top of a
local fix to avoid the TTM shrinker allocating higher-order folios when
reclaiming memory, and it works there. I wrongly assumed it would work
on drm-tip as well. Let me make sure I have a standalone fix for GPUActive
accounting first, then apply my TTM shrinker fix on top.

Matt

> 
> ----
> GPUActive:      44357100 kB
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUReclaim:        94864 kB
> 
> GPUActive:      44373904 kB
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUReclaim:        94996 kB
> 
> GPUActive:      44354940 kB
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUReclaim:        98048 kB
> 
> GPUActive:      44769340 kB
> SwapTotal:      33554428 kB
> MemTotal:       32345672 kB
> GPUReclaim:       122996 kB
> ----
> 
> -Kenny
> 
> -- 
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
> CA
> 
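
In the ttm_pool_restore_commit() hunk of the patch, a page is split with ttm_pool_split_for_swap() and the pieces are then released at order 0 through the new helper. The toy model below shows the arithmetic this relies on. It is not a diagnosis of the regression discussed here, and model_split_and_free() is a hypothetical name. It assumes the block was accounted as 1 << order pages when it was allocated, so the counter balances only if every one of those order-0 pieces goes through an accounted free.

```c
#include <assert.h>

/*
 * Account one allocation of the given order, then free 'freed' of its
 * order-0 pieces through an accounted path. Returns the number of pages
 * the counter still reports afterwards.
 */
long model_split_and_free(unsigned int order, unsigned int freed)
{
	unsigned int npages, i;
	long active;

	assert(order < 16);
	npages = 1u << order;
	assert(freed <= npages);

	active = npages;		/* allocation: +(1 << order) */
	for (i = 0; i < freed; i++)
		active -= 1;		/* order-0 free: -(1 << 0) */
	return active;
}
```

For an order-9 block (512 pages), model_split_and_free(9, 512) returns 0, while model_split_and_free(9, 500) returns 12: the twelve pieces that were freed some other way stay on the books.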


* Re: [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
  2026-05-02  4:12   ` Matthew Brost
@ 2026-05-02  4:13     ` Matthew Brost
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Brost @ 2026-05-02  4:13 UTC
  To: Kenneth Crudup
  Cc: intel-xe, dri-devel, Christian Koenig, Huang Rui, Matthew Auld,
	David Airlie

On Fri, May 01, 2026 at 09:12:31PM -0700, Matthew Brost wrote:
> On Fri, May 01, 2026 at 05:05:14PM -0700, Kenneth Crudup wrote:
> > 
> > On 5/1/26 15:30, Matthew Brost wrote:
> > 
> > > TTM pool shrinking frees pages by calling __free_pages() directly,
> > > which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
> > > out of sync.
> > > 
> > > Introduce a helper, __free_pages_gpu_account(), and use it for all page
> > > frees in ttm_pool.c so GPU MM statistics are updated consistently.
> > 
> > OK, so why/how does "bonnie++" increase the GPU Memory size?
> > 
> 
> Well, it shouldn’t. What bonnie++ does is basically consume system
> memory, triggering reclaim, which in turn will evict GPU BOs that exist
> when the display is open. Thanks for pointing me to bonnie++ - this,
> plus running something like the WebGL Aquarium in Chrome, is a very nice
> test case for Xe/DRM shrinkers.
> 
> > ----
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUReclaim:       621568 kB
> > GPUActive:        453592 kB
> > 
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUActive:       5554976 kB
> > GPUReclaim:        12716 kB
> > 
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUActive:      18407272 kB
> > GPUReclaim:          884 kB
> > 
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUActive:      24022916 kB
> > GPUReclaim:          716 kB
> > 
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUActive:      25258248 kB
> > GPUReclaim:        16032 kB
> > 
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUActive:      28207188 kB
> > GPUReclaim:         3684 kB
> > ----
> > 
> > ... and I'm now not so sure the patch is working ... this after a 2nd bonnie
> > run:
> 
> It doesn’t appear that it is. I can recreate what you’re seeing with
> this patch alone on drm-tip. I originally coded this patch on top of a
> local fix to avoid the TTM shrinker allocating higher-order folios when

s/allocating/splitting/

Matt

> reclaiming memory—this is working there. I falsely assumed it would work
> on drm-tip as well. Let me ensure I have a standalone fix for GPUActive
> accounting first, then apply my TTM shrinker fix on top.
> 
> Matt
> 
> > 
> > ----
> > GPUActive:      44357100 kB
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUReclaim:        94864 kB
> > 
> > GPUActive:      44373904 kB
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUReclaim:        94996 kB
> > 
> > GPUActive:      44354940 kB
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUReclaim:        98048 kB
> > 
> > GPUActive:      44769340 kB
> > SwapTotal:      33554428 kB
> > MemTotal:       32345672 kB
> > GPUReclaim:       122996 kB
> > ----
> > 
> > -Kenny
> > 
> > -- 
> > Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
> > CA
> > 
