linux-mm.kvack.org archive mirror
* [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
@ 2025-06-19  7:20 Dave Airlie
  2025-06-19  7:20 ` [PATCH 2/2] drm/ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Dave Airlie @ 2025-06-19  7:20 UTC (permalink / raw)
  To: dri-devel
  Cc: Dave Airlie, Christian Koenig, Matthew Brost, Johannes Weiner,
	linux-mm, Andrew Morton

From: Dave Airlie <airlied@redhat.com>

While discussing memcg integration with gpu memory allocations,
it was pointed out that there were no numa/system counters for
GPU memory allocations.

With more integrated-memory GPU server systems turning up, and
more requirements for memory tracking, it seems we should start
closing the gap.

Add two counters to track GPU per-node system memory allocations.

The first counts memory currently allocated to GPU objects, and
the second counts memory stored in GPU page pools that can be
reclaimed by the shrinker.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

---

v2: add more info to the documentation on this memory.

I'd like to get acks to merge this via the drm tree, if possible,

Dave.
---
 Documentation/filesystems/proc.rst | 8 ++++++++
 drivers/base/node.c                | 5 +++++
 fs/proc/meminfo.c                  | 6 ++++++
 include/linux/mmzone.h             | 2 ++
 mm/show_mem.c                      | 9 +++++++--
 mm/vmstat.c                        | 2 ++
 6 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 5236cb52e357..7cc5a9185190 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
     CmaFree:               0 kB
     Unaccepted:            0 kB
     Balloon:               0 kB
+    GPUActive:             0 kB
+    GPUReclaim:            0 kB
     HugePages_Total:       0
     HugePages_Free:        0
     HugePages_Rsvd:        0
@@ -1273,6 +1275,12 @@ Unaccepted
               Memory that has not been accepted by the guest
 Balloon
               Memory returned to Host by VM Balloon Drivers
+GPUActive
+              System memory allocated to active GPU objects
+GPUReclaim
+              System memory stored in GPU pools for reuse. This memory is not
+              counted in GPUActive. It is shrinker reclaimable memory kept in a reuse
+              pool because it has non-standard page table attributes, like WC or UC.
 HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
               See Documentation/admin-guide/mm/hugetlbpage.rst.
 DirectMap4k, DirectMap2M, DirectMap1G
diff --git a/drivers/base/node.c b/drivers/base/node.c
index c19094481630..64406862314b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -463,6 +463,8 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_UNACCEPTED_MEMORY
 			     "Node %d Unaccepted:     %8lu kB\n"
 #endif
+			     "Node %d GPUActive:      %8lu kB\n"
+			     "Node %d GPUReclaim:     %8lu kB\n"
 			     ,
 			     nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
 			     nid, K(node_page_state(pgdat, NR_WRITEBACK)),
@@ -496,6 +498,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     ,
 			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
 #endif
+			     ,
+			     nid, K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+			     nid, K(node_page_state(pgdat, NR_GPU_RECLAIM))
 			    );
 	len += hugetlb_report_node_meminfo(buf, len, nid);
 	return len;
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index bc2bc60c36cc..334948744e55 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -164,6 +164,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "Balloon:        ",
 		    global_node_page_state(NR_BALLOON_PAGES));
 
+	show_val_kb(m, "GPUActive:      ",
+		    global_node_page_state(NR_GPU_ACTIVE));
+
+	show_val_kb(m, "GPUReclaim:     ",
+		    global_node_page_state(NR_GPU_RECLAIM));
+
 	hugetlb_report_meminfo(m);
 
 	arch_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 283913d42d7b..458a3465dd8f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -241,6 +241,8 @@ enum node_stat_item {
 	NR_HUGETLB,
 #endif
 	NR_BALLOON_PAGES,
+	NR_GPU_ACTIVE,          /* GPU pages assigned to an object */
+	NR_GPU_RECLAIM,         /* GPU pages in shrinkable pools */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 0cf8bf5d832d..072d33a50148 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -255,7 +255,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
 			" sec_pagetables:%lukB"
 			" all_unreclaimable? %s"
 			" Balloon:%lukB"
-			"\n",
+			" gpu_active:%lukB"
+			" gpu_reclaim:%lukB"
+			"\n",
 			pgdat->node_id,
 			K(node_page_state(pgdat, NR_ACTIVE_ANON)),
 			K(node_page_state(pgdat, NR_INACTIVE_ANON)),
@@ -281,7 +283,10 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
 			K(node_page_state(pgdat, NR_PAGETABLE)),
 			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			str_yes_no(pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES),
-			K(node_page_state(pgdat, NR_BALLOON_PAGES)));
+			K(node_page_state(pgdat, NR_BALLOON_PAGES)),
+			K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+			K(node_page_state(pgdat, NR_GPU_RECLAIM)));
+
 	}
 
 	for_each_populated_zone(zone) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 429ae5339bfe..25a74cf29473 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1281,6 +1281,8 @@ const char * const vmstat_text[] = {
 	"nr_hugetlb",
 #endif
 	"nr_balloon_pages",
+	"nr_gpu_active",
+	"nr_gpu_reclaim",
 	/* system-wide enum vm_stat_item counters */
 	"nr_dirty_threshold",
 	"nr_dirty_background_threshold",
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] drm/ttm: use gpu mm stats to track gpu memory allocations.
  2025-06-19  7:20 [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
@ 2025-06-19  7:20 ` Dave Airlie
  2025-06-19 22:37 ` [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Andrew Morton
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Dave Airlie @ 2025-06-19  7:20 UTC (permalink / raw)
  To: dri-devel
  Cc: Dave Airlie, Christian Koenig, Matthew Brost, Johannes Weiner,
	linux-mm, Andrew Morton

From: Dave Airlie <airlied@redhat.com>

This uses the newly introduced per-node gpu tracking stats
to track GPU memory allocated via TTM and reclaimable memory in
the TTM page pools.

These stats will be useful for system information, and later
when memory cgroups are integrated.
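To make the accounting easier to follow, here is a toy model of how the patch below moves pages between the two counters (plain Python, not kernel code; the method names mirror the TTM functions being instrumented):

```python
# Toy model of the per-node accounting added to ttm_pool.c.
# Counts are in pages; each call works on a chunk of 1 << order pages,
# mirroring the mod_node_page_state() calls in the patch.
class NodeGpuStats:
    def __init__(self):
        self.active = 0   # models NR_GPU_ACTIVE
        self.reclaim = 0  # models NR_GPU_RECLAIM

    def alloc_page(self, order):   # ttm_pool_alloc_page(): fresh allocation
        self.active += 1 << order

    def pool_give(self, order):    # ttm_pool_type_give(): active -> reclaim
        self.active -= 1 << order
        self.reclaim += 1 << order

    def pool_take(self, order):    # ttm_pool_type_take(): reclaim -> active
        self.reclaim -= 1 << order
        self.active += 1 << order

    def free_page(self, order):    # ttm_pool_free_page(): final free
        self.active -= 1 << order

node = NodeGpuStats()
node.alloc_page(2)   # allocate a 4-page chunk: active=4, reclaim=0
node.pool_give(2)    # park it in the pool:     active=0, reclaim=4
node.pool_take(2)    # reuse it from the pool:  active=4, reclaim=0
node.free_page(2)    # free it for real:        active=0, reclaim=0
print(node.active, node.reclaim)
```

The invariant being modelled: a chunk is always in exactly one of the two counters until it is finally freed, which is what allows GPUReclaim to be reported separately as shrinker-reclaimable.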

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c2ea865be657..ccc3b9a13e9e 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -130,6 +130,16 @@ static struct list_head shrinker_list;
 static struct shrinker *mm_shrinker;
 static DECLARE_RWSEM(pool_shrink_rwsem);
 
+/* helper to get a current valid node id from a pool */
+static int ttm_pool_nid(struct ttm_pool *pool) {
+	int nid = NUMA_NO_NODE;
+	if (pool)
+		nid = pool->nid;
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+	return nid;
+}
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 					unsigned int order)
@@ -149,8 +159,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 
 	if (!pool->use_dma_alloc) {
 		p = alloc_pages_node(pool->nid, gfp_flags, order);
-		if (p)
+		if (p) {
 			p->private = order;
+			mod_node_page_state(NODE_DATA(ttm_pool_nid(pool)), NR_GPU_ACTIVE, (1 << order));
+		}
 		return p;
 	}
 
@@ -201,6 +213,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 
 	if (!pool || !pool->use_dma_alloc) {
 		__free_pages(p, order);
+		mod_node_page_state(NODE_DATA(ttm_pool_nid(pool)), NR_GPU_ACTIVE, -(1 << order));
 		return;
 	}
 
@@ -275,6 +288,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
 static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 {
 	unsigned int i, num_pages = 1 << pt->order;
+	int nid = ttm_pool_nid(pt->pool);
 
 	for (i = 0; i < num_pages; ++i) {
 		if (PageHighMem(p))
@@ -287,17 +301,23 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 	list_add(&p->lru, &pt->pages);
 	spin_unlock(&pt->lock);
 	atomic_long_add(1 << pt->order, &allocated_pages);
+
+	mod_node_page_state(NODE_DATA(nid), NR_GPU_ACTIVE, -(1 << pt->order));
+	mod_node_page_state(NODE_DATA(nid), NR_GPU_RECLAIM, (1 << pt->order));
 }
 
 /* Take pages from a specific pool_type, return NULL when nothing available */
 static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
 {
 	struct page *p;
+	int nid = ttm_pool_nid(pt->pool);
 
 	spin_lock(&pt->lock);
 	p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
 	if (p) {
 		atomic_long_sub(1 << pt->order, &allocated_pages);
+		mod_node_page_state(NODE_DATA(nid), NR_GPU_ACTIVE, (1 << pt->order));
+		mod_node_page_state(NODE_DATA(nid), NR_GPU_RECLAIM, -(1 << pt->order));
 		list_del(&p->lru);
 	}
 	spin_unlock(&pt->lock);
-- 
2.49.0




* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-19  7:20 [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
  2025-06-19  7:20 ` [PATCH 2/2] drm/ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
@ 2025-06-19 22:37 ` Andrew Morton
  2025-06-20 17:57 ` Zi Yan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2025-06-19 22:37 UTC (permalink / raw)
  To: Dave Airlie
  Cc: dri-devel, Dave Airlie, Christian Koenig, Matthew Brost,
	Johannes Weiner, linux-mm

On Thu, 19 Jun 2025 17:20:25 +1000 Dave Airlie <airlied@gmail.com> wrote:

> 
> While discussing memcg integration with gpu memory allocations,
> it was pointed out that there were no numa/system counters for
> GPU memory allocations.
> 
> With more integrated-memory GPU server systems turning up, and
> more requirements for memory tracking, it seems we should start
> closing the gap.
> 
> Add two counters to track GPU per-node system memory allocations.
> 
> The first counts memory currently allocated to GPU objects, and
> the second counts memory stored in GPU page pools that can be
> reclaimed by the shrinker.
> 

Acked-by: Andrew Morton <akpm@linux-foundation.org>



* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-19  7:20 [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
  2025-06-19  7:20 ` [PATCH 2/2] drm/ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
  2025-06-19 22:37 ` [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Andrew Morton
@ 2025-06-20 17:57 ` Zi Yan
  2025-06-20 18:51 ` Shakeel Butt
  2025-06-23  8:54 ` Christian König
  4 siblings, 0 replies; 10+ messages in thread
From: Zi Yan @ 2025-06-20 17:57 UTC (permalink / raw)
  To: Dave Airlie
  Cc: dri-devel, Dave Airlie, Christian Koenig, Matthew Brost,
	Johannes Weiner, linux-mm, Andrew Morton

On 19 Jun 2025, at 3:20, Dave Airlie wrote:

> From: Dave Airlie <airlied@redhat.com>
>
> While discussing memcg integration with gpu memory allocations,
> it was pointed out that there were no numa/system counters for
> GPU memory allocations.
>
> With more integrated-memory GPU server systems turning up, and
> more requirements for memory tracking, it seems we should start
> closing the gap.
>
> Add two counters to track GPU per-node system memory allocations.
>
> The first counts memory currently allocated to GPU objects, and
> the second counts memory stored in GPU page pools that can be
> reclaimed by the shrinker.
>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
>
> ---
>
> v2: add more info to the documentation on this memory.
>
> I'd like to get acks to merge this via the drm tree, if possible,
>
> Dave.
> ---
>  Documentation/filesystems/proc.rst | 8 ++++++++
>  drivers/base/node.c                | 5 +++++
>  fs/proc/meminfo.c                  | 6 ++++++
>  include/linux/mmzone.h             | 2 ++
>  mm/show_mem.c                      | 9 +++++++--
>  mm/vmstat.c                        | 2 ++
>  6 files changed, 30 insertions(+), 2 deletions(-)
>

<snip>

> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 283913d42d7b..458a3465dd8f 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -241,6 +241,8 @@ enum node_stat_item {
>  	NR_HUGETLB,
>  #endif
>  	NR_BALLOON_PAGES,
> +	NR_GPU_ACTIVE,          /* GPU pages assigned to an object */
> +	NR_GPU_RECLAIM,         /* GPU pages in shrinkable pools */

"GPU pages" seems confusing. These are not pages from GPU memory, right?
Would the comments below sound better?

/* Pages assigned to a GPU object */
/* Pages in shrinkable GPU pools */

Otherwise, Acked-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi



* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-19  7:20 [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
                   ` (2 preceding siblings ...)
  2025-06-20 17:57 ` Zi Yan
@ 2025-06-20 18:51 ` Shakeel Butt
  2025-06-23  8:54 ` Christian König
  4 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2025-06-20 18:51 UTC (permalink / raw)
  To: Dave Airlie
  Cc: dri-devel, Dave Airlie, Christian Koenig, Matthew Brost,
	Johannes Weiner, linux-mm, Andrew Morton

On Thu, Jun 19, 2025 at 05:20:25PM +1000, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
> 
> While discussing memcg integration with gpu memory allocations,
> it was pointed out that there were no numa/system counters for
> GPU memory allocations.
> 
> With more integrated-memory GPU server systems turning up, and
> more requirements for memory tracking, it seems we should start
> closing the gap.
> 
> Add two counters to track GPU per-node system memory allocations.
> 
> The first counts memory currently allocated to GPU objects, and
> the second counts memory stored in GPU page pools that can be
> reclaimed by the shrinker.
> 
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dave Airlie <airlied@redhat.com>

With Zi's suggestion, you can add:

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-19  7:20 [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
                   ` (3 preceding siblings ...)
  2025-06-20 18:51 ` Shakeel Butt
@ 2025-06-23  8:54 ` Christian König
  2025-06-24  1:12   ` David Airlie
  4 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2025-06-23  8:54 UTC (permalink / raw)
  To: Dave Airlie, dri-devel
  Cc: Dave Airlie, Matthew Brost, Johannes Weiner, linux-mm,
	Andrew Morton

On 6/19/25 09:20, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
> 
> While discussing memcg integration with gpu memory allocations,
> it was pointed out that there were no numa/system counters for
> GPU memory allocations.
> 
> With more integrated-memory GPU server systems turning up, and
> more requirements for memory tracking, it seems we should start
> closing the gap.
> 
> Add two counters to track GPU per-node system memory allocations.
> 
> The first counts memory currently allocated to GPU objects, and
> the second counts memory stored in GPU page pools that can be
> reclaimed by the shrinker.
> 
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> 
> ---
> 
> v2: add more info to the documentation on this memory.
> 
> I'd like to get acks to merge this via the drm tree, if possible,
> 
> Dave.
> ---
>  Documentation/filesystems/proc.rst | 8 ++++++++
>  drivers/base/node.c                | 5 +++++
>  fs/proc/meminfo.c                  | 6 ++++++
>  include/linux/mmzone.h             | 2 ++
>  mm/show_mem.c                      | 9 +++++++--
>  mm/vmstat.c                        | 2 ++
>  6 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 5236cb52e357..7cc5a9185190 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
>      CmaFree:               0 kB
>      Unaccepted:            0 kB
>      Balloon:               0 kB
> +    GPUActive:             0 kB
> +    GPUReclaim:            0 kB

Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.

Regards,
Christian.





* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-23  8:54 ` Christian König
@ 2025-06-24  1:12   ` David Airlie
  2025-06-25 11:55     ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: David Airlie @ 2025-06-24  1:12 UTC (permalink / raw)
  To: Christian König
  Cc: Dave Airlie, dri-devel, Matthew Brost, Johannes Weiner, linux-mm,
	Andrew Morton

On Mon, Jun 23, 2025 at 6:54 PM Christian König
<christian.koenig@amd.com> wrote:
>
> On 6/19/25 09:20, Dave Airlie wrote:
> > From: Dave Airlie <airlied@redhat.com>
> >
> > <snip>
> >
> > +    GPUActive:             0 kB
> > +    GPUReclaim:            0 kB
>
> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.

I'm not just concerned about newer platforms though; even on Fedora 42
on my test ryzen1+7900xt machine with a desktop session running:

nr_gpu_active 7473
nr_gpu_reclaim 6656

It's not an insignificant amount of memory. I also think that if we
get to some sort of discardable GTT objects with a shrinker, they
should probably be accounted in reclaim.
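(vmstat counts are in pages, so assuming the usual 4 KiB page size those numbers translate roughly as follows; the conversion is illustrative:)

```python
PAGE_SIZE_KB = 4  # assumes 4 KiB pages (the x86-64 default)

nr_gpu_active = 7473   # pages, from /proc/vmstat above
nr_gpu_reclaim = 6656

print(nr_gpu_active * PAGE_SIZE_KB, "kB active")    # about 29 MiB
print(nr_gpu_reclaim * PAGE_SIZE_KB, "kB reclaim")  # about 26 MiB
```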

Dave.




* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-24  1:12   ` David Airlie
@ 2025-06-25 11:55     ` Christian König
  2025-06-25 19:16       ` David Airlie
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2025-06-25 11:55 UTC (permalink / raw)
  To: David Airlie
  Cc: Dave Airlie, dri-devel, Matthew Brost, Johannes Weiner, linux-mm,
	Andrew Morton

On 24.06.25 03:12, David Airlie wrote:
> On Mon, Jun 23, 2025 at 6:54 PM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 6/19/25 09:20, Dave Airlie wrote:
>>> From: Dave Airlie <airlied@redhat.com>
>>>
>>> <snip>
>>>
>>> +    GPUActive:             0 kB
>>> +    GPUReclaim:            0 kB
>>
>> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.
> 
> I'm not just concerned about newer platforms though; even on Fedora 42
> on my test ryzen1+7900xt machine with a desktop session running:
> 
> nr_gpu_active 7473
> nr_gpu_reclaim 6656
> 
> It's not an insignificant amount of memory.

That was not what I meant; that you have quite a bunch of memory allocated to the GPU is correct.

But the problem is more that we use the pool for way too many things, which is actually not necessary.

But granted, this is orthogonal to this patch.

> I also think that if we get to
> some sort of discardable GTT objects with a shrinker, they should
> probably be accounted in reclaim.

The problem is that this is extremely driver specific.

On amdgpu we have some temporary buffers which can be reclaimed immediately, but the really big chunk is for example what XE does with its shrinker.

See Thomas' TTM patches from a few months ago. Whether memory is active or reclaimable does not depend on how it is allocated, but on how it is used.

So the accounting needs to be at the driver level if you really want to distinguish between the two states.

Christian.

> 
> Dave.
> 




* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-25 11:55     ` Christian König
@ 2025-06-25 19:16       ` David Airlie
  2025-06-26  9:00         ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: David Airlie @ 2025-06-25 19:16 UTC (permalink / raw)
  To: Christian König
  Cc: Dave Airlie, dri-devel, Matthew Brost, Johannes Weiner, linux-mm,
	Andrew Morton

On Wed, Jun 25, 2025 at 9:55 PM Christian König
<christian.koenig@amd.com> wrote:
>
> On 24.06.25 03:12, David Airlie wrote:
> > On Mon, Jun 23, 2025 at 6:54 PM Christian König
> > <christian.koenig@amd.com> wrote:
> >>
> >> On 6/19/25 09:20, Dave Airlie wrote:
> >>> From: Dave Airlie <airlied@redhat.com>
> >>>
> >>> <snip>
> >>>
> >>> +    GPUActive:             0 kB
> >>> +    GPUReclaim:            0 kB
> >>
> >> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.
> >
> > I'm not just concerned about newer platforms though, even on Fedora 42
> > on my test ryzen1+7900xt machine, with a desktop session running
> >
> > nr_gpu_active 7473
> > nr_gpu_reclaim 6656
> >
> > It's not an insignificant amount of memory.
>
> That was not what I meant; that you have quite a bunch of memory allocated to the GPU is correct.
>
> But the problem is more that we used the pool for way too many things, which is actually not necessary.
>
> But granted this is orthogonal to that patch here.

At least here this is all WC allocations, probably from userspace, so
it feels like we are using it correctly, since we stopped pooling
cached pages.
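
The nr_gpu_* values quoted above are vmstat page counts; assuming 4 KiB
base pages (the common x86-64 size; an assumption, not stated in the
thread), a quick conversion sketch:

```python
# Convert the per-node vmstat page counts quoted above to kB and MiB,
# assuming 4 KiB base pages.
PAGE_SIZE_KB = 4  # assumption: 4 KiB pages

for name, pages in [("nr_gpu_active", 7473), ("nr_gpu_reclaim", 6656)]:
    kb = pages * PAGE_SIZE_KB
    print(f"{name}: {pages} pages = {kb} kB (~{kb / 1024:.1f} MiB)")
```

That works out to roughly 29 MiB active plus 26 MiB reclaimable, about
55 MiB of system memory accounted to the GPU for one desktop session.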

>
> > I also think if we get to
> > some sort of discardable GTT objects with a shrinker they should
> > probably be accounted in reclaim.
>
> The problem is that this is extremely driver specific.
>
> On amdgpu we have some temporary buffers which can be reclaimed immediately, but the really big chunk is, for example, what XE does with its shrinker.
>
> See Thomas' TTM patches from a few months ago. Whether memory is active or reclaimable does not depend on how it is allocated, but on how it is used.
>
> So the accounting needs to be at the driver level if you really want to distinguish between the two states.

How the counters are used is fine to be done at the driver level on
top of this, though I think for discardable there are grounds for
ttm_tt having a discardable flag once we see a couple of drivers using
it, and then maybe the counters could be moved, but it's also fine to
use these counters in drivers outside TTM if they are done
appropriately, just so we can see the memory allocations as part of
the big picture.

Dave.




* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
  2025-06-25 19:16       ` David Airlie
@ 2025-06-26  9:00         ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2025-06-26  9:00 UTC (permalink / raw)
  To: David Airlie
  Cc: Dave Airlie, dri-devel, Matthew Brost, Johannes Weiner, linux-mm,
	Andrew Morton

On 25.06.25 21:16, David Airlie wrote:
>>>>> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
>>>>> index 5236cb52e357..7cc5a9185190 100644
>>>>> --- a/Documentation/filesystems/proc.rst
>>>>> +++ b/Documentation/filesystems/proc.rst
>>>>> @@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
>>>>>      CmaFree:               0 kB
>>>>>      Unaccepted:            0 kB
>>>>>      Balloon:               0 kB
>>>>> +    GPUActive:             0 kB
>>>>> +    GPUReclaim:            0 kB
>>>>
>>>> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.
>>>
>>> I'm not just concerned about newer platforms though, even on Fedora 42
>>> on my test ryzen1+7900xt machine, with a desktop session running
>>>
>>> nr_gpu_active 7473
>>> nr_gpu_reclaim 6656
>>>
>>> It's not an insignificant amount of memory.
>>
>> That was not what I meant; that you have quite a bunch of memory allocated to the GPU is correct.
>>
>> But the problem is more that we used the pool for way too many things, which is actually not necessary.
>>
>> But granted this is orthogonal to that patch here.
> 
> At least here this is all WC allocations, probably from userspace, so
> it feels like we are using it correctly, since we stopped pooling
> cached pages.

Well, what the kernel does is technically correct; it's just that userspace wants to use WC because ~15 years ago that was the state of the art.

On today's HW, WC no longer has the benefit it used to have, but the kernel still has to deal with all the complexity and overhead...

Just ignoring the WC flag when userspace sets it, and only setting it when the kernel finds that it is necessary, would still be technically correct.

>>> I also think if we get to
>>> some sort of discardable GTT objects with a shrinker they should
>>> probably be accounted in reclaim.
>>
>> The problem is that this is extremely driver specific.
>>
>> On amdgpu we have some temporary buffers which can be reclaimed immediately, but the really big chunk is, for example, what XE does with its shrinker.
>>
>> See Thomas' TTM patches from a few months ago. Whether memory is active or reclaimable does not depend on how it is allocated, but on how it is used.
>>
>> So the accounting needs to be at the driver level if you really want to distinguish between the two states.
> 
> How the counters are used is fine to be done at the driver level on
> top of this

But then you have double accounting. E.g. the allocation backend says that this memory is GPUActive and the driver says that it is GPUReclaim.

Maybe making GPUReclaim a subset of GPUActive isn't such a bad idea? Alternatively, the driver could decrease GPUActive in favor of increasing GPUReclaim when it has a separate shrinker.
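
The transfer alternative can be sketched as a toy model (all names here
are illustrative, not the actual kernel counter API):

```python
# Toy model of the per-node accounting alternative: when a driver hands
# pages to its shrinker pool, it moves them from the "active" counter to
# the "reclaim" counter instead of incrementing both (double accounting).
class NodeGpuStats:
    def __init__(self):
        self.gpu_active = 0   # pages currently allocated to GPU objects
        self.gpu_reclaim = 0  # pages sitting in reclaimable pools

    def alloc(self, pages):
        # The allocation backend accounts new pages as active.
        self.gpu_active += pages

    def make_reclaimable(self, pages):
        # The driver's shrinker takes over: transfer, don't double count.
        assert pages <= self.gpu_active
        self.gpu_active -= pages
        self.gpu_reclaim += pages

    def shrink(self, pages):
        # The shrinker frees pooled pages back to the system.
        freed = min(pages, self.gpu_reclaim)
        self.gpu_reclaim -= freed
        return freed

    def total(self):
        # The buckets stay disjoint, so the sum is total GPU memory.
        return self.gpu_active + self.gpu_reclaim

node = NodeGpuStats()
node.alloc(7473 + 6656)
node.make_reclaimable(6656)
print(node.gpu_active, node.gpu_reclaim, node.total())
```

Keeping the two buckets disjoint means their sum is always the total GPU
memory on the node, which avoids the double-accounting problem above.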

>, though I think for discardable there are grounds for
> ttm_tt having a discardable flag once we see a couple of drivers using
> it, and then maybe the counters could be moved,

Well, it is certainly a good idea to have a discardable flag in TTM, but it isn't used that often, and the last time this was brought up it was abandoned as too much work for too little gain.

> but it's also fine to
> use these counters in drivers outside TTM if they are done
> appropriately, just so we can see the memory allocations as part of
> the big picture.

Yeah, that is what I'm worried about. In drivers we need to be super careful with this so as not to come up with incorrect numbers.

Christian.

> 
> Dave.
> 




end of thread, other threads:[~2025-06-26  9:00 UTC | newest]

Thread overview: 10+ messages
2025-06-19  7:20 [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
2025-06-19  7:20 ` [PATCH 2/2] drm/ttm: use gpu mm stats to track gpu memory allocations Dave Airlie
2025-06-19 22:37 ` [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2) Andrew Morton
2025-06-20 17:57 ` Zi Yan
2025-06-20 18:51 ` Shakeel Butt
2025-06-23  8:54 ` Christian König
2025-06-24  1:12   ` David Airlie
2025-06-25 11:55     ` Christian König
2025-06-25 19:16       ` David Airlie
2025-06-26  9:00         ` Christian König
