* [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters
From: Dave Airlie @ 2025-06-18 4:06 UTC
To: dri-devel
Cc: Dave Airlie, Christian Koenig, Matthew Brost, Johannes Weiner,
linux-mm, Andrew Morton
From: Dave Airlie <airlied@redhat.com>
While discussing memcg integration with GPU memory allocations,
it was pointed out that there were no NUMA/system counters for
GPU memory allocations.

With more integrated-memory GPU server systems turning up, and
more requirements for memory tracking, it seems we should start
closing the gap.

Add two counters to track GPU per-node system memory allocations.

The first tracks memory currently allocated to GPU objects; the
second tracks memory held in GPU page pools that can be reclaimed
by the shrinker.
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
I'd like to get acks to merge this via the drm tree, if possible,
Dave.
---
Documentation/filesystems/proc.rst | 6 ++++++
drivers/base/node.c | 5 +++++
fs/proc/meminfo.c | 6 ++++++
include/linux/mmzone.h | 2 ++
mm/show_mem.c | 9 +++++++--
mm/vmstat.c | 2 ++
6 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 5236cb52e357..45f61a19a790 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
CmaFree: 0 kB
Unaccepted: 0 kB
Balloon: 0 kB
+ GPUActive: 0 kB
+ GPUReclaim: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
@@ -1273,6 +1275,10 @@ Unaccepted
Memory that has not been accepted by the guest
Balloon
Memory returned to Host by VM Balloon Drivers
+GPUActive
+ Memory allocated to GPU objects
+GPUReclaim
+ Memory in GPU allocator pools that is reclaimable
HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
See Documentation/admin-guide/mm/hugetlbpage.rst.
DirectMap4k, DirectMap2M, DirectMap1G
diff --git a/drivers/base/node.c b/drivers/base/node.c
index c19094481630..64406862314b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -463,6 +463,8 @@ static ssize_t node_read_meminfo(struct device *dev,
#ifdef CONFIG_UNACCEPTED_MEMORY
"Node %d Unaccepted: %8lu kB\n"
#endif
+ "Node %d GPUActive: %8lu kB\n"
+ "Node %d GPUReclaim: %8lu kB\n"
,
nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
nid, K(node_page_state(pgdat, NR_WRITEBACK)),
@@ -496,6 +498,9 @@ static ssize_t node_read_meminfo(struct device *dev,
,
nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
#endif
+ ,
+ nid, K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+ nid, K(node_page_state(pgdat, NR_GPU_RECLAIM))
);
len += hugetlb_report_node_meminfo(buf, len, nid);
return len;
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index bc2bc60c36cc..334948744e55 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -164,6 +164,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
show_val_kb(m, "Balloon: ",
global_node_page_state(NR_BALLOON_PAGES));
+ show_val_kb(m, "GPUActive: ",
+ global_node_page_state(NR_GPU_ACTIVE));
+
+ show_val_kb(m, "GPUReclaim: ",
+ global_node_page_state(NR_GPU_RECLAIM));
+
hugetlb_report_meminfo(m);
arch_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 283913d42d7b..95c5e4813427 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -241,6 +241,8 @@ enum node_stat_item {
NR_HUGETLB,
#endif
NR_BALLOON_PAGES,
+ NR_GPU_ACTIVE, /* GPU pages assigned to an object */
+ NR_GPU_RECLAIM, /* GPU pages in shrinkable pool */
NR_VM_NODE_STAT_ITEMS
};
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 0cf8bf5d832d..072d33a50148 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -255,7 +255,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
" sec_pagetables:%lukB"
" all_unreclaimable? %s"
" Balloon:%lukB"
- "\n",
+ " gpu_active:%lukB"
+ " gpu_reclaim:%lukB"
+ "\n",
pgdat->node_id,
K(node_page_state(pgdat, NR_ACTIVE_ANON)),
K(node_page_state(pgdat, NR_INACTIVE_ANON)),
@@ -281,7 +283,9 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
K(node_page_state(pgdat, NR_PAGETABLE)),
K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
str_yes_no(pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES),
- K(node_page_state(pgdat, NR_BALLOON_PAGES)));
+ K(node_page_state(pgdat, NR_BALLOON_PAGES)),
+ K(node_page_state(pgdat, NR_GPU_ACTIVE)),
+ K(node_page_state(pgdat, NR_GPU_RECLAIM)));
}
for_each_populated_zone(zone) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 429ae5339bfe..25a74cf29473 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1281,6 +1281,8 @@ const char * const vmstat_text[] = {
"nr_hugetlb",
#endif
"nr_balloon_pages",
+ "nr_gpu_active",
+ "nr_gpu_reclaim",
/* system-wide enum vm_stat_item counters */
"nr_dirty_threshold",
"nr_dirty_background_threshold",
--
2.49.0
* [PATCH 2/2] drm/ttm: use gpu mm stats to track gpu memory allocations.
From: Dave Airlie @ 2025-06-18 4:06 UTC
To: dri-devel
Cc: Dave Airlie, Christian Koenig, Matthew Brost, Johannes Weiner,
Andrew Morton, linux-mm
From: Dave Airlie <airlied@redhat.com>
This uses the newly introduced per-node GPU tracking stats to
track GPU memory allocated via TTM, and reclaimable memory held
in the TTM page pools.

These stats will be useful for system information, and later
when memory cgroups are integrated.
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
drivers/gpu/drm/ttm/ttm_pool.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c2ea865be657..ccc3b9a13e9e 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -130,6 +130,18 @@ static struct list_head shrinker_list;
static struct shrinker *mm_shrinker;
static DECLARE_RWSEM(pool_shrink_rwsem);
+/* Helper to get a currently valid node id from a pool */
+static int ttm_pool_nid(struct ttm_pool *pool)
+{
+	int nid = NUMA_NO_NODE;
+
+	if (pool)
+		nid = pool->nid;
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+	return nid;
+}
+
/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
unsigned int order)
@@ -149,8 +159,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
if (!pool->use_dma_alloc) {
p = alloc_pages_node(pool->nid, gfp_flags, order);
- if (p)
+ if (p) {
p->private = order;
+ mod_node_page_state(NODE_DATA(ttm_pool_nid(pool)), NR_GPU_ACTIVE, (1 << order));
+ }
return p;
}
@@ -201,6 +213,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
if (!pool || !pool->use_dma_alloc) {
__free_pages(p, order);
+ mod_node_page_state(NODE_DATA(ttm_pool_nid(pool)), NR_GPU_ACTIVE, -(1 << order));
return;
}
@@ -275,6 +288,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
{
unsigned int i, num_pages = 1 << pt->order;
+ int nid = ttm_pool_nid(pt->pool);
for (i = 0; i < num_pages; ++i) {
if (PageHighMem(p))
@@ -287,17 +301,23 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
list_add(&p->lru, &pt->pages);
spin_unlock(&pt->lock);
atomic_long_add(1 << pt->order, &allocated_pages);
+
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_ACTIVE, -(1 << pt->order));
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_RECLAIM, (1 << pt->order));
}
/* Take pages from a specific pool_type, return NULL when nothing available */
static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
{
struct page *p;
+ int nid = ttm_pool_nid(pt->pool);
spin_lock(&pt->lock);
p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
if (p) {
atomic_long_sub(1 << pt->order, &allocated_pages);
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_ACTIVE, (1 << pt->order));
+ mod_node_page_state(NODE_DATA(nid), NR_GPU_RECLAIM, -(1 << pt->order));
list_del(&p->lru);
}
spin_unlock(&pt->lock);
--
2.49.0
* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters
From: Andrew Morton @ 2025-06-19 0:05 UTC
To: Dave Airlie
Cc: dri-devel, Dave Airlie, Christian Koenig, Matthew Brost,
Johannes Weiner, linux-mm
On Wed, 18 Jun 2025 14:06:17 +1000 Dave Airlie <airlied@gmail.com> wrote:
> [...]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters
From: Shakeel Butt @ 2025-06-19 0:26 UTC
To: Dave Airlie
Cc: dri-devel, Dave Airlie, Christian Koenig, Matthew Brost,
Johannes Weiner, linux-mm, Andrew Morton
On Wed, Jun 18, 2025 at 02:06:17PM +1000, Dave Airlie wrote:
> [...]
> @@ -1273,6 +1275,10 @@ Unaccepted
> Memory that has not been accepted by the guest
> Balloon
> Memory returned to Host by VM Balloon Drivers
> +GPUActive
> + Memory allocated to GPU objects
> +GPUReclaim
> + Memory in GPU allocator pools that is reclaimable
Can you please explain a bit more about these GPUActive & GPUReclaim?
Please correct me if I am wrong: GPUActive is the total memory used by
GPU objects, and GPUReclaim is the subset of GPUActive which is
reclaimable (possibly through shrinkers).
* Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters
From: David Airlie @ 2025-06-19 0:42 UTC
To: Shakeel Butt
Cc: Dave Airlie, dri-devel, Christian Koenig, Matthew Brost,
Johannes Weiner, linux-mm, Andrew Morton
On Thu, Jun 19, 2025 at 10:33 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Wed, Jun 18, 2025 at 02:06:17PM +1000, Dave Airlie wrote:
> > [...]
> > @@ -1273,6 +1275,10 @@ Unaccepted
> > Memory that has not been accepted by the guest
> > Balloon
> > Memory returned to Host by VM Balloon Drivers
> > +GPUActive
> > + Memory allocated to GPU objects
> > +GPUReclaim
> > + Memory in GPU allocator pools that is reclaimable
>
> Can you please explain a bit more about these GPUActive & GPUReclaim?
> Please correct me if I am wrong, GPUActive is the total memory used by
> GPU objects and GPUReclaim is the subset of GPUActive which is
> reclaimable (possibly through shrinkers).
Currently:

GPUActive is the total memory used by active GPU objects.

GPUReclaim is the amount of memory (not a subset of Active) that is
held in reusable GPU pools and can be reclaimed via a simple
shrinker. (This memory usually has different page table attributes,
uncached or write-combined.)
Example workflow:

User allocates cached system RAM for a GPU object:
  Active increases.
User frees the cached system RAM:
  Active decreases.
User allocates write-combined system RAM for a GPU object:
  Active increases.
User frees the write-combined system RAM:
  Active decreases, Reclaim increases.
User allocates another WC system RAM object:
  Reclaim decreases, Active increases.
Shrinker shrinks the pool:
  Reclaim decreases.
In the future there could be a 3rd type of memory, which I'm not
sure is necessary to account at this level: Active memory that the
driver considers discardable and could be shrunk easily. But I'm
not seeing much consistency in how drivers use this, or even what
use case needs it, so I'm not going to address it yet. It could
end up in Reclaim, but I'd need to see the use cases for it.
Dave.