* drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4)
@ 2025-10-16  2:31 Dave Airlie
  2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
                   ` (15 more replies)
  0 siblings, 16 replies; 19+ messages in thread
From: Dave Airlie @ 2025-10-16  2:31 UTC
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

Hi all,

This is another repost with some fixes and cleanups. I've added Christian's acks/reviews from the
previous round. I've fixed the obj_cgroup_put in the core, instead of in the drivers.

I'd really like to land this in drm-next. I've added Maarten's xe support patch to this series,
and I'd like to get any missing acks/reviews.

Christian, I think you said patch 4 got lost last time, hopefully you get it this time.

Patches still needing ack/review:
ttm/pool: drop numa specific pools
ttm/pool: track allocated_pages per numa node.
ttm: add objcg pointer to bo and tt (v2)
ttm/pool: enable memcg tracking and shrinker. (v2)
amdgpu: add support for memory cgroups

Differences since v1 posting:
1. added ttm_bo_set_cgroup wrapper - the cgroup reference is passed to the ttm object.
2. put the cgroup reference in ttm object release
3. rebase onto 6.19-rc1
4. added xe support patch from Maarten.

Differences since v2 posting:
1. Squashed exports into where they are used (Shakeel)
2. Fixed bug in uncharge path memcg
3. Fixed config bug in the module option.

Differences since 1st posting:
1. Added patch 18: add a module option to allow pooled pages to not be stored in the lru per-memcg
   (Requested by Christian König)
2. Converged the naming and stats between vmstat and memcg (Suggested by Shakeel Butt)
3. Cleaned up the charge/uncharge code and some other bits.

Dave.

Original cover letter:
tl;dr: start using list_lru/numa/memcg in GPU driver core and amdgpu driver for now.

This is a complete series of patches, some of which have been sent before and reviewed,
but I want to get the complete picture for others, and try to figure out how best to land this.

There are 3 pieces to this:
01->02: add support for global gpu stat counters (previously posted, patch 2 is newer)
03->06: port ttm pools to list_lru for numa awareness
07->13: add memcg stats + gpu apis, then port ttm pools to memcg aware list_lru and shrinker
14: enable amdgpu to use new functionality.
15: add a module option to turn it all off.

The biggest difference in the memcg code from previously is I discovered what
obj cgroups were designed for and I'm reusing the page/objcg integration that
already exists, to avoid reinventing that wheel right now.

There are some igt-gpu-tools tests I've written at:
https://gitlab.freedesktop.org/airlied/igt-gpu-tools/-/tree/amdgpu-cgroups?ref_type=heads

One problem is that there is a lot of delayed action, which probably means the testing
needs to be more robust, but the tests validate all the basic paths.

Regards,
Dave.



* drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v5)
@ 2026-02-24  2:06 Dave Airlie
  2026-02-24  2:06 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Airlie @ 2026-02-24  2:06 UTC
  To: dri-devel, tj, christian.koenig, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song
  Cc: cgroups, Dave Chinner, Waiman Long, simona

Hi all,

This time I really want to make forward progress on landing this. I'll likely merge the first
half into drm-next soon, but I'd like to get it all landed.

The main change since v4 is that I did an AI review of the patchset and it found a bug
in the reclaim codepaths when no memcg was around, and a bug I had introduced in the
diff calcs for accounted pages.

Christian, I think you said patch 4 got lost last time, hopefully you get it this time.

Patches still needing ack/review:
ttm/pool: drop numa specific pools
ttm/pool: track allocated_pages per numa node.
ttm: add objcg pointer to bo and tt (v2)
ttm/pool: enable memcg tracking and shrinker. (v2)
amdgpu: add support for memory cgroups

Differences since v1 posting:
1. added ttm_bo_set_cgroup wrapper - the cgroup reference is passed to the ttm object.
2. put the cgroup reference in ttm object release
3. rebase onto 6.19-rc1
4. added xe support patch from Maarten.

Differences since v2 posting:
1. Squashed exports into where they are used (Shakeel)
2. Fixed bug in uncharge path memcg
3. Fixed config bug in the module option.

Differences since 1st posting:
1. Added patch 18: add a module option to allow pooled pages to not be stored in the lru per-memcg
   (Requested by Christian König)
2. Converged the naming and stats between vmstat and memcg (Suggested by Shakeel Butt)
3. Cleaned up the charge/uncharge code and some other bits.

Dave.

Original cover letter:
tl;dr: start using list_lru/numa/memcg in GPU driver core and amdgpu driver for now.

This is a complete series of patches, some of which have been sent before and reviewed,
but I want to get the complete picture for others, and try to figure out how best to land this.

There are 3 pieces to this:
01->02: add support for global gpu stat counters (previously posted, patch 2 is newer)
03->06: port ttm pools to list_lru for numa awareness
07->13: add memcg stats + gpu apis, then port ttm pools to memcg aware list_lru and shrinker
14: enable amdgpu to use new functionality.
15: add a module option to turn it all off.

The biggest difference in the memcg code from previously is I discovered what
obj cgroups were designed for and I'm reusing the page/objcg integration that
already exists, to avoid reinventing that wheel right now.

There are some igt-gpu-tools tests I've written at:
https://gitlab.freedesktop.org/airlied/igt-gpu-tools/-/tree/amdgpu-cgroups?ref_type=heads

One problem is that there is a lot of delayed action, which probably means the testing
needs to be more robust, but the tests validate all the basic paths.

Regards,
Dave.



Thread overview: 19+ messages
2025-10-16  2:31 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v4) Dave Airlie
2025-10-16  2:31 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
2025-10-16  7:48   ` Christian König
2025-10-16  2:31 ` [PATCH 02/16] drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) Dave Airlie
2025-10-16  2:31 ` [PATCH 03/16] ttm/pool: port to list_lru. (v2) Dave Airlie
2025-10-16  2:31 ` [PATCH 04/16] ttm/pool: drop numa specific pools Dave Airlie
2025-10-16  2:31 ` [PATCH 05/16] ttm/pool: make pool shrinker NUMA aware Dave Airlie
2025-10-16  2:31 ` [PATCH 06/16] ttm/pool: track allocated_pages per numa node Dave Airlie
2025-10-16  2:31 ` [PATCH 07/16] memcg: add support for GPU page counters. (v3) Dave Airlie
2025-10-16  2:31 ` [PATCH 08/16] ttm: add a memcg accounting flag to the alloc/populate APIs Dave Airlie
2025-10-16  2:31 ` [PATCH 09/16] ttm/pool: initialise the shrinker earlier Dave Airlie
2025-10-16  2:31 ` [PATCH 10/16] ttm: add objcg pointer to bo and tt (v2) Dave Airlie
2025-10-16  2:31 ` [PATCH 11/16] ttm/pool: enable memcg tracking and shrinker. (v2) Dave Airlie
2025-10-16  2:31 ` [PATCH 12/16] ttm: hook up memcg placement flags Dave Airlie
2025-10-16  2:31 ` [PATCH 13/16] memcontrol: allow objcg api when memcg is config off Dave Airlie
2025-10-16  2:31 ` [PATCH 14/16] amdgpu: add support for memory cgroups Dave Airlie
2025-10-16  2:31 ` [PATCH 15/16] ttm: add support for a module option to disable memcg integration Dave Airlie
2025-10-16  2:31 ` [PATCH 16/16] xe: create a flag to enable memcg accounting for XE as well Dave Airlie
  -- strict thread matches above, loose matches on Subject: below --
2026-02-24  2:06 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v5) Dave Airlie
2026-02-24  2:06 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
