* [RFC PATCH 0/6] DRM resource management cgroup, try 2.
@ 2024-06-27 15:47 Maarten Lankhorst
From: Maarten Lankhorst @ 2024-06-27 15:47 UTC
  To: intel-xe, linux-kernel, dri-devel, Tejun Heo, Zefan Li,
	Johannes Weiner, Andrew Morton
  Cc: Friedrich Vock, cgroups, linux-mm, Maarten Lankhorst

Hey,

A new version of my attempt at managing VRAM through cgroups.
Even though it's called the DRM resource management cgroup, it would be trivial
to rename it to devmem or whatever, since there is nothing DRM specific about it.

This series allows setting limits on VRAM similar to system memory,
with min/low/max limits.
This allows various cgroups to have their own limits for usage.

It sounds very abstract, but it can be used to prioritise the foreground
application (by setting low), to hard partition memory so that multiple
processes sharing a single GPU each get a fair, proportional share of memory,
or to prevent long running compute jobs from having their memory evicted.
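
To make the hard-limit part of this concrete, here is a toy model of a
hierarchical charge against max. It is only a sketch in Python, not the kernel's
page_counter code: the Counter class, its fields, and the cgroup names are
made up for illustration. The one assumption carried over from the memory
cgroup is that a charge must fit under the limit of the cgroup and of every
ancestor, and that a failed charge leaves all counters unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Counter:
    """Hypothetical stand-in for one cgroup's VRAM counter."""
    name: str
    limit: int                          # hard limit ("max"), in bytes
    parent: Optional[Counter] = None
    usage: int = 0

    def try_charge(self, nbytes: int) -> Optional[Counter]:
        """Charge nbytes to this counter and all of its ancestors.

        Returns None on success. On failure, returns the counter whose
        limit would be exceeded and charges nothing.
        """
        chain = []
        node = self
        while node is not None:
            if node.usage + nbytes > node.limit:
                return node
            chain.append(node)
            node = node.parent
        for counter in chain:
            counter.usage += nbytes
        return None

    def uncharge(self, nbytes: int) -> None:
        """Undo a previous successful charge of nbytes."""
        node = self
        while node is not None:
            node.usage -= nbytes
            node = node.parent


MiB = 1 << 20
device = Counter("device", limit=8192 * MiB)
foreground = Counter("foreground", limit=6144 * MiB, parent=device)
background = Counter("background", limit=2048 * MiB, parent=device)

# The background group can fill its own partition, but not grow beyond it.
first = background.try_charge(2048 * MiB)   # succeeds
second = background.try_charge(1 * MiB)     # fails at "background"
```

With limits set this way, a runaway background job is stopped at its own
limit and cannot eat into the memory left for the foreground group.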

This is a minimal proof of concept to get discussion going again. It works,
but it only tracks active use of VRAM. Ideally, the tracking would also
integrate with the memory cgroup controller, so that for every VRAM
allocation we would know it could be pushed out to swap if needed, charging
the original process rather than the process doing the eviction.

I'm hoping to restart the discussion, so that we can plug the holes and finally move forward.

New in this version:
- Complete rewrite using page_counter.
- Support setting min/low/max, respected in the same way as memory cgroup.
  (Would it be useful to add/allow high, to go over the limit for temporary
   bindings during eviction on GART?)
- Locking reworked. Fastpath should now be lockless with RCU.
- Add a second implementation for AMD, to show how easy it is to make it work.
  (Should we completely move this to TTM instead?)
- TTM now always respects min/low when evicting, bailing out with -ENOSPC
  where required.
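
As an illustration of the last item, the sketch below models one way an
eviction pass could honour min and low. It is a guess at the semantics, not
the TTM code from this series: Group, plan_eviction and the two-pass order
are hypothetical, and the hierarchical "effective protection" calculation
that patch 1 moves into page_counter is left out. The assumption borrowed
from the memory cgroup is that low is best-effort (touched only when nothing
unprotected is left) while min is never violated.

```python
import errno
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical per-cgroup VRAM state; flat, with no hierarchy."""
    name: str
    usage: int          # bytes currently resident in VRAM
    min_prot: int = 0   # hard protection ("min")
    low_prot: int = 0   # best-effort protection ("low")


def plan_eviction(groups, needed):
    """Decide how many bytes to evict from each group to free `needed` bytes.

    Pass 1 only takes memory above each group's low protection.
    Pass 2 may dip below low, but never below min.
    Raises OSError(ENOSPC) if even that does not free enough.
    """
    plan = {}
    remaining = needed
    for use_low in (True, False):
        for g in groups:
            if remaining == 0:
                break
            floor = max(g.low_prot, g.min_prot) if use_low else g.min_prot
            evictable = max(0, g.usage - plan.get(g.name, 0) - floor)
            take = min(evictable, remaining)
            if take > 0:
                plan[g.name] = plan.get(g.name, 0) + take
                remaining -= take
    if remaining > 0:
        raise OSError(errno.ENOSPC, "protected memory cannot be evicted")
    return plan


groups = [
    Group("foreground", usage=100, low_prot=80),
    Group("compute-job", usage=100, min_prot=100),
    Group("background", usage=50),
]
plan = plan_eviction(groups, 60)
```

In this model the compute job with min equal to its usage is never chosen
as a victim, and the foreground group only loses memory below low once the
unprotected background group has nothing left to give.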

I'm hoping for some good feedback on the path forward for upstreaming. I feel
this version has a much better chance of being upstreamed than the previous
one. It should be a lot more scalable thanks to the use of RCU and page_counter.

Cheers,
Maarten

Maarten Lankhorst (6):
  mm/page_counter: Move calculating protection values to page_counter
  drm/cgroup: Add memory accounting DRM cgroup
  drm/ttm: Handle cgroup based eviction in TTM
  drm/xe: Implement cgroup for vram
  drm/amdgpu: Add cgroups implementation
  drm/xe: Hack to test with mapped pages instead of vram.

 Documentation/admin-guide/cgroup-v2.rst       |  51 ++
 Documentation/gpu/drm-compute.rst             |  54 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +
 drivers/gpu/drm/ttm/tests/ttm_bo_test.c       |  18 +-
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  38 +-
 drivers/gpu/drm/ttm/ttm_resource.c            |  28 +-
 drivers/gpu/drm/xe/xe_device.c                |   4 +
 drivers/gpu/drm/xe/xe_device_types.h          |   4 +
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |  14 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |  10 +
 include/drm/ttm/ttm_bo.h                      |   3 +-
 include/drm/ttm/ttm_resource.h                |  16 +-
 include/linux/cgroup_drm.h                    | 115 +++
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/page_counter.h                  |   4 +
 init/Kconfig                                  |   7 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/drm.c                           | 813 ++++++++++++++++++
 mm/memcontrol.c                               | 154 +---
 mm/page_counter.c                             | 173 ++++
 23 files changed, 1355 insertions(+), 172 deletions(-)
 create mode 100644 Documentation/gpu/drm-compute.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.45.2



Thread overview: 20+ messages
2024-06-27 15:47 [RFC PATCH 0/6] DRM resource management cgroup, try 2 Maarten Lankhorst
2024-06-27 15:47 ` [RFC PATCH 1/6] mm/page_counter: Move calculating protection values to page_counter Maarten Lankhorst
2024-06-27 17:33   ` Roman Gushchin
2024-06-27 18:48   ` Shakeel Butt
2024-06-27 15:47 ` [RFC PATCH 2/6] drm/cgroup: Add memory accounting DRM cgroup Maarten Lankhorst
2024-06-27 17:16   ` Maxime Ripard
2024-06-27 19:22     ` Maarten Lankhorst
2024-06-28 14:04       ` Maxime Ripard
2024-07-01  9:25         ` Maarten Lankhorst
2024-07-01 17:01           ` Tvrtko Ursulin
2024-08-06 13:01             ` Daniel Vetter
2024-08-06 14:09               ` Maxime Ripard
2024-08-06 15:26                 ` Daniel Vetter
2024-09-03  8:53                   ` Maxime Ripard
2024-09-03 11:26                     ` Simona Vetter
2024-08-06  8:19           ` Maxime Ripard
2024-06-27 15:47 ` [RFC PATCH 3/6] drm/ttm: Handle cgroup based eviction in TTM Maarten Lankhorst
2024-06-27 15:47 ` [RFC PATCH 4/6] drm/xe: Implement cgroup for vram Maarten Lankhorst
2024-06-27 15:47 ` [RFC PATCH 5/6] drm/amdgpu: Add cgroups implementation Maarten Lankhorst
2024-06-27 15:47 ` [RFC PATCH 6/6] drm/xe: Hack to test with mapped pages instead of vram Maarten Lankhorst
