From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Natalie Vock" <natalie.vock@gmx.de>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
cgroups@vger.kernel.org, "Huang Rui" <ray.huang@amd.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Matthew Auld" <matthew.auld@intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Simona Vetter" <simona@ffwll.ch>,
"David Airlie" <airlied@gmail.com>,
"Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: [PATCH v5 0/6] Add reclaim to the dmem cgroup controller
Date: Thu, 11 Jun 2026 16:22:36 +0200
Message-ID: <20260611142242.2529-1-thomas.hellstrom@linux.intel.com>

When a "max" limit lower than the current usage is written, the
existing code silently fails. This series improves on that by
attempting to synchronously reclaim device memory to push the
usage under the new max limit, and by returning -EBUSY if the
usage still cannot be brought under the limit.

Patch 1 fixes a pre-existing bug in the amdgpu_vram_mgr_init()
error path.
Patch 2 introduces struct dmem_cgroup_init for extensible region
registration.
Patch 3 implements and documents a reclaim callback interface
for the dmem controller.
Patch 4 implements a TTM reclaim callback.
Patches 5-6 hook up the reclaim callback to the dmem cgroup-aware
drivers xe and amdgpu.
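
To illustrate why patch 2 moves to an init struct, here is a minimal
user-space sketch of the pattern. It is not the kernel API: the demo_*
names and all field names are hypothetical, and the actual layout of
struct dmem_cgroup_init is defined by patch 2. The point is only that
callers use designated initializers, so a new optional member (such as
a reclaim callback) defaults to NULL for existing callers and does not
require another positional argument.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_region;

/* Hypothetical init struct; field names are illustrative only. */
struct demo_region_init {
	uint64_t size;		/* region size in bytes, must be non-zero */
	const char *name;	/* region name, must be non-NULL */
	/* Optional reclaim hook; NULL when the caller does not set it. */
	int (*reclaim)(struct demo_region *region, uint64_t nr_bytes);
	void *priv;		/* opaque caller data */
};

/* Hypothetical region object filled in at registration time. */
struct demo_region {
	uint64_t size;
	const char *name;
	int (*reclaim)(struct demo_region *region, uint64_t nr_bytes);
	void *priv;
};

/* Copy the init parameters into the region after basic validation. */
int demo_register_region(struct demo_region *region,
			 const struct demo_region_init *init)
{
	if (!region || !init || !init->name || !init->size)
		return -EINVAL;

	region->size = init->size;
	region->name = init->name;
	region->reclaim = init->reclaim;	/* may be NULL */
	region->priv = init->priv;
	return 0;
}
```

A caller that only knows about size and name keeps compiling unchanged
when further members are added, since unnamed members of a designated
initializer are zero-initialized.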

v2:
- Remove the error propagation that was in a previous series (Maarten)
- A number of updates in patch 1. See its commit message for
  details (Maarten)

v3:
- Add patch 1 fixing a pre-existing amdgpu_vram_mgr_init() error path
  bug where drmm_cgroup_register_region() was called before
  INIT_LIST_HEAD() and gpu_buddy_init(), causing a kernel panic on
  failure. (Sashiko-bot)
- Use an rwsem to protect reclaim callback registration and region
  unregister against concurrent reclaim invocations. (Sashiko-bot)
- Fix ttm_resource_manager_set_dmem_region() storing an error pointer
  in man->cg unconditionally. (Sashiko-bot)
- Fix kernel-doc function name format for ttm_bo_evict_cgroup() and
  ttm_resource_manager_set_dmem_region().

v4:
- Rebased on drm-tip; dropped the XE_PL_STOLEN guard in the xe patch
  as stolen memory uses a separate TTM manager.

v5:
- Add patch 2 introducing struct dmem_cgroup_init to make the
  dmem_cgroup_register_region() API extensible without adding positional
  arguments in the future.
- Use nonblock=true in reset_all_resource_limits() to avoid sleeping
  inside rcu_read_lock() in dmemcs_offline(). (Sashiko-bot)
- Compare usage against the truncated limit stored in cnt.max, not the
  original u64. (Sashiko-bot)
- Use DMEM_MAX_RECLAIM_RETRIES (16) retry budget instead of 5, matching
  the memcg controller; only -ENOSPC (no progress) counts against the
  budget, other errors abort immediately.
- Handle NULL region in ttm_resource_manager_set_dmem_region() to clear
  the reclaim callback, preventing use-after-free when the manager is
  torn down while the dmem region outlives it. (Sashiko-bot)
- Return 0 on any eviction progress; reserve -ENOSPC for zero progress.
- Register xe fini devres action before drmm_cgroup_register_region()
  so LIFO teardown runs unregister first, draining callbacks before the
  manager is destroyed. (Sashiko-bot)
- Switch amdgpu to explicit dmem_cgroup_unregister_region() at the top
  of amdgpu_vram_mgr_fini() before any manager teardown, since amdgpu's
  fini is called explicitly during driver unbind before drmm cleanup.
  (Sashiko-bot)
- Wrap the xe reclaim callback with drm_dev_enter()/drm_dev_exit() to
  prevent TTM reclaim from running after driver unbind.
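
As a rough illustration of the v5 retry semantics, the loop below is a
user-space sketch, not the code in kernel/cgroup/dmem.c. Only
DMEM_MAX_RECLAIM_RETRIES and the error-code convention come from the
notes above; the demo_* types and functions are hypothetical, and
propagating a non-ENOSPC error unchanged to the caller is an assumption.
The callback returns 0 on any progress and -ENOSPC on none, and only the
latter is charged against the budget.

```c
#include <errno.h>
#include <stdint.h>

#define DMEM_MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for a dmem region; not the kernel structure. */
struct demo_region {
	uint64_t usage;
	/*
	 * Assumed callback contract: 0 on any eviction progress,
	 * -ENOSPC when nothing could be evicted, another -errno on failure.
	 */
	int (*reclaim)(struct demo_region *region, uint64_t nr_bytes);
};

/* Try to bring usage down to new_max; -EBUSY if that is not possible. */
int demo_set_max(struct demo_region *region, uint64_t new_max)
{
	int budget = DMEM_MAX_RECLAIM_RETRIES;

	while (region->usage > new_max) {
		int err;

		if (!region->reclaim)
			return -EBUSY;

		err = region->reclaim(region, region->usage - new_max);
		if (err == -ENOSPC) {
			/* Only a pass without progress consumes the budget. */
			if (--budget == 0)
				return -EBUSY;
		} else if (err) {
			/* Any other error aborts immediately (assumed: propagated). */
			return err;
		}
	}

	return 0;
}

/* Demo callback: evicts at most 16 bytes per call. */
int demo_evict_some(struct demo_region *region, uint64_t nr_bytes)
{
	uint64_t step = nr_bytes < 16 ? nr_bytes : 16;

	if (!step || region->usage < step)
		return -ENOSPC;

	region->usage -= step;
	return 0;
}

/* Demo callback: never makes progress. */
int demo_evict_none(struct demo_region *region, uint64_t nr_bytes)
{
	(void)region;
	(void)nr_bytes;
	return -ENOSPC;
}
```

With demo_evict_some() the loop keeps going as long as each pass evicts
something, however many passes that takes, while demo_evict_none()
exhausts the 16-attempt budget and the write fails with -EBUSY.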
User-space tests are at
https://patchwork.freedesktop.org/series/163935/
Test-with: 20260428065411.4222-1-thomas.hellstrom@linux.intel.com
Thomas Hellström (6):
drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
cgroup/dmem: Introduce struct dmem_cgroup_init for region initialization
cgroup/dmem: Add reclaim callback for lowering max below current usage
drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem
controller
drm/xe: Wire up dmem cgroup reclaim for VRAM manager
drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 10 +-
drivers/gpu/drm/ttm/ttm_bo.c | 95 ++++++++++++++++-
drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +-
drivers/gpu/drm/ttm/ttm_resource.c | 37 +++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 19 ++--
include/drm/ttm/ttm_bo.h | 10 ++
include/drm/ttm/ttm_resource.h | 4 +
include/linux/cgroup_dmem.h | 24 +++++
kernel/cgroup/dmem.c | 106 +++++++++++++++++--
10 files changed, 286 insertions(+), 24 deletions(-)
--
2.54.0
Thomas Hellström (6):
drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
cgroup/dmem: Introduce struct dmem_cgroup_init for region
initialization
cgroup/dmem: Add reclaim callback for lowering max below current usage
drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem
controller
drm/xe: Wire up dmem cgroup reclaim for VRAM manager
drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 2 +
drivers/gpu/drm/drm_drv.c | 8 +-
drivers/gpu/drm/ttm/ttm_bo.c | 95 +++++++++++++-
drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +-
drivers/gpu/drm/ttm/ttm_resource.c | 50 +++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 53 +++++++-
include/drm/drm_drv.h | 4 +-
include/drm/ttm/ttm_bo.h | 10 ++
include/drm/ttm/ttm_resource.h | 7 +
include/linux/cgroup_dmem.h | 37 +++++-
kernel/cgroup/dmem.c | 129 +++++++++++++++++--
13 files changed, 393 insertions(+), 35 deletions(-)
--
2.54.0