From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Natalie Vock" <natalie.vock@gmx.de>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
cgroups@vger.kernel.org, "Huang Rui" <ray.huang@amd.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Matthew Auld" <matthew.auld@intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Simona Vetter" <simona@ffwll.ch>,
"David Airlie" <airlied@gmail.com>,
"Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: [PATCH v5 6/6] drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
Date: Thu, 11 Jun 2026 16:22:42 +0200 [thread overview]
Message-ID: <20260611142242.2529-7-thomas.hellstrom@linux.intel.com> (raw)
In-Reply-To: <20260611142242.2529-1-thomas.hellstrom@linux.intel.com>
Register the VRAM manager with the dmem cgroup reclaim infrastructure
so that lowering dmem.max below current VRAM usage triggers TTM
eviction rather than failing with -EBUSY.
Guard place->flags in amdgpu_ttm_bo_eviction_valuable() against NULL,
as the TTM reclaim path passes a NULL place in cgroup drain mode.
v3:
- Rebased on fix for uninitialized list and buddy allocator on the
drmm_cgroup_register_region() error path.
v5:
- Rebased on the introduction of struct dmem_cgroup_init.
- Clear the reclaim callback in amdgpu_vram_mgr_fini() to prevent
use-after-free if cgroup reclaim is triggered after driver unbind
while userspace holds an open DRM file descriptor. (Sashiko-bot)
- Switch from drmm_cgroup_register_region() to the raw
dmem_cgroup_register_region() and store the region in
amdgpu_vram_mgr.cg_region. Explicitly call
dmem_cgroup_unregister_region() at the top of amdgpu_vram_mgr_fini()
before any manager teardown, draining in-flight reclaim callbacks via
the rwsem before the manager is destroyed. This is required because
amdgpu's vram manager fini is called explicitly during driver unbind,
which may precede the DRM device release and thus precede any
drmm-based cleanup. (Sashiko-bot)
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 29 ++++++++++++++++----
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 2 ++
3 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2740de94e93c..8cbcd33f51a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1488,7 +1488,7 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
DMA_RESV_USAGE_BOOKKEEP, f) {
if (amdkfd_fence_check_mm(f, current->mm) &&
- !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
+ !(place && (place->flags & TTM_PL_FLAG_CONTIGUOUS)))
return false;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 08f05c3aed1d..ee98b963e84a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -906,6 +906,10 @@ static const struct ttm_resource_manager_func amdgpu_vram_mgr_func = {
.debug = amdgpu_vram_mgr_debug
};
+static const struct dmem_cgroup_ops amdgpu_vram_mgr_dmem_ops = {
+ .reclaim = ttm_resource_manager_dmem_reclaim,
+};
+
/**
* amdgpu_vram_mgr_init - init VRAM manager and DRM MM
*
@@ -917,6 +921,7 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
{
struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
struct ttm_resource_manager *man = &mgr->manager;
+ struct dmem_cgroup_region *cg;
int err;
ttm_resource_manager_init(man, &adev->mman.bdev,
@@ -933,12 +938,15 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
if (err)
return err;
- man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
- &(struct dmem_cgroup_init){
- .size = adev->gmc.real_vram_size,
- });
- if (IS_ERR(man->cg))
- return PTR_ERR(man->cg);
+ cg = dmem_cgroup_register_region(&(struct dmem_cgroup_init){
+ .size = adev->gmc.real_vram_size,
+ .ops = &amdgpu_vram_mgr_dmem_ops,
+ .reclaim_priv = man,
+ }, "vram");
+ if (IS_ERR(cg))
+ return PTR_ERR(cg);
+
+ ttm_resource_manager_set_dmem_region(man, cg);
ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
ttm_resource_manager_set_used(man, true);
@@ -960,6 +968,15 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
int ret;
struct amdgpu_vram_reservation *rsv, *temp;
+ /*
+ * Drain any in-flight dmem cgroup reclaim callbacks and remove the
+ * region from the global list before tearing down the manager.
+ * This must happen first so no reclaim callback can access the
+ * manager after this point.
+ */
+ dmem_cgroup_unregister_region(mgr->cg_region);
+ mgr->cg_region = NULL;
+
ttm_resource_manager_set_used(man, false);
ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
index 429a21a2e9b2..07103cddb335 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
@@ -36,6 +36,8 @@ struct amdgpu_vram_mgr {
atomic64_t vis_usage;
u64 default_page_size;
struct list_head allocated_vres_list;
+ /** @cg_region: dmem cgroup region for VRAM; unregistered in fini. */
+ struct dmem_cgroup_region *cg_region;
};
struct amdgpu_vres_task {
--
2.54.0
prev parent reply other threads:[~2026-06-11 14:23 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-11 14:22 [PATCH v5 0/6] [PATCH v5 0/6] Add reclaim to the dmem cgroup controller Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 1/6] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 2/6] cgroup/dmem: Introduce struct dmem_cgroup_init for region initialization Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 3/6] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 4/6] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 5/6] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
2026-06-11 14:22 ` Thomas Hellström [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260611142242.2529-7-thomas.hellstrom@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=cgroups@vger.kernel.org \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hannes@cmpxchg.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.auld@intel.com \
--cc=matthew.brost@intel.com \
--cc=mkoutny@suse.com \
--cc=mripard@kernel.org \
--cc=natalie.vock@gmx.de \
--cc=ray.huang@amd.com \
--cc=rodrigo.vivi@intel.com \
--cc=simona@ffwll.ch \
--cc=tj@kernel.org \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox