From: sashiko-bot@kernel.org
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org, tj@kernel.org, mkoutny@suse.com,
hannes@cmpxchg.org
Subject: Re: [PATCH v6 6/6] drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
Date: Thu, 11 Jun 2026 18:32:53 +0000
Message-ID: <20260611183254.F065E1F000E9@smtp.kernel.org>
In-Reply-To: <20260611173301.17473-7-thomas.hellstrom@linux.intel.com>
Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
New issues:
- [Critical] Early return in `amdgpu_vram_mgr_fini()` on eviction failure skips `dmem_cgroup_unregister_region()`, leading to a use-after-free.
- [Medium] The cgroup region name is hardcoded to 'vram', breaking the device-specific namespace prefix and causing collisions on multi-GPU setups.
Pre-existing issues:
- [High] Unconditional cleanup of the buddy allocator causes a UAF/NULL dereference if `gpu_buddy_init()` fails during initialization.
--
commit 869c423a8a9487b5c3e7321e4f9a0e14f69318b4
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
Register the VRAM manager with the dmem cgroup reclaim infrastructure
so that lowering dmem.max below current VRAM usage triggers TTM
eviction rather than failing with -EBUSY.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 08f05c3aed1d5..2250bab0970dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
[ ... ]
> @@ -933,12 +938,16 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
> if (err)
> return err;
>
> - man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
> - &(struct dmem_cgroup_init){
> - .size = adev->gmc.real_vram_size,
> - });
> - if (IS_ERR(man->cg))
> - return PTR_ERR(man->cg);
> + cg = dmem_cgroup_register_region(&(struct dmem_cgroup_init){
> + .size = adev->gmc.real_vram_size,
> + .ops = &amdgpu_vram_mgr_dmem_ops,
> + .reclaim_priv = man,
> + }, "vram");
[Severity: Medium]
Does this hardcoded "vram" string break the device-specific namespace prefix?
Before this patch, drmm_cgroup_register_region() automatically prefixed the
cgroup region name with the DRM device's unique identifier. On systems with
multiple AMD GPUs, won't using just "vram" here cause all GPUs to register a
region with the exact same name, breaking userspace tools' ability to
distinguish between the GPUs' memory cgroups?
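To make the collision concrete, here is a minimal userspace sketch of per-device
naming. It is not kernel code: format_region_name() is a hypothetical helper, and
the "drm/<unique>/<region>" layout is an assumption about what the old drmm
helper produced, not something confirmed by this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model only: build a region name of the assumed form
 * "drm/<unique>/<region>" so that two devices never share a name.
 */
static int format_region_name(char *buf, size_t len,
			      const char *dev_unique, const char *region)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "drm/%s/%s", dev_unique, region);

	/* Reject encoding errors and truncated names. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a bare "vram" every device would register the same string; with a
per-device component the names differ as long as the unique strings differ.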
[ ... ]
> @@ -966,6 +975,16 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
> ret = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
> if (ret)
> return;
>
> + /*
> + * Drain any in-flight dmem cgroup reclaim callbacks and remove the
> + * region from the global list. This must happen after evict_all()
> + * so that ttm_resource_free() can still uncharge via man->cg while
> + * BOs are being evicted.
> + */
> + dmem_cgroup_unregister_region(mgr->cg_region);
[Severity: Critical]
If ttm_resource_manager_evict_all() fails (for example, due to pinned BOs)
and returns an error, the early return above skips the newly added
dmem_cgroup_unregister_region() call.
Could this leak the region permanently and leave its reclaim_priv pointing at
man after man is gone? Once the DRM device is released and the amdgpu_device
struct is freed, any later dmem cgroup reclaim (for example, dmem.max being
lowered below current usage) would dereference the freed man pointer, which
is a kernel use-after-free.
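One possible shape for the teardown, as a userspace model rather than a proposed
patch: the region is unregistered after the eviction attempt regardless of its
result, and only the rest of the teardown is skipped on failure. All names here
(struct model_mgr, model_fini() and friends) are hypothetical stand-ins, and
whether buffers left behind by a failed eviction can still uncharge safely
afterwards is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the manager; not the amdgpu structures. */
struct model_mgr {
	bool region_registered;	/* reclaim callbacks may still target us */
	bool torn_down;		/* allocator state released */
	int evict_result;	/* what the modelled evict_all() returns */
};

static int model_evict_all(struct model_mgr *m)
{
	return m->evict_result;
}

static void model_unregister_region(struct model_mgr *m)
{
	m->region_registered = false;
}

/*
 * Unregister after the eviction attempt whether or not it succeeded,
 * so no callback can outlive the manager. Only the remaining teardown
 * is skipped when eviction fails.
 */
static int model_fini(struct model_mgr *m)
{
	int ret = model_evict_all(m);

	model_unregister_region(m);
	if (ret)
		return ret;

	m->torn_down = true;
	return 0;
}
```

In this model a failed eviction still returns its error, but the region is
no longer registered by the time the caller frees the manager.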
> + mgr->cg_region = NULL;
> + man->cg = NULL;
> +
> mutex_lock(&mgr->lock);
> list_for_each_entry_safe(rsv, temp, &mgr->reservations_pending, blocks)
> kfree(rsv);
[ ... ]
> list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) {
> gpu_buddy_free_list(&mgr->mm, &rsv->allocated, 0);
> kfree(rsv);
> }
> if (!adev->gmc.is_app_apu)
> gpu_buddy_fini(&mgr->mm);
[Severity: High]
This isn't a bug introduced by this patch, but does this unconditional
cleanup cause a use-after-free or NULL pointer dereference if
gpu_buddy_init() failed during initialization?
If gpu_buddy_init() fails, it returns an error and frees its internal arrays
without setting them to NULL. The driver error path then calls
amdgpu_device_fini_sw(), which triggers amdgpu_vram_mgr_fini().
Since mm->n_roots was populated before the allocation failure, won't
gpu_buddy_fini() loop n_roots times over mm->roots, which at that point is
either NULL or a stale pointer to already-freed memory?
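A small userspace model of one way to make the fini path tolerate a failed
init: the init error path resets every field it touched, so a later fini finds
nothing to walk. struct model_buddy and both functions are hypothetical; only
the n_roots and roots field names echo the text above, and the real
gpu_buddy_init() internals are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator state; field names only echo the review text. */
struct model_buddy {
	unsigned int n_roots;
	void **roots;
};

/* On failure, reset every field so a later fini sees an empty object. */
static int model_buddy_init(struct model_buddy *mm, unsigned int n_roots,
			    int fail_alloc)
{
	mm->n_roots = n_roots;
	mm->roots = fail_alloc ? NULL : calloc(n_roots, sizeof(*mm->roots));
	if (!mm->roots) {
		mm->n_roots = 0;
		return -1;
	}
	return 0;
}

/* Safe to call after a failed init: nothing to walk, nothing to free. */
static unsigned int model_buddy_fini(struct model_buddy *mm)
{
	unsigned int visited = 0;

	for (unsigned int i = 0; i < mm->n_roots; i++) {
		free(mm->roots[i]);
		visited++;
	}
	free(mm->roots);
	mm->roots = NULL;
	mm->n_roots = 0;
	return visited;
}
```

An alternative with the same effect would be for the caller to track whether
init succeeded and skip the fini call otherwise.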
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611173301.17473-1-thomas.hellstrom@linux.intel.com?part=6