dri-devel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: mkoutny@suse.com, dri-devel@lists.freedesktop.org, tj@kernel.org,
	hannes@cmpxchg.org
Subject: Re: [PATCH v7 6/6] drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
Date: Fri, 03 Jul 2026 13:25:00 +0000	[thread overview]
Message-ID: <20260703132501.5604A1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260703130541.2686-7-thomas.hellstrom@linux.intel.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Use-After-Free due to concurrent cgroup reclaim during probe error teardown.
- [Medium] Memory leak of VRAM manager buddy allocator on cgroup region registration failure.
--

commit 7beb8b077a31ff08f80ed7810a3913fd3a463579
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
    
Register the VRAM manager with the dmem cgroup reclaim infrastructure
so that lowering dmem.max below current VRAM usage triggers TTM
eviction rather than failing with -EBUSY.

[ ... ]

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 08f05c3aed1d5..9b9d738ba7941 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

[ ... ]

> @@ -906,6 +906,28 @@ static const struct ttm_resource_manager_func amdgpu_vram_mgr_func = {
>  	.debug	= amdgpu_vram_mgr_debug
>  };
>  
> +static const struct dmem_cgroup_ops amdgpu_vram_mgr_dmem_ops;
> +
> +static int amdgpu_vram_mgr_dmem_reclaim(struct dmem_cgroup_pool_state *pool,
> +					u64 target_bytes, void *priv)
> +{
> +	struct ttm_resource_manager *man = priv;
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
> +	int ret, idx;
> +
> +	if (!drm_dev_enter(adev_to_drm(adev), &idx))
> +		return -ENODEV;

[Severity: High]
Does this drm_dev_enter() check adequately protect against teardown during a
probe error?

If a failure occurs during amdgpu_pci_probe() (for instance in
amdgpu_driver_load_kms()), drm_dev_unplug() is never called, so
dev->unplugged remains false. The teardown path then proceeds to
drm_dev_release() which does:

drm_dev_release(dev) {
    if (dev->driver->release)
        dev->driver->release(dev);

    drm_managed_release(dev);
    ...
}

The driver release callback will destroy the VRAM manager and TTM structures.
Only after this does drm_managed_release() unregister the cgroup region. If a
cgroup memory reclaim occurs during this window, drm_dev_enter() will allow
execution to proceed, and ttm_resource_manager_dmem_reclaim() will
dereference the destroyed VRAM manager. Is there a way to prevent reclaim from
accessing the manager during probe failure teardown?

> +
> +	ret = ttm_resource_manager_dmem_reclaim(pool, target_bytes, priv);
> +
> +	drm_dev_exit(idx);
> +	return ret;
> +}

[ ... ]

> @@ -933,12 +956,16 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
>  	if (err)
>  		return err;
>  
> -	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
> -					      &(struct dmem_cgroup_init){
> +	cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
> +					 &(struct dmem_cgroup_init){
>  						.size = adev->gmc.real_vram_size,
> -					      });
> -	if (IS_ERR(man->cg))
> -		return PTR_ERR(man->cg);
> +						.ops = &amdgpu_vram_mgr_dmem_ops,
> +						.reclaim_priv = man,
> +					 });
> +	if (IS_ERR(cg))
> +		return PTR_ERR(cg);

[Severity: Medium]
Does returning here leak the buddy allocator initialized earlier in this
function via gpu_buddy_init()?

If drmm_cgroup_register_region() fails, the error propagates out of
amdgpu_ttm_init(), causing the GMC IP block's sw_init to fail. During driver
teardown, amdgpu_device_ip_fini() skips calling sw_fini for IP blocks where
status.sw is false. Since amdgpu_vram_mgr_fini() is skipped, it seems the
buddy allocator memory allocated by gpu_buddy_init() will never be freed.

Should this error path call gpu_buddy_fini() before returning?

> +
> +	ttm_resource_manager_set_dmem_region(man, cg);
>  
>  	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
>  	ttm_resource_manager_set_used(man, true);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260703130541.2686-1-thomas.hellstrom@linux.intel.com?part=6

  reply	other threads:[~2026-07-03 13:25 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-07-03 13:05 [PATCH v7 0/6] Add reclaim to the dmem cgroup controller Thomas Hellström
2026-07-03 13:05 ` [PATCH v7 1/6] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
2026-07-03 13:08   ` Christian König
2026-07-03 13:11     ` Thomas Hellström
2026-07-03 13:26   ` sashiko-bot
2026-07-03 13:05 ` [PATCH v7 2/6] cgroup/dmem: Introduce struct dmem_cgroup_init for region initialization Thomas Hellström
2026-07-03 13:05 ` [PATCH v7 3/6] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
2026-07-03 13:24   ` sashiko-bot
2026-07-03 13:05 ` [PATCH v7 4/6] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
2026-07-03 13:19   ` sashiko-bot
2026-07-03 13:05 ` [PATCH v7 5/6] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
2026-07-03 13:05 ` [PATCH v7 6/6] drm/amdgpu: " Thomas Hellström
2026-07-03 13:25   ` sashiko-bot [this message]
2026-07-03 14:37 ` [PATCH v7 0/6] Add reclaim to the dmem cgroup controller Thadeu Lima de Souza Cascardo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260703132501.5604A1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=mkoutny@suse.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox