All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: sashiko-reviews@lists.linux.dev
Cc: mkoutny@suse.com, tj@kernel.org, dri-devel@lists.freedesktop.org,
	hannes@cmpxchg.org
Subject: Re: [PATCH v5 1/6] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
Date: Thu, 11 Jun 2026 19:27:04 +0200	[thread overview]
Message-ID: <f888a852-25d3-4bbb-8675-29ed080c1bfa@linux.intel.com> (raw)
In-Reply-To: <20260611143751.86DFA1F00893@smtp.kernel.org>


On 6/11/26 16:37, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] Incomplete Fix / Kernel panic on module load error path: If `gpu_buddy_init()` fails, it cleans up its internal allocations but leaves `mm->n_roots > 0` and `mm->free_trees` and `mm->roots` as dangling or NULL pointers. Because `adev->mman.initialized` is set to `true` prior to calling `amdgpu_vram_mgr_init()`, a failure in `gpu_buddy_init()` unconditionally triggers the driver teardown path. `amdgpu_vram_mgr_fini()` then invokes `gpu_buddy_fini()`, which dereferences the invalid pointers and causes a kernel panic.

Pre-existing issues are out of scope for this series, except when they 
are needed for the new functionality.

/Thomas



> --
>
> commit 88a159aad4fc080127cdbd3bc127dfaae0ae9c17
> Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
> drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
>
> This commit fixes kernel panics on the module load error path by moving
> drmm_cgroup_register_region() after the list and buddy allocator are fully
> initialized.
>
> Closes: https://sashiko.dev/#/patchset/20260428073116.15687-1-thomas.hellstrom@linux.intel.com?part=4
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 2a241a5b12c42..ac3f71d77140a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> [ ... ]
>> @@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
>>   	if (err)
>>   		return err;
> [Severity: High]
> This is a pre-existing issue, but if gpu_buddy_init() fails, does it still
> leave the system vulnerable to the same module load error path panic?
>
> Looking at drivers/gpu/buddy.c, if gpu_buddy_init() fails after setting
> mm->n_roots, the error path cleans up allocations but leaves mm->n_roots
> populated and mm->roots as a dangling or NULL pointer:
>
> drivers/gpu/buddy.c:gpu_buddy_init() {
>      ...
>      mm->n_roots = hweight64(size);
>      ...
> out_free_tree:
>      while (i--)
>          kfree(mm->free_trees[i]);
>      kfree(mm->free_trees);
>      return -ENOMEM;
> }
>
> Since adev->mman.initialized is set to true before amdgpu_vram_mgr_init() is
> called, returning this error will trigger the teardown path where
> amdgpu_vram_mgr_fini() unconditionally calls gpu_buddy_fini():
>
> drivers/gpu/buddy.c:gpu_buddy_fini() {
>      ...
>      for (i = 0; i < mm->n_roots; ++i) {
>          start = gpu_buddy_block_offset(mm->roots[i]);
>      ...
> }
>
> Because mm->n_roots is non-zero, will this unconditionally dereference the
> invalid mm->roots pointer and cause a panic?
>
>> +	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
>> +	if (IS_ERR(man->cg))
>> +		return PTR_ERR(man->cg);
>> +
>>   	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
>>   	ttm_resource_manager_set_used(man, true);
>>   	return 0;

  reply	other threads:[~2026-06-11 17:27 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 14:22 [PATCH v5 0/6] [PATCH v5 0/6] Add reclaim to the dmem cgroup controller Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 1/6] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
2026-06-11 14:37   ` sashiko-bot
2026-06-11 17:27     ` Thomas Hellström [this message]
2026-06-11 14:22 ` [PATCH v5 2/6] cgroup/dmem: Introduce struct dmem_cgroup_init for region initialization Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 3/6] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
2026-06-11 14:40   ` sashiko-bot
2026-06-11 17:28     ` Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 4/6] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 5/6] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
2026-06-11 14:22 ` [PATCH v5 6/6] drm/amdgpu: " Thomas Hellström
2026-06-11 14:35   ` sashiko-bot
2026-06-11 14:49 ` ✗ CI.checkpatch: warning for Add reclaim to the dmem cgroup controller (rev5) Patchwork
2026-06-11 14:51 ` ✓ CI.KUnit: success " Patchwork
2026-06-11 16:44 ` ✓ Xe.CI.BAT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f888a852-25d3-4bbb-8675-29ed080c1bfa@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=mkoutny@suse.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.