From: Matthew Auld <matthew.auld@intel.com>
To: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>,
christian.koenig@amd.com, dri-devel@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Subject: Re: [PATCH v2 1/2] gpu/buddy: replace dual-tree/force_merge with decoupled clear tracker
Date: Fri, 8 May 2026 16:19:35 +0100
Message-ID: <75bcbaac-c5f9-4dc6-af1e-4d9ed66d5f16@intel.com>
In-Reply-To: <20260504111055.262964-1-Arunpravin.PaneerSelvam@amd.com>

On 04/05/2026 12:10, Arunpravin Paneer Selvam wrote:
> The current buddy allocator maintains separate clear_tree[] and
> dirty_tree[] rbtrees per order, preventing coalescing between cleared
> and dirty buddies. Under mixed workloads, this creates a merge barrier:
> adjacent buddies frequently end up split across trees, forcing reliance
> on __force_merge() during allocation.
>
> __force_merge() performs an O(N x max_order) scan under the VRAM manager
> lock, leading to allocation stalls and failures for large contiguous
> requests even when sufficient total free memory is available.
>
> Solution
>
> Replace the dual-tree design with:
> - A single free_tree[order] rbtree for dirty and mixed free blocks
> (fully cleared free blocks float outside this tree)
> - A lightweight out-of-band clear tracker (gpu_clear_tracker)
>
> Fully cleared free blocks are tracked outside the buddy trees using an
> augmented interval rbtree, enabling O(log E) lookup of the largest
> cleared extents.
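
For anyone following along, the augmented-tree idea in the paragraph above can be sketched roughly as below. This is not code from the patch: struct extent, extent_insert() and extent_find_fit() are hypothetical names, and the tree is a plain unbalanced binary search tree for brevity, where kernel code would presumably use the rbtree augmentation helpers. Each node caches the largest extent length in its subtree, so a search can skip any subtree that cannot satisfy the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cleared extent, keyed by start address. */
struct extent {
	uint64_t start, len;
	uint64_t subtree_max_len;	/* largest len in this subtree */
	struct extent *left, *right;
};

static uint64_t max_len_of(const struct extent *n)
{
	return n ? n->subtree_max_len : 0;
}

/* Recompute the cached maximum from the node and its children. */
static void extent_update(struct extent *n)
{
	uint64_t m = n->len;

	if (max_len_of(n->left) > m)
		m = max_len_of(n->left);
	if (max_len_of(n->right) > m)
		m = max_len_of(n->right);
	n->subtree_max_len = m;
}

/* Unbalanced insert; returns the (possibly new) subtree root. */
static struct extent *extent_insert(struct extent *root, struct extent *n)
{
	if (!root) {
		n->left = n->right = NULL;
		n->subtree_max_len = n->len;
		return n;
	}
	assert(n->start != root->start);
	if (n->start < root->start)
		root->left = extent_insert(root->left, n);
	else
		root->right = extent_insert(root->right, n);
	extent_update(root);
	return root;
}

/*
 * Return the lowest-address extent with len >= size, or NULL.
 * Each step descends one level, so the cost is bounded by the
 * tree height (O(log E) only if the tree is kept balanced).
 */
static struct extent *extent_find_fit(struct extent *root, uint64_t size)
{
	while (root && root->subtree_max_len >= size) {
		if (max_len_of(root->left) >= size)
			root = root->left;
		else if (root->len >= size)
			return root;
		else
			root = root->right;
	}
	return NULL;
}
```

The cached maximum is what makes a failed lookup cheap: if the root's subtree_max_len is below the requested size, the search returns without visiting any other node.
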
>
> Buddy coalescing is now unconditional in __gpu_buddy_free(), regardless
> of clear/dirty state. This removes the merge barrier and eliminates the
> need for __force_merge().
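
As a reference point, unconditional coalescing as described above can be illustrated with a toy model. The sketch below is not the patch's __gpu_buddy_free(); toy_free(), toy_buddy_offset() and the free_order[] table are made up for illustration, and a 4 KiB chunk size is assumed. The point is only that the merge loop looks at whether the buddy is free at the same order and never consults any clear/dirty state.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u		/* assumed minimum block size */
#define MAX_ORDER  4u			/* toy region: 16 chunks */
#define NUM_CHUNKS (1u << MAX_ORDER)

/* free_order[i]: order of the free block starting at chunk i, or -1. */
static int free_order[NUM_CHUNKS];

static void toy_init(void)
{
	for (unsigned int i = 0; i < NUM_CHUNKS; i++)
		free_order[i] = -1;
}

/* The buddy of a block differs only in the bit equal to the block size. */
static uint64_t toy_buddy_offset(uint64_t offset, unsigned int order)
{
	return offset ^ ((uint64_t)CHUNK_SIZE << order);
}

/*
 * Free the block at @offset and merge upward for as long as the buddy
 * is free at the same order. No clear/dirty check gates the merge.
 * Returns the order of the resulting free block.
 */
static unsigned int toy_free(uint64_t offset, unsigned int order)
{
	assert(offset % ((uint64_t)CHUNK_SIZE << order) == 0);

	while (order < MAX_ORDER) {
		uint64_t buddy = toy_buddy_offset(offset, order);
		unsigned int idx = (unsigned int)(buddy / CHUNK_SIZE);

		if (free_order[idx] != (int)order)
			break;		/* buddy allocated or split */
		free_order[idx] = -1;	/* absorb the buddy */
		if (buddy < offset)
			offset = buddy;
		order++;
	}
	free_order[offset / CHUNK_SIZE] = (int)order;
	return order;
}
```

Freeing chunks 0 and 1 in this model yields one order-1 block at offset 0, and then freeing the order-1 block at offset 8192 yields a single order-2 block, with no separate merge pass needed at allocation time.
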
>
> Benefits
>
> - Correct high-order allocations after mixed clear/dirty workloads
> - Elimination of O(N x max_order) merge cost from the allocation path
> - O(log E) cleared-extent lookup replacing O(N) scans
> - Predictable allocation latency under fragmentation
> - Reduced complexity with a single tree per order
>
> Test:
> dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000
>
> Below data is from /sys/kernel/debug/dri/1/amdgpu_vram_mm:
>
> Base (dual-tree), before VKCTS test:
> order- 6 free: 6 MiB, blocks: 26
> order- 5 free: 1 MiB, blocks: 15
> order- 4 free: 960 KiB, blocks: 15
> order- 3 free: 5 MiB, blocks: 171
> order- 2 free: 2 MiB, blocks: 176
> order- 1 free: 1 MiB, blocks: 165
> order- 0 free: 16 KiB, blocks: 4
>
> Base (dual-tree), after VKCTS test:
> order- 6 free: 768 KiB, blocks: 3
> order- 5 free: 499 MiB, blocks: 3999
> order- 4 free: 250 MiB, blocks: 4001
> order- 3 free: 129 MiB, blocks: 4157
> order- 2 free: 65 MiB, blocks: 4161
> order- 1 free: 63 MiB, blocks: 8138
> order- 0 free: 20 KiB, blocks: 5
>
> Clear tracker, before VKCTS test:
> order- 6 free: 4 MiB, blocks: 19
> order- 5 free: 2 MiB, blocks: 18
> order- 4 free: 704 KiB, blocks: 11
> order- 3 free: 5 MiB, blocks: 168
> order- 2 free: 2 MiB, blocks: 174
> order- 1 free: 1 MiB, blocks: 167
> order- 0 free: 32 KiB, blocks: 8
>
> Clear tracker, after VKCTS test:
> order- 6 free: 4 MiB, blocks: 19
> order- 5 free: 2 MiB, blocks: 18
> order- 4 free: 704 KiB, blocks: 11
> order- 3 free: 5 MiB, blocks: 168
> order- 2 free: 2 MiB, blocks: 174
> order- 1 free: 1 MiB, blocks: 167
> order- 0 free: 28 KiB, blocks: 7
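
A quick sanity check on how to read these tables, assuming a 4 KiB chunk size (inferred from the order-0 lines, e.g. 4 blocks = 16 KiB) and assuming the MiB figures are truncated rather than rounded. The helper name free_kib() is made up for this note.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE_KIB 4u	/* assumed, inferred from the order-0 lines */

/* Free space in KiB held by @blocks free blocks of the given order. */
static uint64_t free_kib(unsigned int order, uint64_t blocks)
{
	assert(order < 32);
	return blocks * ((uint64_t)CHUNK_SIZE_KIB << order);
}
```

With this, the dual-tree "after" numbers line up: 3999 order-5 blocks are 3999 x 128 KiB = 511872 KiB, just under 500 MiB and reported as 499 MiB, so the roughly 4000 blocks left at each of orders 1 to 5 account for nearly all of the free space shown in that table.
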
>
> v2:
> - Code-style cleanup and minor refactoring
> - Renamed locals for clarity
>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>

Still need some more time to fully go over this, but in the meantime
there is some feedback here from sashiko, which might be worth a look:

https://sashiko.dev/#/patchset/20260504111055.262964-1-Arunpravin.PaneerSelvam%40amd.com

Thread overview: 4+ messages
2026-05-04 11:10 [PATCH v2 1/2] gpu/buddy: replace dual-tree/force_merge with decoupled clear tracker Arunpravin Paneer Selvam
2026-05-04 11:10 ` [PATCH v2 2/2] gpu/tests/buddy: add clear-tracker allocation latency benchmarks Arunpravin Paneer Selvam
2026-05-08 15:19 ` Matthew Auld [this message]
2026-05-08 16:26 ` [PATCH v2 1/2] gpu/buddy: replace dual-tree/force_merge with decoupled clear tracker Arunpravin Paneer Selvam