From: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
To: Matthew Auld <matthew.auld@intel.com>,
christian.koenig@amd.com, dri-devel@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Subject: Re: [PATCH v3 1/2] drm/buddy: Improve offset-aligned allocation handling
Date: Tue, 17 Feb 2026 15:46:28 +0530 [thread overview]
Message-ID: <7060e2d5-b99e-4dbb-8d8a-f2b6a80df922@amd.com> (raw)
In-Reply-To: <1c753753-ced2-41c9-bad1-c9a172003d2f@intel.com>
On 2/17/2026 3:31 PM, Matthew Auld wrote:
> On 17/02/2026 06:03, Arunpravin Paneer Selvam wrote:
>> Hi Matthew,
>>
>> On 2/10/2026 9:56 PM, Matthew Auld wrote:
>>> On 09/02/2026 08:30, Arunpravin Paneer Selvam wrote:
>>>> Large alignment requests previously forced the buddy allocator to
>>>> search by
>>>> alignment order, which often caused higher-order free blocks to be
>>>> split even
>>>> when a suitably aligned smaller region already existed within them.
>>>> This led
>>>> to excessive fragmentation, especially for workloads requesting
>>>> small sizes
>>>> with large alignment constraints.
>>>>
>>>> This change prioritizes the requested allocation size during the
>>>> search and
>>>> uses an augmented RB-tree field (subtree_max_alignment) to
>>>> efficiently locate
>>>> free blocks that satisfy both size and offset-alignment
>>>> requirements. As a
>>>> result, the allocator can directly select an aligned sub-region
>>>> without
>>>> splitting larger blocks unnecessarily.
>>>>
>>>> A practical example is the VKCTS test
>>>> dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000, which
>>>> repeatedly
>>>> allocates 8 KiB buffers with a 256 KiB alignment. Previously, such
>>>> allocations
>>>> caused large blocks to be split aggressively, despite smaller
>>>> aligned regions
>>>> being sufficient. With this change, those aligned regions are
>>>> reused directly,
>>>> significantly reducing fragmentation.
>>>>
>>>> This improvement is visible in the amdgpu VRAM buddy allocator state
>>>> (/sys/kernel/debug/dri/1/amdgpu_vram_mm). After the change, higher-
>>>> order blocks
>>>> are preserved and the number of low-order fragments is
>>>> substantially reduced.
>>>>
>>>> Before:
>>>> order- 5 free: 1936 MiB, blocks: 15490
>>>> order- 4 free: 967 MiB, blocks: 15486
>>>> order- 3 free: 483 MiB, blocks: 15485
>>>> order- 2 free: 241 MiB, blocks: 15486
>>>> order- 1 free: 241 MiB, blocks: 30948
>>>>
>>>> After:
>>>> order- 5 free: 493 MiB, blocks: 3941
>>>> order- 4 free: 246 MiB, blocks: 3943
>>>> order- 3 free: 123 MiB, blocks: 4101
>>>> order- 2 free: 61 MiB, blocks: 4101
>>>> order- 1 free: 61 MiB, blocks: 8018
>>>>
>>>> By avoiding unnecessary splits, this change improves allocator
>>>> efficiency and
>>>> helps maintain larger contiguous free regions under heavy
>>>> offset-aligned
>>>> allocation workloads.
>>>>
>>>> v2:(Matthew)
>>>> - Update augmented information along the path to the inserted node.
>>>>
>>>> v3:
>>>> - Move the patch to gpu/buddy.c file.
>>>>
>>>> Signed-off-by: Arunpravin Paneer Selvam
>>>> <Arunpravin.PaneerSelvam@amd.com>
>>>> Suggested-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>> drivers/gpu/buddy.c | 271
>>>> +++++++++++++++++++++++++++++++-------
>>>> include/linux/gpu_buddy.h | 2 +
>>>> 2 files changed, 228 insertions(+), 45 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
>>>> index 603c59a2013a..3a25eed050ba 100644
>>>> --- a/drivers/gpu/buddy.c
>>>> +++ b/drivers/gpu/buddy.c
>>>> @@ -14,6 +14,16 @@
>>>> static struct kmem_cache *slab_blocks;
>>>> +static unsigned int gpu_buddy_block_offset_alignment(struct
>>>> gpu_buddy_block *block)
>>>> +{
>>>> + return __ffs(gpu_buddy_block_offset(block));
>>>
>>> __ffs() will be undefined for offset zero it seems, so might blow up
>>> in some strange way. I guess just return the max possible alignment
>>> here if offset is zero? Also are we meant to use __ffs64() here?
>> Yes, I had the same concern about __ffs() being undefined when the
>> offset is zero. My initial thought was to derive the maximum possible
>> alignment from the allocator size using ilog2(mm->size) and return
>> that value for the zero-offset case.
>>
>> But, RB_DECLARE_CALLBACKS_MAX() requires the compute callback
>> (gpu_buddy_block_offset_alignment()) to accept only a single struct
>> gpu_buddy_block * argument. It does not provide a mechanism to pass
>> additional context such as the associated struct gpu_buddy *mm. As a
>> result, deriving the alignment from allocator state (e.g., via
>> ilog2(mm- >size)) is not directly feasible within this callback.
>> When I tested the zero-offset case locally, __ffs() returned 64,
>> which effectively corresponds to the maximum alignment for a u64
>> offset. Based on that observation, I initially left the __ffs() call
>> unchanged for the zero case as well.
>>
>> One possible alternative would be to store a pointer to struct
>> gpu_buddy inside each gpu_buddy_block.
>>
>> All other review comments have been addressed, and I will send a v4
>> once this point is clarified.
>
> Yeah, I was thinking we just return the max theoretical value, so 64,
> or perhaps 64+1. It just needs to be a value that will be larger than
> any other possible alignment, since zero is special. It shouldn't
> matter if that is larger than the actual real max for the region, I
> think.
>
> if (!offset)
> return 64 + 1;
>
> return __ffs64(offset);
>
> ?
Yes, that should work. I will update the helper accordingly in v4.
Regards,
Arun.
>
>>
>> Regards,
>> Arun.
>>>
>>>> +}
>>>> +
>>>> +RB_DECLARE_CALLBACKS_MAX(static, gpu_buddy_augment_cb,
>>>> + struct gpu_buddy_block, rb,
>>>> + unsigned int, subtree_max_alignment,
>>>> + gpu_buddy_block_offset_alignment);
>>>> +
>>>> static struct gpu_buddy_block *gpu_block_alloc(struct gpu_buddy *mm,
>>>> struct gpu_buddy_block *parent,
>>>> unsigned int order,
>>>> @@ -31,6 +41,9 @@ static struct gpu_buddy_block
>>>> *gpu_block_alloc(struct gpu_buddy *mm,
>>>> block->header |= order;
>>>> block->parent = parent;
>>>> + block->subtree_max_alignment =
>>>> + gpu_buddy_block_offset_alignment(block);
>>>> +
>>>> RB_CLEAR_NODE(&block->rb);
>>>> BUG_ON(block->header & GPU_BUDDY_HEADER_UNUSED);
>>>> @@ -67,26 +80,42 @@ static bool rbtree_is_empty(struct rb_root *root)
>>>> return RB_EMPTY_ROOT(root);
>>>> }
>>>> -static bool gpu_buddy_block_offset_less(const struct
>>>> gpu_buddy_block *block,
>>>> - const struct gpu_buddy_block *node)
>>>> -{
>>>> - return gpu_buddy_block_offset(block) <
>>>> gpu_buddy_block_offset(node);
>>>> -}
>>>> -
>>>> -static bool rbtree_block_offset_less(struct rb_node *block,
>>>> - const struct rb_node *node)
>>>> -{
>>>> - return gpu_buddy_block_offset_less(rbtree_get_free_block(block),
>>>> - rbtree_get_free_block(node));
>>>> -}
>>>> -
>>>> static void rbtree_insert(struct gpu_buddy *mm,
>>>> struct gpu_buddy_block *block,
>>>> enum gpu_buddy_free_tree tree)
>>>> {
>>>> - rb_add(&block->rb,
>>>> - &mm->free_trees[tree][gpu_buddy_block_order(block)],
>>>> - rbtree_block_offset_less);
>>>> + struct rb_node **link, *parent = NULL;
>>>> + unsigned int block_alignment, order;
>>>> + struct gpu_buddy_block *node;
>>>> + struct rb_root *root;
>>>> +
>>>> + order = gpu_buddy_block_order(block);
>>>> + block_alignment = gpu_buddy_block_offset_alignment(block);
>>>> +
>>>> + root = &mm->free_trees[tree][order];
>>>> + link = &root->rb_node;
>>>> +
>>>> + while (*link) {
>>>> + parent = *link;
>>>> + node = rbtree_get_free_block(parent);
>>>> + /*
>>>> + * Manual augmentation update during insertion traversal.
>>>> Required
>>>> + * because rb_insert_augmented() only calls rotate
>>>> callback during
>>>> + * rotations. This ensures all ancestors on the insertion
>>>> path have
>>>> + * correct subtree_max_alignment values.
>>>> + */
>>>> + if (node->subtree_max_alignment < block_alignment)
>>>> + node->subtree_max_alignment = block_alignment;
>>>> +
>>>> + if (gpu_buddy_block_offset(block) <
>>>> gpu_buddy_block_offset(node))
>>>> + link = &parent->rb_left;
>>>> + else
>>>> + link = &parent->rb_right;
>>>> + }
>>>> +
>>>> + block->subtree_max_alignment = block_alignment;
>>>> + rb_link_node(&block->rb, parent, link);
>>>> + rb_insert_augmented(&block->rb, root, &gpu_buddy_augment_cb);
>>>> }
>>>> static void rbtree_remove(struct gpu_buddy *mm,
>>>> @@ -99,7 +128,7 @@ static void rbtree_remove(struct gpu_buddy *mm,
>>>> tree = get_block_tree(block);
>>>> root = &mm->free_trees[tree][order];
>>>> - rb_erase(&block->rb, root);
>>>> + rb_erase_augmented(&block->rb, root, &gpu_buddy_augment_cb);
>>>> RB_CLEAR_NODE(&block->rb);
>>>> }
>>>> @@ -790,6 +819,132 @@ alloc_from_freetree(struct gpu_buddy *mm,
>>>> return ERR_PTR(err);
>>>> }
>>>> +static bool
>>>> +gpu_buddy_can_offset_align(u64 size, u64 min_block_size)
>>>> +{
>>>> + return size < min_block_size && is_power_of_2(size);
>>>> +}
>>>> +
>>>> +static bool gpu_buddy_subtree_can_satisfy(struct rb_node *node,
>>>> + unsigned int alignment)
>>>> +{
>>>> + struct gpu_buddy_block *block;
>>>> +
>>>> + if (!node)
>>>> + return false;
>>>
>>> All callers seem to handle null case already, so could potentially
>>> drop this?
>>>
>>>> +
>>>> + block = rbtree_get_free_block(node);
>>>> + return block->subtree_max_alignment >= alignment;
>>>> +}
>>>> +
>>>> +static struct gpu_buddy_block *
>>>> +gpu_buddy_find_block_aligned(struct gpu_buddy *mm,
>>>> + enum gpu_buddy_free_tree tree,
>>>> + unsigned int order,
>>>> + unsigned int tmp,
>>>> + unsigned int alignment,
>>>> + unsigned long flags)
>>>> +{
>>>> + struct rb_root *root = &mm->free_trees[tree][tmp];
>>>> + struct rb_node *rb = root->rb_node;
>>>> +
>>>> + while (rb) {
>>>> + struct gpu_buddy_block *block = rbtree_get_free_block(rb);
>>>> + struct rb_node *left_node = rb->rb_left, *right_node = rb-
>>>> >rb_right;
>>>> +
>>>> + if (right_node) {
>>>> + if (gpu_buddy_subtree_can_satisfy(right_node,
>>>> alignment)) {
>>>> + rb = right_node;
>>>> + continue;
>>>> + }
>>>> + }
>>>> +
>>>> + if (gpu_buddy_block_order(block) >= order &&
>>>
>>> Is this not always true? With that we can drop order, or better yet
>>> s/ tmp/order/ ?
>>>
>>>> + __ffs(gpu_buddy_block_offset(block)) >= alignment)
>>>
>>> Same here with undefined offset zero case. I guess also use the helper.
>>>
>>>> + return block;
>>>> +
>>>> + if (left_node) {
>>>> + if (gpu_buddy_subtree_can_satisfy(left_node,
>>>> alignment)) {
>>>> + rb = left_node;
>>>> + continue;
>>>> + }
>>>> + }
>>>> +
>>>> + break;
>>>> + }
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static struct gpu_buddy_block *
>>>> +gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
>>>> + u64 size,
>>>> + u64 min_block_size,
>>>> + unsigned long flags)
>>>> +{
>>>> + struct gpu_buddy_block *block = NULL;
>>>> + unsigned int order, tmp, alignment;
>>>> + struct gpu_buddy_block *buddy;
>>>> + enum gpu_buddy_free_tree tree;
>>>> + unsigned long pages;
>>>> + int err;
>>>> +
>>>> + alignment = ilog2(min_block_size);
>>>> + pages = size >> ilog2(mm->chunk_size);
>>>> + order = fls(pages) - 1;
>>>> +
>>>> + tree = (flags & GPU_BUDDY_CLEAR_ALLOCATION) ?
>>>> + GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
>>>> +
>>>> + for (tmp = order; tmp <= mm->max_order; ++tmp) {
>>>> + block = gpu_buddy_find_block_aligned(mm, tree, order,
>>>> + tmp, alignment, flags);
>>>> + if (!block) {
>>>> + tree = (tree == GPU_BUDDY_CLEAR_TREE) ?
>>>> + GPU_BUDDY_DIRTY_TREE : GPU_BUDDY_CLEAR_TREE;
>>>> + block = gpu_buddy_find_block_aligned(mm, tree, order,
>>>> + tmp, alignment, flags);
>>>> + }
>>>> +
>>>> + if (block)
>>>> + break;
>>>> + }
>>>> +
>>>> + if (!block)
>>>> + return ERR_PTR(-ENOSPC);
>>>> +
>>>> + while (gpu_buddy_block_order(block) > order) {
>>>> + struct gpu_buddy_block *left, *right;
>>>> +
>>>> + err = split_block(mm, block);
>>>> + if (unlikely(err))
>>>> + goto err_undo;
>>>> +
>>>> + left = block->left;
>>>> + right = block->right;
>>>> +
>>>> + if (__ffs(gpu_buddy_block_offset(right)) >= alignment)
>>>
>>> Might be better to use the helper for this?
>>>
>>>> + block = right;
>>>> + else
>>>> + block = left;
>>>> + }
>>>> +
>>>> + return block;
>>>> +
>>>> +err_undo:
>>>> + /*
>>>> + * We really don't want to leave around a bunch of split
>>>> blocks, since
>>>> + * bigger is better, so make sure we merge everything back
>>>> before we
>>>> + * free the allocated blocks.
>>>> + */
>>>> + buddy = __get_buddy(block);
>>>> + if (buddy &&
>>>> + (gpu_buddy_block_is_free(block) &&
>>>> + gpu_buddy_block_is_free(buddy)))
>>>> + __gpu_buddy_free(mm, block, false);
>>>> + return ERR_PTR(err);
>>>> +}
>>>> +
>>>> static int __alloc_range(struct gpu_buddy *mm,
>>>> struct list_head *dfs,
>>>> u64 start, u64 size,
>>>> @@ -1059,6 +1214,7 @@ EXPORT_SYMBOL(gpu_buddy_block_trim);
>>>> static struct gpu_buddy_block *
>>>> __gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
>>>> u64 start, u64 end,
>>>> + u64 size, u64 min_block_size,
>>>> unsigned int order,
>>>> unsigned long flags)
>>>> {
>>>> @@ -1066,6 +1222,11 @@ __gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
>>>> /* Allocate traversing within the range */
>>>> return __gpu_buddy_alloc_range_bias(mm, start, end,
>>>> order, flags);
>>>> + else if (size < min_block_size)
>>>> + /* Allocate from an offset-aligned region without size
>>>> rounding */
>>>> + return gpu_buddy_offset_aligned_allocation(mm, size,
>>>> + min_block_size,
>>>> + flags);
>>>> else
>>>> /* Allocate from freetree */
>>>> return alloc_from_freetree(mm, order, flags);
>>>> @@ -1137,8 +1298,11 @@ int gpu_buddy_alloc_blocks(struct gpu_buddy
>>>> *mm,
>>>> if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION) {
>>>> size = roundup_pow_of_two(size);
>>>> min_block_size = size;
>>>> - /* Align size value to min_block_size */
>>>> - } else if (!IS_ALIGNED(size, min_block_size)) {
>>>> + /*
>>>> + * Normalize the requested size to min_block_size for
>>>> regular allocations.
>>>> + * Offset-aligned allocations intentionally skip size
>>>> rounding.
>>>> + */
>>>> + } else if (!gpu_buddy_can_offset_align(size, min_block_size)) {
>>>> size = round_up(size, min_block_size);
>>>> }
>>>> @@ -1158,43 +1322,60 @@ int gpu_buddy_alloc_blocks(struct
>>>> gpu_buddy *mm,
>>>> do {
>>>> order = min(order, (unsigned int)fls(pages) - 1);
>>>> BUG_ON(order > mm->max_order);
>>>> - BUG_ON(order < min_order);
>>>> + /*
>>>> + * Regular allocations must not allocate blocks smaller
>>>> than min_block_size.
>>>> + * Offset-aligned allocations deliberately bypass this
>>>> constraint.
>>>> + */
>>>> + BUG_ON(size >= min_block_size && order < min_order);
>>>> do {
>>>> + unsigned int fallback_order;
>>>> +
>>>> block = __gpu_buddy_alloc_blocks(mm, start,
>>>> end,
>>>> + size,
>>>> + min_block_size,
>>>> order,
>>>> flags);
>>>> if (!IS_ERR(block))
>>>> break;
>>>> - if (order-- == min_order) {
>>>> - /* Try allocation through force merge method */
>>>> - if (mm->clear_avail &&
>>>> - !__force_merge(mm, start, end, min_order)) {
>>>> - block = __gpu_buddy_alloc_blocks(mm, start,
>>>> - end,
>>>> - min_order,
>>>> - flags);
>>>> - if (!IS_ERR(block)) {
>>>> - order = min_order;
>>>> - break;
>>>> - }
>>>> - }
>>>> + if (size < min_block_size) {
>>>> + fallback_order = order;
>>>> + } else if (order == min_order) {
>>>> + fallback_order = min_order;
>>>> + } else {
>>>> + order--;
>>>> + continue;
>>>> + }
>>>> - /*
>>>> - * Try contiguous block allocation through
>>>> - * try harder method.
>>>> - */
>>>> - if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION &&
>>>> - !(flags & GPU_BUDDY_RANGE_ALLOCATION))
>>>> - return __alloc_contig_try_harder(mm,
>>>> - original_size,
>>>> - original_min_size,
>>>> - blocks);
>>>> - err = -ENOSPC;
>>>> - goto err_free;
>>>> + /* Try allocation through force merge method */
>>>> + if (mm->clear_avail &&
>>>> + !__force_merge(mm, start, end, fallback_order)) {
>>>> + block = __gpu_buddy_alloc_blocks(mm, start,
>>>> + end,
>>>> + size,
>>>> + min_block_size,
>>>> + fallback_order,
>>>> + flags);
>>>> + if (!IS_ERR(block)) {
>>>> + order = fallback_order;
>>>> + break;
>>>> + }
>>>> }
>>>> +
>>>> + /*
>>>> + * Try contiguous block allocation through
>>>> + * try harder method.
>>>> + */
>>>> + if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION &&
>>>> + !(flags & GPU_BUDDY_RANGE_ALLOCATION))
>>>> + return __alloc_contig_try_harder(mm,
>>>> + original_size,
>>>> + original_min_size,
>>>> + blocks);
>>>> + err = -ENOSPC;
>>>> + goto err_free;
>>>> } while (1);
>>>> mark_allocated(mm, block);
>>>> diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
>>>> index 07ac65db6d2e..7ad817c69ec6 100644
>>>> --- a/include/linux/gpu_buddy.h
>>>> +++ b/include/linux/gpu_buddy.h
>>>> @@ -11,6 +11,7 @@
>>>> #include <linux/slab.h>
>>>> #include <linux/sched.h>
>>>> #include <linux/rbtree.h>
>>>> +#include <linux/rbtree_augmented.h>
>>>> #define GPU_BUDDY_RANGE_ALLOCATION BIT(0)
>>>> #define GPU_BUDDY_TOPDOWN_ALLOCATION BIT(1)
>>>> @@ -58,6 +59,7 @@ struct gpu_buddy_block {
>>>> };
>>>> struct list_head tmp_link;
>>>> + unsigned int subtree_max_alignment;
>>>> };
>>>> /* Order-zero must be at least SZ_4K */
>>>>
>>>> base-commit: 9d757669b2b22cd224c334924f798393ffca537c
>>>
>>
>
prev parent reply other threads:[~2026-02-17 10:16 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-09 8:30 [PATCH v3 1/2] drm/buddy: Improve offset-aligned allocation handling Arunpravin Paneer Selvam
2026-02-09 8:30 ` [PATCH v3 2/2] drm/buddy: Add KUnit test for offset-aligned allocations Arunpravin Paneer Selvam
2026-02-09 19:23 ` kernel test robot
2026-02-09 19:26 ` kernel test robot
2026-02-09 21:20 ` kernel test robot
2026-02-09 9:46 ` ✓ i915.CI.BAT: success for series starting with [v3,1/2] drm/buddy: Improve offset-aligned allocation handling Patchwork
2026-02-09 13:22 ` ✗ i915.CI.Full: failure " Patchwork
2026-02-10 16:26 ` [PATCH v3 1/2] " Matthew Auld
2026-02-17 6:03 ` Arunpravin Paneer Selvam
2026-02-17 10:01 ` Matthew Auld
2026-02-17 10:16 ` Arunpravin Paneer Selvam [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7060e2d5-b99e-4dbb-8d8a-f2b6a80df922@amd.com \
--to=arunpravin.paneerselvam@amd.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.auld@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox