From: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
To: Matthew Auld <matthew.auld@intel.com>,
christian.koenig@amd.com, amd-gfx@lists.freedesktop.org,
dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Subject: Re: [RFC PATCH] drm/buddy: Optimize large alignment handling to avoid unnecessary splits
Date: Thu, 29 Jan 2026 18:49:06 +0530 [thread overview]
Message-ID: <794705d7-84aa-41da-b028-9b9fb29318c8@amd.com> (raw)
In-Reply-To: <db877e44-a548-4a25-971b-d9a7729701a4@intel.com>
Hi Matthew,
On 22/01/26 20:32, Matthew Auld wrote:
> On 22/01/2026 14:13, Arunpravin Paneer Selvam wrote:
>>
>> On 09/01/26 22:12, Matthew Auld wrote:
>>> On 11/12/2025 12:23, Arunpravin Paneer Selvam wrote:
>>>> Large alignment requests previously forced the allocator to search by
>>>> alignment order, causing large free blocks to be split even when a
>>>> smaller aligned range existed within them. This patch switches the
>>>> search to prioritize the requested size and uses an augmented RB-tree
>>>> field (subtree_max_alignment) to efficiently locate blocks that
>>>> satisfy both size and alignment. This prevents unnecessary block
>>>> splitting and significantly reduces fragmentation.
>>>>
>>>> A practical example is the VKCTS test
>>>> dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000, which
>>>> allocates 8 KiB buffers with a 256 KiB alignment. Previously, these
>>>> requests caused the allocator to split large blocks despite having
>>>> smaller aligned portions within them that could satisfy the
>>>> allocation.
>>>> The new design now identifies and allocates from these portions,
>>>> avoiding unnecessary splitting.
>>>>
>>>> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>>>> Suggested-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>> drivers/gpu/drm/drm_buddy.c | 205 +++++++++++++++++++++++++++++++++---
>>>> include/drm/drm_buddy.h | 3 +
>>>> 2 files changed, 191 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>>>> index f2c92902e4a3..f749814bb270 100644
>>>> --- a/drivers/gpu/drm/drm_buddy.c
>>>> +++ b/drivers/gpu/drm/drm_buddy.c
>>>> @@ -23,6 +23,18 @@ static struct kmem_cache *slab_blocks;
>>>> #define for_each_free_tree(tree) \
>>>> for ((tree) = 0; (tree) < DRM_BUDDY_MAX_FREE_TREES; (tree)++)
>>>> +static unsigned int drm_buddy_min_offset_or_size_order(struct drm_buddy_block *block)
>>>> +{
>>>> + return min_t(unsigned int,
>>>> + __ffs(drm_buddy_block_offset(block)),
>>>> + drm_buddy_block_order(block));
>>>
>>> Didn't quite get this bit. Why do we pick the min between the order
>>> and "alignment"? Say we have an order-zero block but it has 256K addr
>>> alignment - this just selects zero? What is the idea here?
>> Sorry for the confusion. I mixed up two concepts here; I have sent an
>> offset-alignment-only version of the patch instead. Please have a look.
>>>
>>>> +}
>>>> +
>>>> +RB_DECLARE_CALLBACKS_MAX(static, drm_buddy_augment_cb,
>>>> + struct drm_buddy_block, rb,
>>>> + unsigned int, subtree_max_alignment,
>>>> + drm_buddy_min_offset_or_size_order);
>>>> +
>>>> static struct drm_buddy_block *drm_block_alloc(struct drm_buddy *mm,
>>>> struct drm_buddy_block *parent,
>>>> unsigned int order,
>>>> @@ -40,6 +52,9 @@ static struct drm_buddy_block *drm_block_alloc(struct drm_buddy *mm,
>>>> block->header |= order;
>>>> block->parent = parent;
>>>> + block->subtree_max_alignment =
>>>> + drm_buddy_min_offset_or_size_order(block);
>>>> +
>>>> RB_CLEAR_NODE(&block->rb);
>>>> BUG_ON(block->header & DRM_BUDDY_HEADER_UNUSED);
>>>> @@ -76,26 +91,32 @@ static bool rbtree_is_empty(struct rb_root *root)
>>>> return RB_EMPTY_ROOT(root);
>>>> }
>>>> -static bool drm_buddy_block_offset_less(const struct drm_buddy_block *block,
>>>> - const struct drm_buddy_block *node)
>>>> -{
>>>> - return drm_buddy_block_offset(block) < drm_buddy_block_offset(node);
>>>> -}
>>>> -
>>>> -static bool rbtree_block_offset_less(struct rb_node *block,
>>>> - const struct rb_node *node)
>>>> -{
>>>> - return drm_buddy_block_offset_less(rbtree_get_free_block(block),
>>>> - rbtree_get_free_block(node));
>>>> -}
>>>> -
>>>> static void rbtree_insert(struct drm_buddy *mm,
>>>> struct drm_buddy_block *block,
>>>> enum drm_buddy_free_tree tree)
>>>> {
>>>> - rb_add(&block->rb,
>>>> - &mm->free_trees[tree][drm_buddy_block_order(block)],
>>>> - rbtree_block_offset_less);
>>>> + struct rb_node **link, *parent = NULL;
>>>> + struct drm_buddy_block *node;
>>>> + struct rb_root *root;
>>>> + unsigned int order;
>>>> +
>>>> + order = drm_buddy_block_order(block);
>>>> +
>>>> + root = &mm->free_trees[tree][order];
>>>> + link = &root->rb_node;
>>>> +
>>>> + while (*link) {
>>>> + parent = *link;
>>>> + node = rbtree_get_free_block(parent);
>>>> +
>>>> + if (drm_buddy_block_offset(block) < drm_buddy_block_offset(node))
>>>> + link = &parent->rb_left;
>>>> + else
>>>> + link = &parent->rb_right;
>>>
>>> Is this correct? From the docs it sounds like we are meant to update
>>> the max alignment for each subtree on the path leading up to the
>>> insertion? It looks like rb_insert_augmented() will only do it if
>>> there is something to be rebalanced.
>> AFAIU from the docs, rb_insert_augmented() updates the augmented
>> value (subtree_max_alignment) for all nodes on the insertion path,
>> not only when a rotation happens.
>
> Unless I'm looking in the wrong place, the docs for insert_augmented():
>
> "On insertion, the user must update the augmented information on the
> path leading to the inserted node, then call rb_link_node() as usual
> and rb_insert_augmented() instead of the usual rb_insert_color() call.
> If rb_insert_augmented() rebalances the rbtree, it will callback into
> a user provided function to update the augmented information on the
> affected subtrees."
>
> Plus the example code they give for the augmented case also matches
> this, so I'm pretty sure we need it. See interval_tree_insert [1]. If
> that is indeed true, perhaps it is also worth adding something to the
> kunit tests to catch this case.
>
> [1] https://docs.kernel.org/core-api/rbtree.html#sample-usage
You are right. I am debugging the augmentation update path and working
on the fixes.
Thanks,
Arun.
>
>
>>>
>>>> + }
>>>> +
>>>> + rb_link_node(&block->rb, parent, link);
>>>> + rb_insert_augmented(&block->rb, root, &drm_buddy_augment_cb);
>>>> }
>>>> static void rbtree_remove(struct drm_buddy *mm,
>>>> @@ -108,7 +129,7 @@ static void rbtree_remove(struct drm_buddy *mm,
>>>> tree = get_block_tree(block);
>>>> root = &mm->free_trees[tree][order];
>>>> - rb_erase(&block->rb, root);
>>>> + rb_erase_augmented(&block->rb, root, &drm_buddy_augment_cb);
>>>> RB_CLEAR_NODE(&block->rb);
>>>> }
>>>> @@ -596,6 +617,88 @@ static bool block_incompatible(struct drm_buddy_block *block, unsigned int flags
>>>> return needs_clear != drm_buddy_block_is_clear(block);
>>>> }
>>>> +static bool drm_buddy_subtree_can_satisfy(struct rb_node *node,
>>>> + unsigned int alignment)
>>>> +{
>>>> + struct drm_buddy_block *block;
>>>> +
>>>> + if (!node)
>>>> + return false;
>>>> +
>>>> + block = rbtree_get_free_block(node);
>>>> + return block->subtree_max_alignment >= alignment;
>>>> +}
>>>> +
>>>> +static struct drm_buddy_block *
>>>> +drm_buddy_find_block_aligned(struct drm_buddy *mm,
>>>> + enum drm_buddy_free_tree tree,
>>>> + unsigned int order,
>>>> + unsigned int tmp,
>>>> + unsigned int alignment,
>>>> + unsigned long flags)
>>>> +{
>>>> + struct rb_root *root = &mm->free_trees[tree][tmp];
>>>> + struct rb_node *rb = root->rb_node;
>>>> +
>>>> + /* Try to find a block of the requested size that is already aligned */
>>>> + while (rb) {
>>>> + struct drm_buddy_block *block = rbtree_get_free_block(rb);
>>>> + struct rb_node *left_node = rb->rb_left, *right_node = rb->rb_right;
>>>> +
>>>> + if (left_node) {
>>>> + if (drm_buddy_subtree_can_satisfy(left_node, alignment)) {
>>>> + rb = left_node;
>>>> + continue;
>>>> + }
>>>> + }
>>>> +
>>>> + if (drm_buddy_block_order(block) >= order &&
>>>> + __ffs(drm_buddy_block_offset(block)) >= alignment)
>>>> + return block;
>>>> +
>>>> + if (right_node) {
>>>> + if (drm_buddy_subtree_can_satisfy(right_node, alignment)) {
>>>> + rb = right_node;
>>>> + continue;
>>>> + }
>>>> + }
>>>> +
>>>> + break;
>>>> + }
>>>> +
>>>> + if (tmp < max(order, alignment))
>>>> + return NULL;
>>>> +
>>>> + /* If none found, look for a larger block that can satisfy the alignment */
>>>
>>> What is the idea here? IIUC we are looking at some specific order
>>> and we want some min addr alignment, if the above can't find any
>>> subtree with suitable max alignment then we should bail and try the
>>> next order? Why instead do we do the search again with the same
>>> alignment below?
>>
>> Same as above, I mixed up two concepts. Please review v1 of the
>> offset-aligned allocation patch.
>>
>> Regards,
>>
>> Arun.
>>
>>>
>>>> + rb = root->rb_node;
>>>> + while (rb) {
>>>> + struct drm_buddy_block *block = rbtree_get_free_block(rb);
>>>> + struct rb_node *left_node = rb->rb_left, *right_node = rb->rb_right;
>>>> +
>>>> + if (left_node) {
>>>> + if (drm_buddy_subtree_can_satisfy(left_node, alignment)) {
>>>> + rb = left_node;
>>>> + continue;
>>>> + }
>>>> + }
>>>> +
>>>> + if (drm_buddy_block_order(block) >= max(order, alignment) &&
>>>> + drm_buddy_min_offset_or_size_order(block) >= alignment)
>>>> + return block;
>>>> +
>>>> + if (right_node) {
>>>> + if (drm_buddy_subtree_can_satisfy(right_node, alignment)) {
>>>> + rb = right_node;
>>>> + continue;
>>>> + }
>>>> + }
>>>> +
>>>> + break;
>>>> + }
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> static struct drm_buddy_block *
>>>> __alloc_range_bias(struct drm_buddy *mm,
>>>> u64 start, u64 end,
>>>> @@ -798,6 +901,69 @@ alloc_from_freetree(struct drm_buddy *mm,
>>>> return ERR_PTR(err);
>>>> }
>>>> +static int drm_buddy_offset_aligned_allocation(struct drm_buddy *mm,
>>>> + u64 size,
>>>> + u64 min_block_size,
>>>> + unsigned long flags,
>>>> + struct list_head *blocks)
>>>> +{
>>>> + struct drm_buddy_block *block = NULL;
>>>> + unsigned int order, tmp, alignment;
>>>> + enum drm_buddy_free_tree tree;
>>>> + unsigned long pages;
>>>> +
>>>> + alignment = ilog2(min_block_size);
>>>> + pages = size >> ilog2(mm->chunk_size);
>>>> + order = fls(pages) - 1;
>>>> +
>>>> + tree = (flags & DRM_BUDDY_CLEAR_ALLOCATION) ?
>>>> + DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
>>>> +
>>>> + for (tmp = order; tmp <= mm->max_order; ++tmp) {
>>>> + block = drm_buddy_find_block_aligned(mm, tree, order,
>>>> + tmp, alignment, flags);
>>>> + if (!block) {
>>>> + tree = (tree == DRM_BUDDY_CLEAR_TREE) ?
>>>> + DRM_BUDDY_DIRTY_TREE : DRM_BUDDY_CLEAR_TREE;
>>>> + block = drm_buddy_find_block_aligned(mm, tree, order,
>>>> + tmp, alignment, flags);
>>>> + }
>>>> +
>>>> + if (block)
>>>> + break;
>>>> + }
>>>> +
>>>> + if (!block)
>>>> + return -ENOSPC;
>>>> +
>>>> + while (drm_buddy_block_order(block) > order) {
>>>> + unsigned int child_order = drm_buddy_block_order(block) - 1;
>>>> + struct drm_buddy_block *left, *right;
>>>> + int r;
>>>> +
>>>> + r = split_block(mm, block);
>>>> + if (r)
>>>> + return r;
>>>> +
>>>> + left = block->left;
>>>> + right = block->right;
>>>> +
>>>> + if (child_order >= alignment)
>>>> + block = right;
>>>> + else
>>>> + block = left;
>>>> + }
>>>> +
>>>> + mark_allocated(mm, block);
>>>> + mm->avail -= drm_buddy_block_size(mm, block);
>>>> + if (drm_buddy_block_is_clear(block))
>>>> + mm->clear_avail -= drm_buddy_block_size(mm, block);
>>>> + kmemleak_update_trace(block);
>>>> + list_add_tail(&block->link, blocks);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> static int __alloc_range(struct drm_buddy *mm,
>>>> struct list_head *dfs,
>>>> u64 start, u64 size,
>>>> @@ -1147,6 +1313,11 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
>>>> min_block_size = size;
>>>> /* Align size value to min_block_size */
>>>> } else if (!IS_ALIGNED(size, min_block_size)) {
>>>> + if (min_block_size > size && is_power_of_2(size))
>>>> + return drm_buddy_offset_aligned_allocation(mm, size,
>>>> + min_block_size,
>>>> + flags,
>>>> + blocks);
>>>> size = round_up(size, min_block_size);
>>>> }
>>>> diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
>>>> index d7891d08f67a..da6a40fb4763 100644
>>>> --- a/include/drm/drm_buddy.h
>>>> +++ b/include/drm/drm_buddy.h
>>>> @@ -11,6 +11,7 @@
>>>> #include <linux/slab.h>
>>>> #include <linux/sched.h>
>>>> #include <linux/rbtree.h>
>>>> +#include <linux/rbtree_augmented.h>
>>>> #include <drm/drm_print.h>
>>>> @@ -60,6 +61,8 @@ struct drm_buddy_block {
>>>> };
>>>> struct list_head tmp_link;
>>>> +
>>>> + unsigned int subtree_max_alignment;
>>>> };
>>>> /* Order-zero must be at least SZ_4K */
>>>
>
Thread overview: 5+ messages
2025-12-11 12:23 [RFC PATCH] drm/buddy: Optimize large alignment handling to avoid unnecessary splits Arunpravin Paneer Selvam
2026-01-09 16:42 ` Matthew Auld
2026-01-22 14:13 ` Arunpravin Paneer Selvam
2026-01-22 15:02 ` Matthew Auld
2026-01-29 13:19 ` Arunpravin Paneer Selvam [this message]