From: Matthew Auld <matthew.auld@intel.com>
To: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>,
	christian.koenig@amd.com, dri-devel@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Subject: Re: [PATCH v3 1/2] drm/buddy: Improve offset-aligned allocation handling
Date: Tue, 10 Feb 2026 16:26:24 +0000	[thread overview]
Message-ID: <ce9833ef-8cfa-4b1a-b4d5-eda0158cb19d@intel.com> (raw)
In-Reply-To: <20260209083051.13376-1-Arunpravin.PaneerSelvam@amd.com>

On 09/02/2026 08:30, Arunpravin Paneer Selvam wrote:
> Large alignment requests previously forced the buddy allocator to search by
> alignment order, which often caused higher-order free blocks to be split even
> when a suitably aligned smaller region already existed within them. This led
> to excessive fragmentation, especially for workloads requesting small sizes
> with large alignment constraints.
> 
> This change prioritizes the requested allocation size during the search and
> uses an augmented RB-tree field (subtree_max_alignment) to efficiently locate
> free blocks that satisfy both size and offset-alignment requirements. As a
> result, the allocator can directly select an aligned sub-region without
> splitting larger blocks unnecessarily.
> 
> A practical example is the VKCTS test
> dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000, which repeatedly
> allocates 8 KiB buffers with a 256 KiB alignment. Previously, such allocations
> caused large blocks to be split aggressively, despite smaller aligned regions
> being sufficient. With this change, those aligned regions are reused directly,
> significantly reducing fragmentation.
> 
> This improvement is visible in the amdgpu VRAM buddy allocator state
> (/sys/kernel/debug/dri/1/amdgpu_vram_mm). After the change, higher-order blocks
> are preserved and the number of low-order fragments is substantially reduced.
> 
> Before:
>    order- 5 free: 1936 MiB, blocks: 15490
>    order- 4 free:  967 MiB, blocks: 15486
>    order- 3 free:  483 MiB, blocks: 15485
>    order- 2 free:  241 MiB, blocks: 15486
>    order- 1 free:  241 MiB, blocks: 30948
> 
> After:
>    order- 5 free:  493 MiB, blocks:  3941
>    order- 4 free:  246 MiB, blocks:  3943
>    order- 3 free:  123 MiB, blocks:  4101
>    order- 2 free:   61 MiB, blocks:  4101
>    order- 1 free:   61 MiB, blocks:  8018
> 
> By avoiding unnecessary splits, this change improves allocator efficiency and
> helps maintain larger contiguous free regions under heavy offset-aligned
> allocation workloads.
> 
> v2:(Matthew)
>    - Update augmented information along the path to the inserted node.
> 
> v3:
>    - Move the patch to gpu/buddy.c file.
> 
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Suggested-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/buddy.c       | 271 +++++++++++++++++++++++++++++++-------
>   include/linux/gpu_buddy.h |   2 +
>   2 files changed, 228 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
> index 603c59a2013a..3a25eed050ba 100644
> --- a/drivers/gpu/buddy.c
> +++ b/drivers/gpu/buddy.c
> @@ -14,6 +14,16 @@
>   
>   static struct kmem_cache *slab_blocks;
>   
> +static unsigned int gpu_buddy_block_offset_alignment(struct gpu_buddy_block *block)
> +{
> +	return __ffs(gpu_buddy_block_offset(block));

__ffs() is undefined when the offset is zero, it seems, so this might 
blow up in some strange way. I guess just return the max possible 
alignment here if the offset is zero? Also, given that the offset is 
u64, are we meant to be using __ffs64() here?
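
Something like this maybe (untested, and the zero-offset return value 
is just a strawman for "max possible alignment"):

static unsigned int
gpu_buddy_block_offset_alignment(struct gpu_buddy_block *block)
{
	u64 offset = gpu_buddy_block_offset(block);

	/* __ffs64() is undefined for zero; offset zero is aligned to anything */
	if (!offset)
		return BITS_PER_LONG_LONG - 1;

	return __ffs64(offset);
}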

> +}
> +
> +RB_DECLARE_CALLBACKS_MAX(static, gpu_buddy_augment_cb,
> +			 struct gpu_buddy_block, rb,
> +			 unsigned int, subtree_max_alignment,
> +			 gpu_buddy_block_offset_alignment);
> +
>   static struct gpu_buddy_block *gpu_block_alloc(struct gpu_buddy *mm,
>   					       struct gpu_buddy_block *parent,
>   					       unsigned int order,
> @@ -31,6 +41,9 @@ static struct gpu_buddy_block *gpu_block_alloc(struct gpu_buddy *mm,
>   	block->header |= order;
>   	block->parent = parent;
>   
> +	block->subtree_max_alignment =
> +		gpu_buddy_block_offset_alignment(block);
> +
>   	RB_CLEAR_NODE(&block->rb);
>   
>   	BUG_ON(block->header & GPU_BUDDY_HEADER_UNUSED);
> @@ -67,26 +80,42 @@ static bool rbtree_is_empty(struct rb_root *root)
>   	return RB_EMPTY_ROOT(root);
>   }
>   
> -static bool gpu_buddy_block_offset_less(const struct gpu_buddy_block *block,
> -					const struct gpu_buddy_block *node)
> -{
> -	return gpu_buddy_block_offset(block) < gpu_buddy_block_offset(node);
> -}
> -
> -static bool rbtree_block_offset_less(struct rb_node *block,
> -				     const struct rb_node *node)
> -{
> -	return gpu_buddy_block_offset_less(rbtree_get_free_block(block),
> -					   rbtree_get_free_block(node));
> -}
> -
>   static void rbtree_insert(struct gpu_buddy *mm,
>   			  struct gpu_buddy_block *block,
>   			  enum gpu_buddy_free_tree tree)
>   {
> -	rb_add(&block->rb,
> -	       &mm->free_trees[tree][gpu_buddy_block_order(block)],
> -	       rbtree_block_offset_less);
> +	struct rb_node **link, *parent = NULL;
> +	unsigned int block_alignment, order;
> +	struct gpu_buddy_block *node;
> +	struct rb_root *root;
> +
> +	order = gpu_buddy_block_order(block);
> +	block_alignment = gpu_buddy_block_offset_alignment(block);
> +
> +	root = &mm->free_trees[tree][order];
> +	link = &root->rb_node;
> +
> +	while (*link) {
> +		parent = *link;
> +		node = rbtree_get_free_block(parent);
> +		/*
> +		 * Manual augmentation update during insertion traversal. Required
> +		 * because rb_insert_augmented() only calls rotate callback during
> +		 * rotations. This ensures all ancestors on the insertion path have
> +		 * correct subtree_max_alignment values.
> +		 */
> +		if (node->subtree_max_alignment < block_alignment)
> +			node->subtree_max_alignment = block_alignment;
> +
> +		if (gpu_buddy_block_offset(block) < gpu_buddy_block_offset(node))
> +			link = &parent->rb_left;
> +		else
> +			link = &parent->rb_right;
> +	}
> +
> +	block->subtree_max_alignment = block_alignment;
> +	rb_link_node(&block->rb, parent, link);
> +	rb_insert_augmented(&block->rb, root, &gpu_buddy_augment_cb);
>   }
>   
>   static void rbtree_remove(struct gpu_buddy *mm,
> @@ -99,7 +128,7 @@ static void rbtree_remove(struct gpu_buddy *mm,
>   	tree = get_block_tree(block);
>   	root = &mm->free_trees[tree][order];
>   
> -	rb_erase(&block->rb, root);
> +	rb_erase_augmented(&block->rb, root, &gpu_buddy_augment_cb);
>   	RB_CLEAR_NODE(&block->rb);
>   }
>   
> @@ -790,6 +819,132 @@ alloc_from_freetree(struct gpu_buddy *mm,
>   	return ERR_PTR(err);
>   }
>   
> +static bool
> +gpu_buddy_can_offset_align(u64 size, u64 min_block_size)
> +{
> +	return size < min_block_size && is_power_of_2(size);
> +}
> +
> +static bool gpu_buddy_subtree_can_satisfy(struct rb_node *node,
> +					  unsigned int alignment)
> +{
> +	struct gpu_buddy_block *block;
> +
> +	if (!node)
> +		return false;

All callers seem to handle the NULL case already, so we could 
potentially drop this?
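
i.e. reduce it to just:

static bool gpu_buddy_subtree_can_satisfy(struct rb_node *node,
					  unsigned int alignment)
{
	return rbtree_get_free_block(node)->subtree_max_alignment >= alignment;
}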

> +
> +	block = rbtree_get_free_block(node);
> +	return block->subtree_max_alignment >= alignment;
> +}
> +
> +static struct gpu_buddy_block *
> +gpu_buddy_find_block_aligned(struct gpu_buddy *mm,
> +			     enum gpu_buddy_free_tree tree,
> +			     unsigned int order,
> +			     unsigned int tmp,
> +			     unsigned int alignment,
> +			     unsigned long flags)
> +{
> +	struct rb_root *root = &mm->free_trees[tree][tmp];
> +	struct rb_node *rb = root->rb_node;
> +
> +	while (rb) {
> +		struct gpu_buddy_block *block = rbtree_get_free_block(rb);
> +		struct rb_node *left_node = rb->rb_left, *right_node = rb->rb_right;
> +
> +		if (right_node) {
> +			if (gpu_buddy_subtree_can_satisfy(right_node, alignment)) {
> +				rb = right_node;
> +				continue;
> +			}
> +		}
> +
> +		if (gpu_buddy_block_order(block) >= order &&

Is this not always true? Everything in free_trees[tree][tmp] should 
have order == tmp, and tmp starts at order, so this check looks 
redundant. With that we can drop the order parameter, or better yet 
s/tmp/order/ ?

> +		    __ffs(gpu_buddy_block_offset(block)) >= alignment)

Same issue here with the undefined offset-zero case. I guess also just 
use the helper; see the sketch below the function.

> +			return block;
> +
> +		if (left_node) {
> +			if (gpu_buddy_subtree_can_satisfy(left_node, alignment)) {
> +				rb = left_node;
> +				continue;
> +			}
> +		}
> +
> +		break;
> +	}
> +
> +	return NULL;
> +}
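
FWIW, untested, but with the above suggestions the whole lookup could 
maybe end up looking something like this (flags also looks to be unused 
here, so dropped):

static struct gpu_buddy_block *
gpu_buddy_find_block_aligned(struct gpu_buddy *mm,
			     enum gpu_buddy_free_tree tree,
			     unsigned int order,
			     unsigned int alignment)
{
	struct rb_node *rb = mm->free_trees[tree][order].rb_node;

	while (rb) {
		struct gpu_buddy_block *block = rbtree_get_free_block(rb);

		/* Descend right first if that subtree can satisfy us */
		if (rb->rb_right &&
		    gpu_buddy_subtree_can_satisfy(rb->rb_right, alignment)) {
			rb = rb->rb_right;
			continue;
		}

		/* Otherwise take this block if it is itself aligned enough */
		if (gpu_buddy_block_offset_alignment(block) >= alignment)
			return block;

		if (rb->rb_left &&
		    gpu_buddy_subtree_can_satisfy(rb->rb_left, alignment)) {
			rb = rb->rb_left;
			continue;
		}

		break;
	}

	return NULL;
}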
> +
> +static struct gpu_buddy_block *
> +gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
> +				    u64 size,
> +				    u64 min_block_size,
> +				    unsigned long flags)
> +{
> +	struct gpu_buddy_block *block = NULL;
> +	unsigned int order, tmp, alignment;
> +	struct gpu_buddy_block *buddy;
> +	enum gpu_buddy_free_tree tree;
> +	unsigned long pages;
> +	int err;
> +
> +	alignment = ilog2(min_block_size);
> +	pages = size >> ilog2(mm->chunk_size);
> +	order = fls(pages) - 1;
> +
> +	tree = (flags & GPU_BUDDY_CLEAR_ALLOCATION) ?
> +		GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
> +
> +	for (tmp = order; tmp <= mm->max_order; ++tmp) {
> +		block = gpu_buddy_find_block_aligned(mm, tree, order,
> +						     tmp, alignment, flags);
> +		if (!block) {
> +			tree = (tree == GPU_BUDDY_CLEAR_TREE) ?
> +				GPU_BUDDY_DIRTY_TREE : GPU_BUDDY_CLEAR_TREE;
> +			block = gpu_buddy_find_block_aligned(mm, tree, order,
> +							     tmp, alignment, flags);
> +		}
> +
> +		if (block)
> +			break;
> +	}
> +
> +	if (!block)
> +		return ERR_PTR(-ENOSPC);
> +
> +	while (gpu_buddy_block_order(block) > order) {
> +		struct gpu_buddy_block *left, *right;
> +
> +		err = split_block(mm, block);
> +		if (unlikely(err))
> +			goto err_undo;
> +
> +		left  = block->left;
> +		right = block->right;
> +
> +		if (__ffs(gpu_buddy_block_offset(right)) >= alignment)

Might be better to use the helper for this as well, which would also 
cover the offset-zero case?
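
i.e.

		if (gpu_buddy_block_offset_alignment(right) >= alignment)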

> +			block = right;
> +		else
> +			block = left;
> +	}
> +
> +	return block;
> +
> +err_undo:
> +	/*
> +	 * We really don't want to leave around a bunch of split blocks, since
> +	 * bigger is better, so make sure we merge everything back before we
> +	 * free the allocated blocks.
> +	 */
> +	buddy = __get_buddy(block);
> +	if (buddy &&
> +	    (gpu_buddy_block_is_free(block) &&
> +	     gpu_buddy_block_is_free(buddy)))
> +		__gpu_buddy_free(mm, block, false);
> +	return ERR_PTR(err);
> +}
> +
>   static int __alloc_range(struct gpu_buddy *mm,
>   			 struct list_head *dfs,
>   			 u64 start, u64 size,
> @@ -1059,6 +1214,7 @@ EXPORT_SYMBOL(gpu_buddy_block_trim);
>   static struct gpu_buddy_block *
>   __gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
>   			 u64 start, u64 end,
> +			 u64 size, u64 min_block_size,
>   			 unsigned int order,
>   			 unsigned long flags)
>   {
> @@ -1066,6 +1222,11 @@ __gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
>   		/* Allocate traversing within the range */
>   		return  __gpu_buddy_alloc_range_bias(mm, start, end,
>   						     order, flags);
> +	else if (size < min_block_size)
> +		/* Allocate from an offset-aligned region without size rounding */
> +		return gpu_buddy_offset_aligned_allocation(mm, size,
> +							   min_block_size,
> +							   flags);
>   	else
>   		/* Allocate from freetree */
>   		return alloc_from_freetree(mm, order, flags);
> @@ -1137,8 +1298,11 @@ int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
>   	if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION) {
>   		size = roundup_pow_of_two(size);
>   		min_block_size = size;
> -	/* Align size value to min_block_size */
> -	} else if (!IS_ALIGNED(size, min_block_size)) {
> +		/*
> +		 * Normalize the requested size to min_block_size for regular allocations.
> +		 * Offset-aligned allocations intentionally skip size rounding.
> +		 */
> +	} else if (!gpu_buddy_can_offset_align(size, min_block_size)) {
>   		size = round_up(size, min_block_size);
>   	}
>   
> @@ -1158,43 +1322,60 @@ int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
>   	do {
>   		order = min(order, (unsigned int)fls(pages) - 1);
>   		BUG_ON(order > mm->max_order);
> -		BUG_ON(order < min_order);
> +		/*
> +		 * Regular allocations must not allocate blocks smaller than min_block_size.
> +		 * Offset-aligned allocations deliberately bypass this constraint.
> +		 */
> +		BUG_ON(size >= min_block_size && order < min_order);
>   
>   		do {
> +			unsigned int fallback_order;
> +
>   			block = __gpu_buddy_alloc_blocks(mm, start,
>   							 end,
> +							 size,
> +							 min_block_size,
>   							 order,
>   							 flags);
>   			if (!IS_ERR(block))
>   				break;
>   
> -			if (order-- == min_order) {
> -				/* Try allocation through force merge method */
> -				if (mm->clear_avail &&
> -				    !__force_merge(mm, start, end, min_order)) {
> -					block = __gpu_buddy_alloc_blocks(mm, start,
> -									 end,
> -									 min_order,
> -									 flags);
> -					if (!IS_ERR(block)) {
> -						order = min_order;
> -						break;
> -					}
> -				}
> +			if (size < min_block_size) {
> +				fallback_order = order;
> +			} else if (order == min_order) {
> +				fallback_order = min_order;
> +			} else {
> +				order--;
> +				continue;
> +			}
>   
> -				/*
> -				 * Try contiguous block allocation through
> -				 * try harder method.
> -				 */
> -				if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION &&
> -				    !(flags & GPU_BUDDY_RANGE_ALLOCATION))
> -					return __alloc_contig_try_harder(mm,
> -									 original_size,
> -									 original_min_size,
> -									 blocks);
> -				err = -ENOSPC;
> -				goto err_free;
> +			/* Try allocation through force merge method */
> +			if (mm->clear_avail &&
> +			    !__force_merge(mm, start, end, fallback_order)) {
> +				block = __gpu_buddy_alloc_blocks(mm, start,
> +								 end,
> +								 size,
> +								 min_block_size,
> +								 fallback_order,
> +								 flags);
> +				if (!IS_ERR(block)) {
> +					order = fallback_order;
> +					break;
> +				}
>   			}
> +
> +			/*
> +			 * Try contiguous block allocation through
> +			 * try harder method.
> +			 */
> +			if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION &&
> +			    !(flags & GPU_BUDDY_RANGE_ALLOCATION))
> +				return __alloc_contig_try_harder(mm,
> +								 original_size,
> +								 original_min_size,
> +								 blocks);
> +			err = -ENOSPC;
> +			goto err_free;
>   		} while (1);
>   
>   		mark_allocated(mm, block);
> diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
> index 07ac65db6d2e..7ad817c69ec6 100644
> --- a/include/linux/gpu_buddy.h
> +++ b/include/linux/gpu_buddy.h
> @@ -11,6 +11,7 @@
>   #include <linux/slab.h>
>   #include <linux/sched.h>
>   #include <linux/rbtree.h>
> +#include <linux/rbtree_augmented.h>
>   
>   #define GPU_BUDDY_RANGE_ALLOCATION		BIT(0)
>   #define GPU_BUDDY_TOPDOWN_ALLOCATION		BIT(1)
> @@ -58,6 +59,7 @@ struct gpu_buddy_block {
>   	};
>   
>   	struct list_head tmp_link;
> +	unsigned int subtree_max_alignment;
>   };
>   
>   /* Order-zero must be at least SZ_4K */
> 
> base-commit: 9d757669b2b22cd224c334924f798393ffca537c


Thread overview: 14+ messages
2026-02-09  8:30 [PATCH v3 1/2] drm/buddy: Improve offset-aligned allocation handling Arunpravin Paneer Selvam
2026-02-09  8:30 ` [PATCH v3 2/2] drm/buddy: Add KUnit test for offset-aligned allocations Arunpravin Paneer Selvam
2026-02-09 19:23   ` kernel test robot
2026-02-09 19:26   ` kernel test robot
2026-02-09 21:20   ` kernel test robot
2026-02-09  8:37 ` ✗ CI.checkpatch: warning for series starting with [v3,1/2] drm/buddy: Improve offset-aligned allocation handling Patchwork
2026-02-09  8:39 ` ✓ CI.KUnit: success " Patchwork
2026-02-09  8:57 ` ✗ CI.checksparse: warning " Patchwork
2026-02-09  9:16 ` ✓ Xe.CI.BAT: success " Patchwork
2026-02-09 10:21 ` ✓ Xe.CI.FULL: " Patchwork
2026-02-10 16:26 ` Matthew Auld [this message]
2026-02-17  6:03   ` [PATCH v3 1/2] " Arunpravin Paneer Selvam
2026-02-17 10:01     ` Matthew Auld
2026-02-17 10:16       ` Arunpravin Paneer Selvam
