From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <c9cdcf8a-d531-4e79-a238-97fd39b8c108@intel.com>
Date: Fri, 29 May 2026 18:41:05 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Matthew Auld <matthew.auld@intel.com>
Subject: Re: [PATCH v4 1/2] gpu/buddy: replace dual-tree/force_merge with
 decoupled clear tracker
To: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>,
 christian.koenig@amd.com, dri-devel@lists.freedesktop.org,
 intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
 amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com
References: <20260527112902.3815-1-Arunpravin.PaneerSelvam@amd.com>
Content-Language: en-GB
In-Reply-To: <20260527112902.3815-1-Arunpravin.PaneerSelvam@amd.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: intel-xe@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel Xe graphics driver <intel-xe.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-xe>,
 <mailto:intel-xe-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-xe>
List-Post: <mailto:intel-xe@lists.freedesktop.org>
List-Help: <mailto:intel-xe-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-xe>,
 <mailto:intel-xe-request@lists.freedesktop.org?subject=subscribe>
Errors-To: intel-xe-bounces@lists.freedesktop.org
Sender: "Intel-xe" <intel-xe-bounces@lists.freedesktop.org>

Hi,

On 27/05/2026 12:29, Arunpravin Paneer Selvam wrote:
> The current buddy allocator maintains separate clear_tree[] and
> dirty_tree[] rbtrees per order, preventing coalescing between cleared
> and dirty buddies. Under mixed workloads, this creates a merge barrier:
> adjacent buddies frequently end up split across trees, forcing reliance
> on __force_merge() during allocation.
> 
> __force_merge() performs an O(N x max_order) scan under the VRAM manager
> lock, leading to allocation stalls and failures for large contiguous
> requests even when sufficient total free memory is available.

So is this contig with non power-of-two sizes?
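
To make the merge barrier from the description concrete, here is a toy 
userspace model. It is only a sketch: none of it is the real gpu_buddy 
code, all toy_* names are made up, and the state lives in a flat array 
rather than in rbtrees. With merge_mixed=false a free block only 
coalesces with a buddy in the same clear/dirty state (the behaviour the 
description attributes to the dual-tree design); with merge_mixed=true 
it coalesces unconditionally and the result is treated as dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_ORDER 2			/* hypothetical: 4 pages in total */
#define TOY_PAGES (1u << TOY_MAX_ORDER)

enum toy_state { TOY_NOT_FREE = 0, TOY_FREE_DIRTY, TOY_FREE_CLEAR };

struct toy_buddy {
	/* state[order][index]: index counts blocks of that order from offset 0 */
	enum toy_state state[TOY_MAX_ORDER + 1][TOY_PAGES];
};

static void toy_init(struct toy_buddy *mm)
{
	memset(mm, 0, sizeof(*mm));	/* nothing is free initially */
}

/* Free one block and return the order it ends up at after coalescing. */
static unsigned int toy_free(struct toy_buddy *mm, unsigned int order,
			     unsigned int index, bool clear, bool merge_mixed)
{
	enum toy_state st = clear ? TOY_FREE_CLEAR : TOY_FREE_DIRTY;

	assert(order <= TOY_MAX_ORDER && index < (TOY_PAGES >> order));

	while (order < TOY_MAX_ORDER) {
		unsigned int buddy = index ^ 1;
		enum toy_state bst = mm->state[order][buddy];

		if (bst == TOY_NOT_FREE)
			break;
		if (!merge_mixed && bst != st)
			break;		/* the merge barrier */

		mm->state[order][buddy] = TOY_NOT_FREE;
		if (bst != st)
			st = TOY_FREE_DIRTY;	/* mixed parent counts as dirty */
		index >>= 1;
		order++;
	}

	mm->state[order][index] = st;
	return order;
}

/* Highest order that currently has a free block, or -1 if none. */
static int toy_largest_free_order(const struct toy_buddy *mm)
{
	for (int order = TOY_MAX_ORDER; order >= 0; order--)
		for (unsigned int i = 0; i < (TOY_PAGES >> order); i++)
			if (mm->state[order][i] != TOY_NOT_FREE)
				return order;
	return -1;
}
```

Freeing pages 0..3 alternately as clear and dirty leaves four order-0 
blocks when merge_mixed is false, and a single order-2 block when it is 
true, even though the same four pages are free in both cases.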

Do we know if we could force_merge everything in one go, or somehow be 
more aggressive and merge more than is needed right now, at the first 
sign of contention, instead of doing it piecemeal? The downside would be 
losing more of the clear tracking when this happens, in exchange for 
more re-merging.

Could we have another per-order list of all the blocks that we failed to 
merge during the free step? When doing the force merge step, we then 
maybe don't need to search blindly and can instead focus on the blocks 
tracked in those lists. Maybe it doesn't need to be a list and could 
instead be another rb-tree?
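
Roughly what I have in mind, as a toy userspace sketch rather than a 
patch against the real code. All sk_* names are hypothetical, the state 
is a flat array instead of rbtrees, and the pending set is a simple 
per-order stack of buddy pairs. The free path records a pair whenever 
the buddy is free but in the other clear/dirty state, and the force 
merge step only visits recorded pairs, stopping as soon as it has built 
a block of the requested order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SK_MAX_ORDER 3			/* hypothetical: 8 pages in total */
#define SK_PAGES (1u << SK_MAX_ORDER)

enum sk_state { SK_NOT_FREE = 0, SK_FREE_DIRTY, SK_FREE_CLEAR };

struct sk_mm {
	enum sk_state state[SK_MAX_ORDER + 1][SK_PAGES];
	/* pending[order]: buddy pairs that were free but could not merge */
	unsigned int pending[SK_MAX_ORDER + 1][SK_PAGES];
	unsigned int nr_pending[SK_MAX_ORDER + 1];
	bool queued[SK_MAX_ORDER + 1][SK_PAGES];	/* avoid duplicates */
};

static void sk_init(struct sk_mm *mm)
{
	memset(mm, 0, sizeof(*mm));
}

/* Insert a free block, merging only with a buddy in the same state. */
static unsigned int sk_insert(struct sk_mm *mm, unsigned int order,
			      unsigned int index, enum sk_state st)
{
	assert(order <= SK_MAX_ORDER && index < (SK_PAGES >> order));

	while (order < SK_MAX_ORDER) {
		unsigned int pair = index >> 1;
		enum sk_state bst = mm->state[order][index ^ 1];

		if (bst == SK_NOT_FREE)
			break;
		if (bst != st) {
			/* Free buddy in the other state: remember the pair. */
			if (!mm->queued[order][pair]) {
				mm->queued[order][pair] = true;
				mm->pending[order][mm->nr_pending[order]++] = pair;
			}
			break;
		}
		mm->state[order][index ^ 1] = SK_NOT_FREE;
		index = pair;
		order++;
	}

	mm->state[order][index] = st;
	return order;
}

/* Merge recorded pairs until a block of at least min_order exists. */
static bool sk_force_merge(struct sk_mm *mm, unsigned int min_order)
{
	for (unsigned int o = 0; o < min_order && o < SK_MAX_ORDER; o++) {
		while (mm->nr_pending[o]) {
			unsigned int pair = mm->pending[o][--mm->nr_pending[o]];
			unsigned int lo = pair << 1, hi = lo + 1;

			mm->queued[o][pair] = false;
			if (mm->state[o][lo] == SK_NOT_FREE ||
			    mm->state[o][hi] == SK_NOT_FREE)
				continue;	/* stale entry, skip it */

			mm->state[o][lo] = SK_NOT_FREE;
			mm->state[o][hi] = SK_NOT_FREE;
			/* Clear state is lost: the merged block is dirty. */
			if (sk_insert(mm, o + 1, pair, SK_FREE_DIRTY) >= min_order)
				return true;
		}
	}
	return false;
}
```

With all 8 pages freed alternately as clear and dirty, 
sk_force_merge(&mm, 3) rebuilds the single order-3 block by visiting 
only the four recorded pairs, while sk_force_merge(&mm, 1) stops after 
the first pair and leaves the clear state of the rest untouched.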

We know the size of the total allocation, so if we trigger force_merge, 
could we try to merge enough in one go for the entire allocation, 
instead of restarting the whole thing on the next iteration? Would that 
help at all?
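
As a small illustration of "we know the size": assuming the request is 
satisfied from power-of-two blocks, the per-order demand can be computed 
up front and handed to a single merge pass. This is only a sketch; the 
sz_* names and SZ_MAX_ORDER are hypothetical and nothing here mirrors 
the real allocation loop.

```c
#include <assert.h>

#define SZ_MAX_ORDER 4		/* hypothetical: largest block is 16 pages */

/*
 * Split a request of @pages into per-order block counts and return the
 * highest order that is needed. Sizes above the largest block are covered
 * by multiple max-order blocks.
 */
static unsigned int sz_orders_needed(unsigned long pages,
				     unsigned int counts[SZ_MAX_ORDER + 1])
{
	unsigned int highest = 0;

	assert(pages > 0);

	for (unsigned int o = 0; o <= SZ_MAX_ORDER; o++)
		counts[o] = 0;

	counts[SZ_MAX_ORDER] = (unsigned int)(pages >> SZ_MAX_ORDER);
	if (counts[SZ_MAX_ORDER])
		highest = SZ_MAX_ORDER;

	/* The remainder is one block per set bit below the max order. */
	for (unsigned int o = 0; o < SZ_MAX_ORDER; o++) {
		if (pages & (1ul << o)) {
			counts[o] = 1;
			if (o > highest)
				highest = o;
		}
	}
	return highest;
}
```

For a 13-page request this gives one block each at orders 3, 2 and 0, 
so a merge pass could aim for all three at once instead of being 
re-triggered once per block.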

But I guess these are more for the stalling side, and won't help much 
with the contig angle?

For the extent idea, is there any merit in maybe doing this for all 
contig blobs, and not just cleared stuff? Or does the workload you are 
seeing only benefit users that want cleared stuff? I am wondering if 
this would benefit all users that want contig. For example, we could 
hypothetically keep clear and dirty separate, like we do now, but with 
an improved force_merge, and then have extent tracking for all contig 
blobs and replace the try_harder stuff. When you do a contig alloc, the 
individual clear/dirty state is still all there within the range, so 
you can skip re-clearing in some cases. I guess the downside is an 
overall more fuzzy contig + clear/free path, but I guess you would never 
get allocation failures when there is sufficient contig space?

> 
> Solution
> 
> Replace the dual-tree design with:
> - A single free_tree[order] rbtree for dirty and mixed free blocks
>    (fully cleared free blocks float outside this tree)
> - A lightweight out-of-band clear tracker (gpu_clear_tracker)
> 
> Fully cleared free blocks are tracked outside the buddy trees using an
> augmented interval rbtree, enabling O(log E) lookup of the largest
> cleared extents.
> 
> Buddy coalescing is now unconditional in __gpu_buddy_free(), regardless
> of clear/dirty state. This removes the merge barrier and eliminates the
> need for __force_merge().
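
For anyone following along, the largest-extent lookup via an augmented 
tree can be sketched as below. This is not the gpu_clear_tracker code 
from the patch: the ext_* names are made up, and a plain unbalanced 
binary search tree stands in for the rbtree, so the O(log E) bound only 
holds here if the tree happens to be balanced. Merging of adjacent 
extents is also left out. Each node caches the largest extent length in 
its subtree, and the lookup follows that cached maximum downwards.

```c
#include <assert.h>
#include <stddef.h>

struct ext_node {
	unsigned long start, len;	/* cleared extent [start, start + len) */
	unsigned long subtree_max;	/* largest len in this subtree */
	struct ext_node *left, *right;
};

static unsigned long ext_max(const struct ext_node *n)
{
	return n ? n->subtree_max : 0;
}

/* Recompute the cached maximum from the node and its children. */
static void ext_update(struct ext_node *n)
{
	unsigned long m = n->len;

	if (ext_max(n->left) > m)
		m = ext_max(n->left);
	if (ext_max(n->right) > m)
		m = ext_max(n->right);
	n->subtree_max = m;
}

/* Insert @n keyed by start offset and return the new subtree root. */
static struct ext_node *ext_insert(struct ext_node *root, struct ext_node *n)
{
	assert(n && n->len > 0);

	if (!root) {
		n->left = n->right = NULL;
		n->subtree_max = n->len;
		return n;
	}
	if (n->start < root->start)
		root->left = ext_insert(root->left, n);
	else
		root->right = ext_insert(root->right, n);
	ext_update(root);
	return root;
}

/* Walk towards the subtree holding the maximum; NULL for an empty tree. */
static const struct ext_node *ext_largest(const struct ext_node *root)
{
	while (root) {
		if (root->len == root->subtree_max)
			return root;
		root = ext_max(root->left) == root->subtree_max ?
			root->left : root->right;
	}
	return NULL;
}
```

The lookup touches one node per tree level, which is where the 
logarithmic cost comes from once the tree is kept balanced.
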
> 
> Benefits
> 
> - Correct high-order allocations after mixed clear/dirty workloads
> - Elimination of O(N x max_order) merge cost from the allocation path
> - O(log E) cleared-extent lookup replacing O(N) scans
> - Predictable allocation latency under fragmentation
> - Reduced complexity with a single tree per order

Since there is no separate tracking for dirty stuff, is the non-cleared 
alloc path a bit more "fuzzy" now, with it potentially stealing cleared 
memory, or is it the same behaviour still?

For drivers that don't use clear tracking, is there some benefit? Are 
there any downsides there? I assume the clear tracker is always empty 
in that case.