From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4DAE6CCA476 for ; Fri, 10 Oct 2025 14:18:44 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A9EFF10EC17; Fri, 10 Oct 2025 14:18:43 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="Itz8PpLR"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id D556210EC17; Fri, 10 Oct 2025 14:18:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1760105923; x=1791641923; h=message-id:subject:from:to:cc:date:in-reply-to: references:content-transfer-encoding:mime-version; bh=NOOdadZfM+hXXJQUzprjRw3XtKENX6wp7sI0DYgOQQg=; b=Itz8PpLRJHrVNs0Bc0MjkVnjbGBBNp6IThPCNJUnyatVz1EVEWDig9D/ iUIVJa+fv+aoHn8PAtxrESETVPOpP/tc3VtvNTWw+2VWH1Oy6QKKnNQM7 nIHL0CFxnG2bQqppL7hg5jHb/cQyfMOVNjY4iZyHslcUT5PJFh3ksZjFf ftnh+3ui2BgIYJrgCQIaO2CP32k6BYVkc3UkA3BJb9Cf6Bgg/B17bbi9s RA5DNdOqvx5Q0HCOul26ekXeUSE8RQUV/FhtgumWNkH4zSw5qOaE5+D34 4xFsYlD2iT31A3qG4vkqf/r/skr4H53jJ67zP8Q2VibaUL55R71uthIJl g==; X-CSE-ConnectionGUID: 4QHn5iWIQAS4w2lp9dE0Mg== X-CSE-MsgGUID: TmniCr8XShKFlkDto65h/w== X-IronPort-AV: E=McAfee;i="6800,10657,11578"; a="62367908" X-IronPort-AV: E=Sophos;i="6.19,219,1754982000"; d="scan'208";a="62367908" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2025 07:18:42 -0700 X-CSE-ConnectionGUID: Kcn+481wR4q+VcsnKHOyxg== X-CSE-MsgGUID: vqT/724DRmma+GjyYHsdkg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.19,219,1754982000"; d="scan'208";a="180940066" Received: from dalessan-mobl3.ger.corp.intel.com (HELO [10.245.245.154]) ([10.245.245.154]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2025 07:18:38 -0700 Message-ID: <96d967fa9f2b193306d92ae38f83397cd6f18a4f.camel@linux.intel.com> Subject: Re: [PATCH v3 0/5] Improving the worst case TTM large allocation latency From: Thomas =?ISO-8859-1?Q?Hellstr=F6m?= To: Tvrtko Ursulin , Christian =?ISO-8859-1?Q?K=F6nig?= , amd-gfx@lists.freedesktop.org, Lucas De Marchi , dri-devel@lists.freedesktop.org, Rodrigo Vivi Cc: kernel-dev@igalia.com, Alex Deucher , Danilo Krummrich , Dave Airlie , Gerd Hoffmann , Joonas Lahtinen , Lyude Paul , Maarten Lankhorst , Maxime Ripard , Sui Jingfeng , Thadeu Lima de Souza Cascardo , Thomas Zimmermann , Zack Rusin Date: Fri, 10 Oct 2025 16:18:36 +0200 In-Reply-To: References: <20251008115314.55438-1-tvrtko.ursulin@igalia.com> <6bba6d25-91f3-49a6-81fc-7a03d891cd1d@amd.com> <22228578-a03c-4fc1-85b2-d281525a2b6f@igalia.com> <9bb3c06e-25c1-43d8-a4e8-e529c53ff77d@amd.com> <45973012f925dbbfdf0636c10f9d051c34f97e2e.camel@linux.intel.com> Organization: Intel Sweden AB, Registration Number: 556189-6027 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.54.3 (3.54.3-2.fc41) MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Fri, 2025-10-10 at 16:11 +0200, Thomas Hellstr=C3=B6m wrote: > On Thu, 2025-10-09 at 09:53 +0100, Tvrtko Ursulin wrote: > >=20 > > On 08/10/2025 15:39, Thomas Hellstr=C3=B6m wrote: > > > On Wed, 2025-10-08 at 16:02 +0200, Christian K=C3=B6nig wrote: > > > > On 08.10.25 15:50, Tvrtko Ursulin wrote: > > > > >=20 > > > > > On 08/10/2025 13:35, Christian K=C3=B6nig wrote: > > > > > > On 08.10.25 13:53, Tvrtko Ursulin wrote: > > > > > > > Disclaimer: > > > > > > > Please note that as this series includes a patch which > > > > > > > touches > > > > > > > a good number of > > > > > > > drivers I will only copy everyone in the cover letter and > > > > > > > the > > > > > > > respective patch. > > > > > > > Assumption is people are subscribed to dri-devel so can > > > > > > > look at > > > > > > > the whole series > > > > > > > there. I know someone is bound to complain for both the > > > > > > > case > > > > > > > when everyone is > > > > > > > copied on everything for getting too much email, and also > > > > > > > for > > > > > > > this other case. > > > > > > > So please be flexible. > > > > > > >=20 > > > > > > > Description: > > > > > > >=20 > > > > > > > All drivers which use the TTM pool allocator end up > > > > > > > requesting > > > > > > > large order > > > > > > > allocations when allocating large buffers. Those can be > > > > > > > slow > > > > > > > due memory pressure > > > > > > > and so add latency to buffer creation. But there is often > > > > > > > also > > > > > > > a size limit > > > > > > > above which contiguous blocks do not bring any > > > > > > > performance > > > > > > > benefits. This series > > > > > > > allows drivers to say when it is okay for the TTM to try > > > > > > > a > > > > > > > bit > > > > > > > less hard. > > > > > > >=20 > > > > > > > We do this by allowing drivers to specify this cut off > > > > > > > point > > > > > > > when creating the > > > > > > > TTM device and pools. Allocations above this size will > > > > > > > skip > > > > > > > direct reclaim so > > > > > > > under memory pressure worst case latency will improve. > > > > > > > Background reclaim is > > > > > > > still kicked off and both before and after the memory > > > > > > > pressure > > > > > > > all the TTM pool > > > > > > > buckets remain to be used as they are today. > > > > > > >=20 > > > > > > > This is especially interesting if someone has configured > > > > > > > MAX_PAGE_ORDER to > > > > > > > higher than the default. And even with the default, with > > > > > > > amdgpu > > > > > > > for example, > > > > > > > the last patch in the series makes use of the new feature > > > > > > > by > > > > > > > telling TTM that > > > > > > > above 2MiB we do not expect performance benefits. Which > > > > > > > makes > > > > > > > TTM not try direct > > > > > > > reclaim for the top bucket (4MiB). > > > > > > >=20 > > > > > > > End result is TTM drivers become a tiny bit nicer mm > > > > > > > citizens > > > > > > > and users benefit > > > > > > > from better worst case buffer creation latencies. As a > > > > > > > side > > > > > > > benefit we get rid > > > > > > > of two instances of those often very unreadable mutliple > > > > > > > nameless booleans > > > > > > > function signatures. > > > > > > >=20 > > > > > > > If this sounds interesting and gets merge the invidual > > > > > > > drivers > > > > > > > can follow up > > > > > > > with patches configuring their thresholds. > > > > > > >=20 > > > > > > > v2: > > > > > > > =C2=A0=C2=A0 * Christian suggested to pass in the new data by > > > > > > > changing the > > > > > > > function signatures. > > > > > > >=20 > > > > > > > v3: > > > > > > > =C2=A0=C2=A0 * Moved ttm pool helpers into new ttm_pool_inter= nal.h. > > > > > > > (Christian) > > > > > >=20 > > > > > > Patch #3 is Acked-by: Christian K=C3=B6nig > > > > > > . > > > > > >=20 > > > > > > The rest is Reviewed-by: Christian K=C3=B6nig > > > > > > > > > > >=20 > > > > > Thank you! > > > > >=20 > > > > > So I think now I need acks to merge via drm-misc for all the > > > > > drivers which have their own trees. Which seems to be just > > > > > xe. > > > >=20 > > > > I think you should ping the XE guys for their opinion, but > > > > since > > > > there shouldn't be any functional change for them you can > > > > probably go > > > > ahead and merge the patches to drm-misc-next when there is no > > > > reply > > > > in time. > > >=20 > > > I will try to do a review tonight. One thing that comes up > > > though, > > > is > > > the change to ttm_device_init() where you add pool_flags. I had > > > another > > > patch series a number of months ago that added a struct with > > > flags > > > there instead to select the return value given when OOM. Now that > > > we're > > > adding an argument, should we try to use a struct instead so that > > > we > > > can use it for more that pool behavior? > > >=20 > > >=20 > > > I'll be able to find a pointer to that series later today. > >=20 > > Found it:=20 > > https://lore.kernel.org/dri-devel/20241002122422.287276-1-thomas.hellst= rom@linux.intel.com/ > >=20 > > Glad to see in that thread it isn't just me permanently slowed down > > by=20 > > "false, false" and similar. :) > >=20 > > I considered using a struct too and I guess there wasn't too much > > of > > a=20 > > sway that I went with flags. I thought not to overcomplicate with > > the > > on=20 > > stack struct which is mostly not needed for something so low level, > > and=20 > > to stick with the old school C visual patterns. > >=20 > > Since you only needed a single boolean in your series I suppose you > > could just follow up on my series if you find it acceptable. Or I > > can > > go=20 > > with yours, no problem either. >=20 > It seems yours has the most momentum ATM. I can follow up on yours. > It > would be great if we could perhaps change the naming of "pool_flags" > to > something more generic. Oh, BTW we started using that 'const struct' flags argument style in xe and in the TTM shrinker helper (ttm_bo_shrink()) as well, so there is a precedent. I'll let you decide :) Thanks, Thomas >=20 > Thanks, > Thomas >=20 >=20 > >=20 > > Regards, > >=20 > > Tvrtko > >=20 >=20