From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <19983eab-0fcd-4ec3-955d-a18233bc4b59@kernel.org>
Date: Wed, 3 Jun 2026 10:20:53 +0200
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX
Content-Language: en-US
To: "JP Kobryn (Meta)" , akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com
References: <20260519200851.141955-1-jp.kobryn@linux.dev>
From: "Vlastimil Babka (SUSE)"
In-Reply-To: <20260519200851.141955-1-jp.kobryn@linux.dev>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 5/19/26 22:08, JP Kobryn (Meta) wrote:
> compact_gap() returns 2 << order, which is used as watermark headroom in
> __compaction_suitable() and as a reclaim target in kswapd.
> The computed value scales exponentially by order. For order-9 THP
> allocations this evaluates to 1024 pages, but the compaction free scanner's
> working set is bounded by COMPACT_CLUSTER_MAX (32 pages). The scanner stops
> isolating free pages once it matches the migration batch. The current gap
> over-reserves by 32x.
>
> On fragmented production hosts, kswapd will try and reclaim up to the gap,
> but it only reaches that threshold 18% of the time, causing reclaim to
> continue a majority of the time. The over-sized gap also causes 46% of
> order-9 compaction suitability checks to fail unnecessarily - the zone has
> sufficient free pages for the scanner to operate, but not enough to clear
> the inflated threshold.
>
> Cap compact_gap() at COMPACT_CLUSTER_MAX to align the watermark headroom
> with the scanner's actual capacity. Orders 0-4 are unaffected since their
> gap is <= 32.
>
> A/B test on ~100 instagram production hosts (64GB, 60s measurement):
>
>   Unpatched (43 hosts)
>     pgscan_kswapd (mean/host):           ~1.6M
>     reclaim efficiency (steal/scan):     83.8%
>     compaction success (success/stall):  2.1%
>     THP success (alloc/alloc+fallback):  4.9%
>     forced lru_add_drain (mean/host):    ~107K
>
>   Patched (59 hosts)
>     pgscan_kswapd (mean/host):           ~449K
>     reclaim efficiency (steal/scan):     91.0%
>     compaction success (success/stall):  28.3%
>     THP success (alloc/alloc+fallback):  17.2%
>     forced lru_add_drain (mean/host):    ~64K
>
> Signed-off-by: JP Kobryn (Meta)

Reviewed-by: Vlastimil Babka (SUSE)

> ---
>  include/linux/compaction.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 173d9c07a8952..09aea63b8a89d 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -2,6 +2,8 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>
> +#include
> +
>  /*
>   * Determines how hard direct compaction should try to succeed.
>   * Lower value means higher priority, analogically to reclaim priority.
> @@ -73,11 +75,9 @@ static inline unsigned long compact_gap(unsigned int order)
>  	 * effectively limited by COMPACT_CLUSTER_MAX, as that's the maximum
>  	 * that the migrate scanner can have isolated on migrate list, and free
>  	 * scanner is only invoked when the number of isolated free pages is
> -	 * lower than that. But it's not worth to complicate the formula here
> -	 * as a bigger gap for higher orders than strictly necessary can also
> -	 * improve chances of compaction success.
> +	 * lower than that.
>  	 */
> -	return 2UL << order;
> +	return min(2UL << order, COMPACT_CLUSTER_MAX);
>  }
>
>  static inline int current_is_kcompactd(void)
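
For illustration only, the arithmetic behind the hunk above can be sketched in
user space. This is not the kernel code: gap_uncapped(), gap_capped(),
print_gap_table() and SKETCH_CLUSTER_MAX are hypothetical names, and the cap
of 32 pages is assumed from the value quoted in the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumption: 32 pages, the COMPACT_CLUSTER_MAX value quoted in the commit message. */
#define SKETCH_CLUSTER_MAX 32UL

/* Gap as computed before the patch: 2 << order. */
static unsigned long gap_uncapped(unsigned int order)
{
	/* Keep the shift well-defined in this user-space sketch. */
	assert(order < sizeof(unsigned long) * CHAR_BIT - 1);
	return 2UL << order;
}

/* Gap as computed after the patch: the same value, capped at the cluster size. */
static unsigned long gap_capped(unsigned int order)
{
	unsigned long gap = gap_uncapped(order);

	return gap < SKETCH_CLUSTER_MAX ? gap : SKETCH_CLUSTER_MAX;
}

/* Print both values for orders 0..max_order, one order per row. */
static void print_gap_table(unsigned int max_order)
{
	for (unsigned int order = 0; order <= max_order; order++)
		printf("order %2u: uncapped %5lu  capped %3lu\n",
		       order, gap_uncapped(order), gap_capped(order));
}
```

Calling print_gap_table(9) from a main() should show identical columns for
orders 0-4 and, for order 9, the gap dropping from 1024 to 32 pages, which is
the 32x over-reservation described in the commit message.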