Message-ID: <73cf2b7c-2423-4b96-b98b-a1946e9f952a@linux.dev>
Date: Tue, 16 Jun 2026 13:00:13 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order
From: JP Kobryn
To: Frank van der Linden
Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com,
 mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
 linux-mm@kvack.org, usama.arif@linux.dev, kirill@shutemov.name,
 willy@infradead.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
References: <20260519012532.272770-1-jp.kobryn@linux.dev>

On 5/28/26 10:09 AM, Frank van der Linden wrote:
> On Mon, May 18, 2026 at 6:25 PM JP Kobryn (Meta) wrote:
>>
>> We're seeing a pattern in production where 2MB THP order-9 allocations are
>> failing due to fragmentation and triggering reclaim on systems with plenty
>> of free memory. Over time, the success rate of these THP allocations does
>> not increase at all.
>>
>> Inspecting zone->vm_stat[NR_FREE_PAGES] via kprobe on compaction_suitable()
>> indicated the given zone had sufficient free pages for order-9 allocations,
>> yet they were going unused. Drilling down into the zone and inspecting
>> /proc/pagetypeinfo revealed why. Order-9 blocks were accumulating in the
>> zone's HighAtomic bucket (while zero were present in Movable).
>> THP is unable to draw blocks from HighAtomic since that bucket is not in the
>> fallback list.
>>
>> The heuristic for reserving pageblocks in HighAtomic is that any atomic
>> allocation greater than order-0 will result in the full pageblock being
>> captured. This means that an order-1 atomic allocation will over-reserve by
>> 256x, a full 512-page pageblock.
>>
>> Gate the reservation on order. Skip for allocations at or below
>> PAGE_ALLOC_COSTLY_ORDER. This prevents smaller atomic allocations from
>> reserving entire pageblocks, and significantly helps when THP is in use on
>> a fragmented but otherwise healthy system.
>>
>> Testing was performed using an A/B Instagram workload receiving prod
>> traffic. Each side had ~60 hosts with 64G memory. The patch resulted in
>> several gains:
>>
>> Unpatched
>>   HighAtomic pageblocks per host: 309-312 (1% of zone or 620MB),
>>     ...all order-9 blocks in HighAtomic
>>   THP success rate: 1-6%
>>   Compaction success rate: 0-2%
>>   pgscan_kswapd (total across ~60 hosts, per minute): ~70.2M
>>   Atomic order-4+ allocations: 0
>>
>> Patched
>>   HighAtomic pageblocks per host: 1
>>   THP success rate: 44-78%
>>   Compaction success rate: 24-47%
>>   pgscan_kswapd (total across ~60 hosts, per minute): ~29.9M
>>   Atomic order-4+ allocations: 0
>>
>> Note that for this workload all atomic allocations were order 0-3
>> originating from the network stack, btrfs, and scheduler.
>>
>> Signed-off-by: JP Kobryn (Meta)
>
> Was this issue reproduced with a tree that does not have your patch,
> but includes b480cbb07102 ("mm/page_alloc: don't increase highatomic
> reserve after pcp alloc") ? The symptoms here seem the same.
>

No, it was not, but thanks for sharing this. I could see this patch
helping a situation like this.

See this patch [0] for an update on the buddy side.

[0] https://lore.kernel.org/all/20260616191420.52556-1-jp.kobryn@linux.dev/