From: Salvatore Dipietro <dipiets@amazon.it>
To: <willy@infradead.org>
Cc: <abuehaze@amazon.com>, <akpm@linux-foundation.org>,
<alisaidi@amazon.com>, <blakgeof@amazon.com>,
<brauner@kernel.org>, <dipietro.salvatore@gmail.com>,
<dipiets@amazon.it>, <djwong@kernel.org>,
<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <linux-xfs@vger.kernel.org>,
<ritesh.list@gmail.com>, <stable@vger.kernel.org>,
<vbabka@suse.com>
Subject: Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
Date: Wed, 6 May 2026 12:33:18 +0000 [thread overview]
Message-ID: <20260506123326.17293-1-dipiets@amazon.it> (raw)
In-Reply-To: <afc3xFgKogxF5Lbq@casper.infradead.org>
On 5/03/26 05:52, Ritesh Harjani wrote:
> Also as per the documentation [1], huge_pages=try option is the default
> setting. So I am assuming in production we at least won't suffer from
> this memory fragmentation, correct?
Yes, huge_pages=try is the default option, but unless enough huge pages to back
the entire shared_buffers size are pre-allocated via "vm.nr_hugepages" (which is
not done automatically), huge pages will not be used and the system effectively
falls back to the huge_pages=off behaviour. Even with a partial pre-allocation,
PostgreSQL will not be able to use huge pages.
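As a hypothetical illustration of the sizing involved (the values below are
assumed, not taken from our setup): with shared_buffers = 8 GiB and the common
2 MiB huge page size (see Hugepagesize in /proc/meminfo), the required
vm.nr_hugepages can be computed as:

```shell
# Assumed sizing: shared_buffers = 8 GiB (8192 MiB), 2 MiB huge pages.
# PostgreSQL needs the whole shared memory segment backed by huge pages,
# so round the page count up rather than down.
shared_buffers_mb=8192
hugepage_size_mb=2
nr_hugepages=$(( (shared_buffers_mb + hugepage_size_mb - 1) / hugepage_size_mb ))
echo "$nr_hugepages"
# Apply with e.g.: sysctl -w vm.nr_hugepages=$nr_hugepages
```

Note the pool must be large enough for the whole segment; as mentioned above,
a partial pool does not help.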
On 5/03/26 11:55, Matthew Wilcox wrote:
> or we need more understandable GFP flags. Or the page allocator could
> use the __GFP_NORETRY flag to say "oh well, this allocation has a fallback,
> I'll kick kcompactd to try to compact some more memory, but I'll fail
> the allocation".
We also tested kicking off kcompactd in the background when __GFP_NORETRY is
passed, returning "nopage" to avoid blocking the folio allocation request.
Here is the patch, tested under the same conditions as the others, on top of the
PREEMPT_NONE patch [1]:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 65e205111553..d4f322910992 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4818,6 +4818,26 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (current->flags & PF_MEMALLOC)
goto nopage;
+ /*
+ * Costly allocations with __GFP_NORETRY are opportunistic: don't
+ * stall on direct compaction or reclaim; instead, kick
+ * kcompactd on the preferred node so large pages may become
+ * available for future allocations and let the caller fall back now.
+ *
+ * Direct compaction is way too costly for hot allocation paths on
+ * large systems: each attempt calls drain_all_pages() which IPIs
+ * every CPU. Only wake kcompactd on the local node to avoid
+ * cross-NUMA interference with unrelated workloads.
+ */
+ if (costly_order && (gfp_mask & __GFP_NORETRY)) {
+ struct zone *preferred_zone = ac->preferred_zoneref->zone;
+
+ if (preferred_zone)
+ wakeup_kcompactd(preferred_zone->zone_pgdat, order,
+ ac->highest_zoneidx);
+ goto nopage;
+ }
+
/* Try direct reclaim and then allocating */
if (!compact_first) {
page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags,
Here are the results we collected, with the new kcompactd-background variant
added alongside the earlier numbers:
| Patch | Run 1 | Run 2 | Run 3 | Average | % vs Baseline |
|----------------------|-----------:|-----------:|-----------:|------------:|:-------------:|
| Baseline | 107,064.61 | 97,043.86 | 101,830.78 | 101,979.75 | — |
| Proposed patch | 146,012.23 | 136,392.36 | 141,178.00 | 141,194.20 | +38.45% |
| Ritesh's suggestion | 147,481.50 | 133,069.03 | 137,051.30 | 139,200.61 | +36.50% |
| Matthew's suggestion | 145,653.91 | 144,169.24 | 141,768.31 | 143,863.82 | +41.07% |
| kcompactd background | 146,760.75 | 128,094.92 | 127,979.74 | 134,278.47 | +31.67% |
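For reference, the averages and the "% vs Baseline" column can be re-derived
from the raw per-run throughput numbers in the table; a quick awk sketch for
the baseline and kcompactd-background rows:

```shell
# Recompute the table's averages and percentage delta for the baseline
# and kcompactd-background rows from the three per-run values of each.
awk 'BEGIN {
    baseline  = (107064.61 + 97043.86 + 101830.78) / 3
    kcompactd = (146760.75 + 128094.92 + 127979.74) / 3
    printf "%.2f %.2f +%.2f%%\n", baseline, kcompactd,
           (kcompactd / baseline - 1) * 100
}'
```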
[1] https://lore.kernel.org/all/20260403191942.21410-1-dipiets@amazon.it/T/#m8baeeaf48aa7ae5342c8c2db8f4e1c27e03c1368
Thread overview: 17+ messages
2026-04-03 19:35 [PATCH 0/1] iomap: avoid compaction for costly folio order allocation Salvatore Dipietro
2026-04-03 19:35 ` [PATCH 1/1] " Salvatore Dipietro
2026-04-04 1:13 ` Ritesh Harjani
2026-04-04 4:15 ` Matthew Wilcox
2026-04-04 16:47 ` Ritesh Harjani
2026-04-04 20:46 ` Matthew Wilcox
2026-04-16 15:14 ` Ritesh Harjani
2026-04-20 16:33 ` Salvatore Dipietro
2026-04-20 18:44 ` Matthew Wilcox
2026-04-21 1:16 ` Ritesh Harjani
2026-04-28 15:02 ` Salvatore Dipietro
2026-05-03 5:52 ` Ritesh Harjani
2026-05-03 11:55 ` Matthew Wilcox
2026-05-06 12:33 ` Salvatore Dipietro [this message]
2026-04-05 22:43 ` Dave Chinner
2026-04-07 5:40 ` Christoph Hellwig
2026-04-21 9:02 ` Vlastimil Babka