* [PATCH v2] mm/filemap: avoid costly reclaim for high-order folio allocations
From: Salvatore Dipietro @ 2026-04-20 16:14 UTC
To: linux-kernel
Cc: ritesh.list, abuehaze, alisaidi, blakgeof, brauner,
dipietro.salvatore, dipiets, djwong, linux-fsdevel, linux-mm,
linux-xfs, stable, willy, Jan Kara, Andrew Morton

Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
introduced high-order folio allocations in the buffered write path.
When memory is fragmented, each failed allocation above
PAGE_ALLOC_COSTLY_ORDER triggers compaction and drain_all_pages() via
__alloc_pages_slowpath(), causing a 0.75x throughput drop on pgbench
(simple-update) with 1024 clients on a 96-vCPU arm64 system.

In __filemap_get_folio(), for orders above min_order, split the
allocation behavior by cost:

- For orders above PAGE_ALLOC_COSTLY_ORDER: strip
  __GFP_DIRECT_RECLAIM, making them purely opportunistic. The
  allocator tries the freelists only and returns NULL immediately if
  pages are not available.

- For non-costly orders (between min_order and
  PAGE_ALLOC_COSTLY_ORDER): use __GFP_NORETRY to allow lightweight
  direct reclaim without expensive compaction retries.
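The cost split in the two bullets above can be illustrated with a user-space toy model of the order-fallback loop. Apart from PAGE_ALLOC_COSTLY_ORDER (3 in include/linux/mmzone.h), every name here (`struct toy_zone`, `toy_alloc()`, `toy_get_folio()`) is hypothetical. The allocator is reduced to two thresholds: the largest order the freelists can serve directly, and the largest order one pass of direct reclaim/compaction can produce. __GFP_NORETRY is not modelled separately from full reclaim. The model assumes the loop steps down one order per failed attempt, as the `continue` in the hunk suggests; it is a sketch, not the kernel's allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Real kernel value (include/linux/mmzone.h). */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical toy allocator: the freelists can serve any order up to
 * free_order, and a pass of direct reclaim/compaction can produce any
 * order up to reclaim_order. reclaim_attempts counts how often the
 * expensive slow path was entered.
 */
struct toy_zone {
	int free_order;
	int reclaim_order;
	int reclaim_attempts;
};

/* may_reclaim models whether __GFP_DIRECT_RECLAIM is still in the mask. */
static bool toy_alloc(struct toy_zone *z, int order, bool may_reclaim)
{
	if (order <= z->free_order)
		return true;		/* freelist hit: cheap */
	if (!may_reclaim)
		return false;		/* opportunistic: fail immediately */
	z->reclaim_attempts++;		/* expensive slow path entered */
	return order <= z->reclaim_order;
}

/*
 * Try @order first and step down one order per failure until @min_order.
 * @v2 selects the patched rule (no direct reclaim above
 * PAGE_ALLOC_COSTLY_ORDER); otherwise every order may enter reclaim, as
 * with the pre-patch __GFP_NORETRY-only mask. Returns the order obtained,
 * or -1 if even @min_order failed.
 */
static int toy_get_folio(struct toy_zone *z, int order, int min_order, bool v2)
{
	assert(order >= min_order);
	do {
		bool may_reclaim = true;

		if (v2 && order > min_order && order > PAGE_ALLOC_COSTLY_ORDER)
			may_reclaim = false;
		if (toy_alloc(z, order, may_reclaim))
			return order;
	} while (order-- > min_order);
	return -1;
}
```

In this model, with the freelists able to serve order 2 and reclaim unable to do better, a request starting at order 5 ends at order 2 under both rules. The pre-patch rule enters reclaim three times (orders 5, 4 and 3), while the v2 rule enters it once (order 3). If reclaim could have produced an order-4 folio, the v2 rule settles for order 3 instead. That is the trade the patch makes for skipping reclaim at costly orders.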
With this patch, pgbench throughput recovers to 148k TPS (+67% vs
regressed baseline), stable across all iterations.

v2:
- strip __GFP_DIRECT_RECLAIM to avoid costly reclaim for high-order
  folio allocations
- Moved fix from iomap to mm/filemap layer

Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Salvatore Dipietro <dipiets@amazon.it>
---
mm/filemap.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 4e636647100c..f2343c26dd63 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2007,8 +2007,13 @@ struct folio *__filemap_get_folio_mpol(struct address_space *mapping,
 			gfp_t alloc_gfp = gfp;
 
 			err = -ENOMEM;
-			if (order > min_order)
-				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
+			if (order > min_order) {
+				alloc_gfp |= __GFP_NOWARN;
+				if (order > PAGE_ALLOC_COSTLY_ORDER)
+					alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
+				else
+					alloc_gfp |= __GFP_NORETRY;
+			}
 			folio = filemap_alloc_folio(alloc_gfp, order, policy);
 			if (!folio)
 				continue;

base-commit: c7275b05bc428c7373d97aa2da02d3a7fa6b9f66
--
2.47.3
AMAZON DEVELOPMENT CENTER ITALY SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2504859, Capitale Sociale: 10.000 EUR i.v., Cod. Fisc. e P.IVA 10100050961, Societa con Socio Unico
* Re: [PATCH v2] mm/filemap: avoid costly reclaim for high-order folio allocations
From: Andrew Morton @ 2026-04-20 16:51 UTC
To: Salvatore Dipietro
Cc: linux-kernel, ritesh.list, abuehaze, alisaidi, blakgeof, brauner,
dipietro.salvatore, djwong, linux-fsdevel, linux-mm, linux-xfs,
stable, willy, Jan Kara

On Mon, 20 Apr 2026 16:14:03 +0000 Salvatore Dipietro <dipiets@amazon.it> wrote:
> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> introduced high-order folio allocations in the buffered write path.
> When memory is fragmented, each failed allocation above
> PAGE_ALLOC_COSTLY_ORDER triggers compaction and drain_all_pages() via
> __alloc_pages_slowpath(), causing a 0.75x throughput drop on pgbench
> (simple-update) with 1024 clients on a 96-vCPU arm64 system.
>
> In __filemap_get_folio(), for orders above min_order, split the
> allocation behavior by cost:
>
> - For orders above PAGE_ALLOC_COSTLY_ORDER: strip
> __GFP_DIRECT_RECLAIM, making them purely opportunistic. The
> allocator tries the freelists only and returns NULL immediately if
> pages are not available.
>
> - For non-costly orders (between min_order and
> PAGE_ALLOC_COSTLY_ORDER): use __GFP_NORETRY to allow lightweight
> direct reclaim without expensive compaction retries.
>
> With this patch, pgbench throughput recovers to 148k TPS (+67% vs
> regressed baseline), stable across all iterations.

"Good money after bad"? Prove me wrong!

Instead of performing weird fragile hard-to-maintain party tricks with
the page allocator to work around the damage, plan B is to simply
revert 5d8edfb900d5.

5d8edfb900d5 came with no performance testing results. Does anyone
have any evidence that it improved anything? By how much?

> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2007,8 +2007,13 @@ struct folio *__filemap_get_folio_mpol(struct address_space *mapping,
>  			gfp_t alloc_gfp = gfp;
>
>  			err = -ENOMEM;
> -			if (order > min_order)
> -				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> +			if (order > min_order) {
> +				alloc_gfp |= __GFP_NOWARN;
> +				if (order > PAGE_ALLOC_COSTLY_ORDER)
> +					alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> +				else
> +					alloc_gfp |= __GFP_NORETRY;
> +			}
>  			folio = filemap_alloc_folio(alloc_gfp, order, policy);

I don't think it's reasonable to expect a reader to understand why this
code is as it is. Hence each clause here should have a comment
explaining why we're taking that step, please.

Look. I'm being grumpy. We know that patches which purportedly
improve performance must come with quality performance testing results.
How long have we been at this?
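One way to meet Andrew's request for a comment on each clause is sketched below. It restates the v2 flag selection as a hypothetical helper, `toy_large_folio_gfp()`, over invented `TOY_GFP_*` bits; the real `___GFP_*` values in include/linux/gfp_types.h differ. Only PAGE_ALLOC_COSTLY_ORDER (3) is the kernel's value. The comments paraphrase the rationale in the commit message and are not wording anyone on the thread agreed to.

```c
#include <assert.h>

/* Hypothetical bit values for illustration only. */
typedef unsigned int toy_gfp_t;
#define TOY_GFP_DIRECT_RECLAIM	0x1u
#define TOY_GFP_NORETRY		0x2u
#define TOY_GFP_NOWARN		0x4u

/* Real kernel value (include/linux/mmzone.h). */
#define PAGE_ALLOC_COSTLY_ORDER	3

/* Hypothetical helper: the v2 flag selection with one comment per clause. */
static toy_gfp_t toy_large_folio_gfp(toy_gfp_t gfp, unsigned int order,
				     unsigned int min_order)
{
	toy_gfp_t alloc_gfp = gfp;

	assert(order >= min_order);

	/*
	 * min_order is the smallest folio the mapping accepts, so there is
	 * nothing to fall back to: keep the caller's gfp unchanged.
	 */
	if (order <= min_order)
		return alloc_gfp;

	/*
	 * Larger orders are an optimisation with a smaller-order fallback,
	 * so failing here is expected and not worth a warning.
	 */
	alloc_gfp |= TOY_GFP_NOWARN;

	if (order > PAGE_ALLOC_COSTLY_ORDER) {
		/*
		 * Costly orders: the commit message attributes the pgbench
		 * regression to compaction and drain_all_pages() on failed
		 * attempts, so only take what the freelists already hold.
		 */
		alloc_gfp &= ~TOY_GFP_DIRECT_RECLAIM;
	} else {
		/*
		 * Non-costly orders: allow one lightweight pass of direct
		 * reclaim, but no retry loop.
		 */
		alloc_gfp |= TOY_GFP_NORETRY;
	}

	return alloc_gfp;
}
```

With min_order 0 and a mask containing only direct reclaim, order 0 returns the caller's mask unchanged. Orders 1 to 3 keep reclaim and add NOWARN and NORETRY. Orders 4 and above keep only NOWARN.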
* Re: [PATCH v2] mm/filemap: avoid costly reclaim for high-order folio allocations
From: Matthew Wilcox @ 2026-04-20 18:41 UTC
To: Andrew Morton
Cc: Salvatore Dipietro, linux-kernel, ritesh.list, abuehaze, alisaidi,
blakgeof, brauner, dipietro.salvatore, djwong, linux-fsdevel,
linux-mm, linux-xfs, stable, Jan Kara

On Mon, Apr 20, 2026 at 09:51:06AM -0700, Andrew Morton wrote:
> On Mon, 20 Apr 2026 16:14:03 +0000 Salvatore Dipietro <dipiets@amazon.it> wrote:
> > Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> > introduced high-order folio allocations in the buffered write path.

No it didn't. 5d8edfb900d5 only allows the use of larger folios if they
already existed in the page cache. d6bb59a9444d allows the creation of
large folios.

> > When memory is fragmented, each failed allocation above
> > PAGE_ALLOC_COSTLY_ORDER triggers compaction and drain_all_pages() via
> > __alloc_pages_slowpath(), causing a 0.75x throughput drop on pgbench
> > (simple-update) with 1024 clients on a 96-vCPU arm64 system.

Why are you pretending this is new instead of already being the source
of much recent discussion?

https://lore.kernel.org/all/20260403193535.9970-1-dipiets@amazon.it/

> "Good money after bad"? Prove me wrong!
>
> Instead of performing weird fragile hard-to-maintain party tricks with
> the page allocator to work around the damage, plan B is to simply
> revert 5d8edfb900d5.

lol. best of luck with that. you'd break a lot of other things if you
did.

> 5d8edfb900d5 came with no performance testing results. Does anyone
> have any evidence that it improved anything? By how much?

Christoph reported it doubled write performance with NFS once NFS was
converted to use it.
* Re: [PATCH v2] mm/filemap: avoid costly reclaim for high-order folio allocations
From: Matthew Wilcox @ 2026-04-20 19:12 UTC
To: Salvatore Dipietro
Cc: linux-kernel, ritesh.list, abuehaze, alisaidi, blakgeof, brauner,
dipietro.salvatore, djwong, linux-fsdevel, linux-mm, linux-xfs,
stable, Jan Kara, Andrew Morton

On Mon, Apr 20, 2026 at 04:14:03PM +0000, Salvatore Dipietro wrote:
> v2:
> - strip __GFP_DIRECT_RECLAIM to avoid costly reclaim for high-order
> folio allocations
> - Moved fix from iomap to mm/filemap layer

I don't think filemap is the right place for this. And neither does
Dave Chinner, nor Christoph Hellwig:
https://lore.kernel.org/all/adSY3GnLHyQatigQ@infradead.org/

I asked you for performance results with different patches, and you
didn't reply. Now you're asking for this patch to be merged instead.
THIS IS NOT HOW IT WORKS. You answer the damned questions being asked
of you by your fellow developers.

>  			err = -ENOMEM;
> -			if (order > min_order)
> -				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> +			if (order > min_order) {
> +				alloc_gfp |= __GFP_NOWARN;
> +				if (order > PAGE_ALLOC_COSTLY_ORDER)
> +					alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> +				else
> +					alloc_gfp |= __GFP_NORETRY;
> +			}
>  			folio = filemap_alloc_folio(alloc_gfp, order, policy);
>  			if (!folio)
>  				continue;