* Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
From: Ritesh Harjani @ 2026-04-04 1:13 UTC
To: Salvatore Dipietro, linux-kernel
Cc: dipiets, alisaidi, blakgeof, abuehaze, dipietro.salvatore, willy,
    stable, Christian Brauner, Darrick J. Wong, linux-xfs,
    linux-fsdevel, linux-mm

Let's cc: linux-mm too.

Salvatore Dipietro <dipiets@amazon.it> writes:

> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> introduced high-order folio allocations in the buffered write
> path. When memory is fragmented, each failed allocation triggers

Isn't it the right thing to do i.e. run compaction, when memory is
fragmented?

> compaction and drain_all_pages() via __alloc_pages_slowpath(),
> causing a 0.75x throughput drop on pgbench (simple-update) with
> 1024 clients on a 96-vCPU arm64 system.
>

I think removing the __GFP_DIRECT_RECLAIM flag unconditionally at the
caller may cause -ENOMEM. Note that it is the __filemap_get_folio()
which retries with smaller order allocations, so instead of changing the
callers, shouldn't this be fixed in __filemap_get_folio() instead?

Maybe in there too, we should keep the reclaim flag (if passed by
caller) at least for <= PAGE_ALLOC_COSTLY_ORDER + 1

Thoughts?

-ritesh
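The -ENOMEM concern above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `alloc_with_fallback()`, `enum strip_policy`, and the toy allocator `fragmented_alloc()` are hypothetical names, and the model assumes one reading of the concern, namely that a flag cleared by the caller is inherited by every fallback attempt, including the final one at min_order.

```c
#include <stdbool.h>

/* Hypothetical policies for where direct reclaim gets disabled. */
enum strip_policy {
    STRIP_AT_CALLER, /* caller clears the flag: no attempt may reclaim */
    STRIP_IN_LOOP,   /* loop clears it only for attempts above min_order */
};

/* Hypothetical allocator hook: true if an allocation of this order succeeds. */
typedef bool (*try_alloc_fn)(unsigned int order, bool may_reclaim);

/*
 * Model of an order-fallback loop: start at max_order and step down to
 * min_order. Returns the order that succeeded, or -1 if every attempt
 * failed (standing in for -ENOMEM).
 */
static int alloc_with_fallback(unsigned int max_order, unsigned int min_order,
                               enum strip_policy policy, try_alloc_fn try_alloc)
{
    unsigned int order = max_order;

    if (max_order < min_order)
        return -1;

    for (;;) {
        /* Only STRIP_IN_LOOP lets the last-resort attempt reclaim. */
        bool may_reclaim = (policy == STRIP_IN_LOOP) && (order == min_order);

        if (try_alloc(order, may_reclaim))
            return (int)order;
        if (order == min_order)
            return -1;
        order--;
    }
}

/*
 * Toy allocator for a fragmented system under memory pressure: no
 * high-order block is free, and an order-0 page is only available
 * if the attempt is allowed to reclaim.
 */
static bool fragmented_alloc(unsigned int order, bool may_reclaim)
{
    return order == 0 && may_reclaim;
}
```

Calling `alloc_with_fallback(4, 0, STRIP_AT_CALLER, fragmented_alloc)` fails outright in this model, while the same call with `STRIP_IN_LOOP` falls back to order 0 and succeeds. That is the argument for handling the flag where the retry loop lives rather than in each caller.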
* Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
From: Matthew Wilcox @ 2026-04-04 4:15 UTC
To: Salvatore Dipietro
Cc: linux-kernel, alisaidi, blakgeof, abuehaze, dipietro.salvatore,
    stable, Christian Brauner, Darrick J. Wong, linux-xfs,
    linux-fsdevel, linux-mm

On Fri, Apr 03, 2026 at 07:35:34PM +0000, Salvatore Dipietro wrote:
> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> introduced high-order folio allocations in the buffered write
> path. When memory is fragmented, each failed allocation triggers
> compaction and drain_all_pages() via __alloc_pages_slowpath(),
> causing a 0.75x throughput drop on pgbench (simple-update) with
> 1024 clients on a 96-vCPU arm64 system.
>
> Strip __GFP_DIRECT_RECLAIM from folio allocations in
> iomap_get_folio() when the order exceeds PAGE_ALLOC_COSTLY_ORDER,
> making them purely opportunistic.

If you look at __filemap_get_folio_mpol(), that's kind of being tried
already:

        if (order > min_order)
                alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;

 * %__GFP_NORETRY: The VM implementation will try only very lightweight
 * memory direct reclaim to get some memory under memory pressure (thus
 * it can sleep). It will avoid disruptive actions like OOM killer. The
 * caller must handle the failure which is quite likely to happen under
 * heavy memory pressure. The flag is suitable when failure can easily be
 * handled at small cost, such as reduced throughput.

which, from the description, seemed like the right approach. So either
the description or the implementation should be updated, I suppose?

Now, what happens if you change those two lines to:

        if (order > min_order) {
                alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
                alloc_gfp |= __GFP_NOWARN;
        }

Do you recover the performance?
* Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
From: Ritesh Harjani @ 2026-04-04 16:47 UTC
To: Matthew Wilcox, Salvatore Dipietro
Cc: linux-kernel, alisaidi, blakgeof, abuehaze, dipietro.salvatore,
    stable, Christian Brauner, Darrick J. Wong, linux-xfs,
    linux-fsdevel, linux-mm

Matthew Wilcox <willy@infradead.org> writes:

> On Fri, Apr 03, 2026 at 07:35:34PM +0000, Salvatore Dipietro wrote:
>> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
>> introduced high-order folio allocations in the buffered write
>> path. When memory is fragmented, each failed allocation triggers
>> compaction and drain_all_pages() via __alloc_pages_slowpath(),
>> causing a 0.75x throughput drop on pgbench (simple-update) with
>> 1024 clients on a 96-vCPU arm64 system.
>>
>> Strip __GFP_DIRECT_RECLAIM from folio allocations in
>> iomap_get_folio() when the order exceeds PAGE_ALLOC_COSTLY_ORDER,
>> making them purely opportunistic.
>
> If you look at __filemap_get_folio_mpol(), that's kind of being tried
> already:
>
>         if (order > min_order)
>                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
>
>  * %__GFP_NORETRY: The VM implementation will try only very lightweight
>  * memory direct reclaim to get some memory under memory pressure (thus
>  * it can sleep). It will avoid disruptive actions like OOM killer. The
>  * caller must handle the failure which is quite likely to happen under
>  * heavy memory pressure. The flag is suitable when failure can easily be
>  * handled at small cost, such as reduced throughput.
>
> which, from the description, seemed like the right approach. So either
> the description or the implementation should be updated, I suppose?
>
> Now, what happens if you change those two lines to:
>
>         if (order > min_order) {
>                 alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
>                 alloc_gfp |= __GFP_NOWARN;
>         }

Hi Matthew,

Shouldn't we try this instead? This would still allow us to keep
__GFP_NORETRY and hence lightweight direct reclaim/compaction for
at least the non-costly order allocations, right?

        if (order > min_order) {
                alloc_gfp |= __GFP_NOWARN;
                if (order > PAGE_ALLOC_COSTLY_ORDER)
                        alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
                else
                        alloc_gfp |= __GFP_NORETRY;
        }

-ritesh

>
> Do you recover the performance?
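The three flag policies discussed so far (the existing code, the unconditional variant, and the tiered variant) can be compared side by side in a userspace model. This is a sketch under stated assumptions: the `SKETCH_GFP_*` bit values are placeholders and not the kernel's real definitions, the three function names are hypothetical, and `PAGE_ALLOC_COSTLY_ORDER` is set to 3 to mirror the kernel's definition.

```c
typedef unsigned int gfp_t;

/* Placeholder bit values for illustration only; not the kernel's. */
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u
#define SKETCH_GFP_NORETRY        0x2u
#define SKETCH_GFP_NOWARN         0x4u

/* Mirrors the kernel's threshold for "costly" allocation orders. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Existing behaviour quoted in the thread: lightweight reclaim, no warning. */
static gfp_t gfp_current(gfp_t gfp, unsigned int order, unsigned int min_order)
{
    if (order > min_order)
        gfp |= SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;
    return gfp;
}

/* Unconditional variant: no direct reclaim for any order above min_order. */
static gfp_t gfp_no_reclaim(gfp_t gfp, unsigned int order, unsigned int min_order)
{
    if (order > min_order) {
        gfp &= ~SKETCH_GFP_DIRECT_RECLAIM;
        gfp |= SKETCH_GFP_NOWARN;
    }
    return gfp;
}

/*
 * Tiered variant: costly orders skip direct reclaim entirely, while
 * non-costly orders keep it in its lightweight (NORETRY) form.
 */
static gfp_t gfp_tiered(gfp_t gfp, unsigned int order, unsigned int min_order)
{
    if (order > min_order) {
        gfp |= SKETCH_GFP_NOWARN;
        if (order > PAGE_ALLOC_COSTLY_ORDER)
            gfp &= ~SKETCH_GFP_DIRECT_RECLAIM;
        else
            gfp |= SKETCH_GFP_NORETRY;
    }
    return gfp;
}
```

With a base mask of `SKETCH_GFP_DIRECT_RECLAIM` and a min_order of 0, the tiered variant returns the same mask as `gfp_current()` for orders 1 through 3 and the same mask as `gfp_no_reclaim()` from order 4 upward. All three leave the mask untouched at min_order, so the last-resort attempt keeps whatever reclaim behaviour the caller asked for.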
* Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
From: Matthew Wilcox @ 2026-04-04 20:46 UTC
To: Ritesh Harjani
Cc: Salvatore Dipietro, linux-kernel, alisaidi, blakgeof, abuehaze,
    dipietro.salvatore, stable, Christian Brauner, Darrick J. Wong,
    linux-xfs, linux-fsdevel, linux-mm

On Sat, Apr 04, 2026 at 10:17:33PM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
>
> > On Fri, Apr 03, 2026 at 07:35:34PM +0000, Salvatore Dipietro wrote:
> >> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> >> introduced high-order folio allocations in the buffered write
> >> path. When memory is fragmented, each failed allocation triggers
> >> compaction and drain_all_pages() via __alloc_pages_slowpath(),
> >> causing a 0.75x throughput drop on pgbench (simple-update) with
> >> 1024 clients on a 96-vCPU arm64 system.
> >>
> >> Strip __GFP_DIRECT_RECLAIM from folio allocations in
> >> iomap_get_folio() when the order exceeds PAGE_ALLOC_COSTLY_ORDER,
> >> making them purely opportunistic.
> >
> > If you look at __filemap_get_folio_mpol(), that's kind of being tried
> > already:
> >
> >         if (order > min_order)
> >                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> >
> >  * %__GFP_NORETRY: The VM implementation will try only very lightweight
> >  * memory direct reclaim to get some memory under memory pressure (thus
> >  * it can sleep). It will avoid disruptive actions like OOM killer. The
> >  * caller must handle the failure which is quite likely to happen under
> >  * heavy memory pressure. The flag is suitable when failure can easily be
> >  * handled at small cost, such as reduced throughput.
> >
> > which, from the description, seemed like the right approach. So either
> > the description or the implementation should be updated, I suppose?
> >
> > Now, what happens if you change those two lines to:
> >
> >         if (order > min_order) {
> >                 alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> >                 alloc_gfp |= __GFP_NOWARN;
> >         }
>
> Hi Matthew,
>
> Shouldn't we try this instead? This would still allow us to keep
> __GFP_NORETRY and hence lightweight direct reclaim/compaction for
> at least the non-costly order allocations, right?
>
>         if (order > min_order) {
>                 alloc_gfp |= __GFP_NOWARN;
>                 if (order > PAGE_ALLOC_COSTLY_ORDER)
>                         alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
>                 else
>                         alloc_gfp |= __GFP_NORETRY;
>         }

Uhh ... maybe? I'd want someone more familiar with the page allocator
than I am to say whether that's the right approach. If it is, that
seems too complex, and maybe we need a better approach to the page
allocator flags.