public inbox for stable@vger.kernel.org
From: Dave Chinner <dgc@kernel.org>
To: Salvatore Dipietro <dipiets@amazon.it>
Cc: linux-kernel@vger.kernel.org, alisaidi@amazon.com,
	blakgeof@amazon.com, abuehaze@amazon.de,
	dipietro.salvatore@gmail.com, willy@infradead.org,
	stable@vger.kernel.org, Christian Brauner <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
Date: Mon, 6 Apr 2026 08:43:57 +1000	[thread overview]
Message-ID: <adLlrSZ5oRAa_Hfd@dread> (raw)
In-Reply-To: <20260403193535.9970-2-dipiets@amazon.it>

On Fri, Apr 03, 2026 at 07:35:34PM +0000, Salvatore Dipietro wrote:
> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> introduced high-order folio allocations in the buffered write
> path. When memory is fragmented, each failed allocation triggers
> compaction and drain_all_pages() via __alloc_pages_slowpath(),
> causing a 0.75x throughput drop on pgbench (simple-update) with 
> 1024 clients on a 96-vCPU arm64 system.
> 
> Strip __GFP_DIRECT_RECLAIM from folio allocations in
> iomap_get_folio() when the order exceeds PAGE_ALLOC_COSTLY_ORDER,
> making them purely opportunistic.
> 
> Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> Cc: stable@vger.kernel.org
> Signed-off-by: Salvatore Dipietro <dipiets@amazon.it>
> ---
>  fs/iomap/buffered-io.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 92a831cf4bf1..cb843d54b4d9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -715,6 +715,7 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
>  {
>  	fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
> +	gfp_t gfp;
>  
>  	if (iter->flags & IOMAP_NOWAIT)
>  		fgp |= FGP_NOWAIT;
> @@ -722,8 +723,20 @@ struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
>  		fgp |= FGP_DONTCACHE;
>  	fgp |= fgf_set_order(len);
>  
> +	gfp = mapping_gfp_mask(iter->inode->i_mapping);
> +
> +	/*
> +	 * If the folio order hint exceeds PAGE_ALLOC_COSTLY_ORDER,
> +	 * strip __GFP_DIRECT_RECLAIM to make the allocation purely
> +	 * opportunistic.  This avoids compaction + drain_all_pages()
> +	 * in __alloc_pages_slowpath() that devastate throughput
> +	 * on large systems during buffered writes.
> +	 */
> +	if (FGF_GET_ORDER(fgp) > PAGE_ALLOC_COSTLY_ORDER)
> +		gfp &= ~__GFP_DIRECT_RECLAIM;

Adding these "gfp &= ~__GFP_DIRECT_RECLAIM" hacks everywhere
we need to do high order folio allocation is getting out of hand.

Compaction improves long term system performance, so we don't really
just want to turn it off whenever we have demand for high order
folios.

What we should be doing is getting compaction out of the direct
reclaim path - it is -clearly- way too costly for hot paths that use
large allocations, especially those with fallbacks to smaller
allocations or vmalloc.

Instead, memory reclaim should kick background compaction and let it
do the work. If the allocation path really, really needs high order
allocation to succeed, then it can direct the allocation to retry
until it succeeds and the allocator itself can wait for background
compaction to make progress.

For code that has fallbacks to smaller allocations, there is no
need to wait for compaction - we can attempt fast smaller allocations
and continue that way until an allocation succeeds....

-Dave.
-- 
Dave Chinner
dgc@kernel.org


Thread overview: 9+ messages
     [not found] <20260403193535.9970-1-dipiets@amazon.it>
2026-04-03 19:35 ` [PATCH 1/1] iomap: avoid compaction for costly folio order allocation Salvatore Dipietro
2026-04-04  1:13   ` Ritesh Harjani
2026-04-04  4:15   ` Matthew Wilcox
2026-04-04 16:47     ` Ritesh Harjani
2026-04-04 20:46       ` Matthew Wilcox
2026-04-05 22:43   ` Dave Chinner [this message]
2026-04-07  5:40     ` Christoph Hellwig
     [not found] <20260403193201.30479-1-dipiets@amazon.it>
2026-04-03 19:32 ` Salvatore Dipietro
2026-04-04  6:25   ` Greg KH
