public inbox for linux-kernel@vger.kernel.org
From: Matthew Wilcox <willy@infradead.org>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Salvatore Dipietro <dipiets@amazon.it>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.com>,
	abuehaze@amazon.com, alisaidi@amazon.com, blakgeof@amazon.com,
	brauner@kernel.org, dipietro.salvatore@gmail.com,
	djwong@kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
Date: Sun, 3 May 2026 12:55:48 +0100	[thread overview]
Message-ID: <afc3xFgKogxF5Lbq@casper.infradead.org>
In-Reply-To: <a4uhrqet.ritesh.list@gmail.com>

On Sun, May 03, 2026 at 11:22:10AM +0530, Ritesh Harjani wrote:
> Now, this is what I believe is the reason for the memory fragmentation
> with this workload -
> In Linux, each PTE page table occupies 4KB (assuming a 4KB system
> PAGE_SIZE). When your workload forks a child process for each new
> connection, the child gets its own copy of the page tables that map
> the shared buffer.
> Since each PTE table is a single 4KB page, and a large mapping needs
> many of them, hundreds of connections spawning means hundreds of
> thousands of single-page allocations for page tables. So the major
> source of your memory fragmentation problem is most likely these many
> order-0 allocations for the PTE page-table pages.

While memory is fragmented, the _problem_ is that we try too hard to
defragment.  From the original post:

: When memory is fragmented, each failed allocation triggers
: compaction and drain_all_pages() via __alloc_pages_slowpath()

We really should only try compaction once.  If it didn't make useful
progress last time, it won't this time either.

> > | Patch                |    Run 1   |    Run 2   |    Run 3   |   Average   | % vs Baseline |
> > |----------------------|-----------:|-----------:|-----------:|------------:|:-------------:|
> > | Baseline             | 107,064.61 |  97,043.86 | 101,830.78 | 101,979.75  |       —       |
> > | Proposed patch       | 146,012.23 | 136,392.36 | 141,178.00 | 141,194.20  |    +38.45%    |
> > | Ritesh's suggestion  | 147,481.50 | 133,069.03 | 137,051.30 | 139,200.61  |    +36.50%    |
> > | Matthew's suggestion | 145,653.91 | 144,169.24 | 141,768.31 | 143,863.82  |    +41.07%    |
> 
> 
> The main reason I proposed the patch below is that it only affects
> costly-order allocations (i.e. order > PAGE_ALLOC_COSTLY_ORDER) by
> skipping direct reclaim for those orders, while keeping the behaviour
> unchanged for the other orders.
> 
> So for smaller orders (order > min_order and <=
> PAGE_ALLOC_COSTLY_ORDER), the allocator will still attempt direct
> reclaim and compaction (which I guess is also needed to avoid OOM?).
> And this also looks like a change which could be easily backported :)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4e636647100c..f2343c26dd63 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2007,8 +2007,13 @@ struct folio *__filemap_get_folio_mpol(struct address_space *mapping,
>  			gfp_t alloc_gfp = gfp;
>  
>  			err = -ENOMEM;
> -			if (order > min_order)
> -				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> +			if (order > min_order) {
> +				alloc_gfp |= __GFP_NOWARN;
> +				if (order > PAGE_ALLOC_COSTLY_ORDER)
> +					alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
> +				else
> +					alloc_gfp |= __GFP_NORETRY;
> +			}
> 
> 
> But of course let's hear from others on their suggestions / thoughts.
> Maybe the filemap is not the right place to fix this as Matthew, Andrew
> and others were pointing. Any other suggestions on how to approach this,
> please?

filemap.c REALLY shouldn't know about PAGE_ALLOC_COSTLY_ORDER.
That's an internal detail of the memory allocator.

Either we want an API to say "allocate me a folio between orders A and B"
or we need more understandable GFP flags.  Or the page allocator could
use the __GFP_NORETRY flag to say "oh well, this allocation has a fallback,
I'll kick kcompactd to try to compact some more memory, but I'll fail
the allocation".


Thread overview: 16+ messages
2026-04-03 19:35 [PATCH 0/1] iomap: avoid compaction for costly folio order allocation Salvatore Dipietro
2026-04-03 19:35 ` [PATCH 1/1] " Salvatore Dipietro
2026-04-04  1:13   ` Ritesh Harjani
2026-04-04  4:15   ` Matthew Wilcox
2026-04-04 16:47     ` Ritesh Harjani
2026-04-04 20:46       ` Matthew Wilcox
2026-04-16 15:14       ` Ritesh Harjani
2026-04-20 16:33         ` Salvatore Dipietro
2026-04-20 18:44           ` Matthew Wilcox
2026-04-21  1:16             ` Ritesh Harjani
2026-04-28 15:02               ` Salvatore Dipietro
2026-05-03  5:52                 ` Ritesh Harjani
2026-05-03 11:55                   ` Matthew Wilcox [this message]
2026-04-05 22:43   ` Dave Chinner
2026-04-07  5:40     ` Christoph Hellwig
2026-04-21  9:02     ` Vlastimil Babka
