From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org,
Roman Gushchin <roman.gushchin@linux.dev>,
Jan Kara <jack@suse.cz>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Dev Jain <dev.jain@arm.com>,
linux-mm@kvack.org
Subject: [PATCH v2] mm: readahead: make thp readahead conditional to mmap_miss logic
Date: Sun, 5 Oct 2025 18:54:09 -0700 [thread overview]
Message-ID: <20251006015409.342697-1-roman.gushchin@linux.dev> (raw)
Commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
introduced a special handling for VM_HUGEPAGE mappings: even if the
readahead is disabled, 1 or 2 HPAGE_PMD_ORDER pages are
allocated.
This change causes a significant regression for containers with a
tight memory.max limit, if VM_HUGEPAGE is widely used. Prior to this
commit, mmap_miss logic would eventually lead to the readahead
disablement, effectively reducing the memory pressure in the
cgroup. With this change the kernel is trying to allocate 1-2 huge
pages for each fault, no matter if these pages are used or not
before being evicted, increasing the memory pressure multi-fold.
To fix the regression, let's make the new VM_HUGEPAGE conditional
to the mmap_miss check, but keep independent from the ra->ra_pages.
This way the main intention of commit 4687fdbb805a ("mm/filemap:
Support VM_HUGEPAGE for file mappings") stays intact, but the
regression is resolved.
The logic behind this changes is simple: even if a user explicitly
requests using huge pages to back the file mapping (using VM_HUGEPAGE
flag), under a very strong memory pressure it's better to fall back
to ordinary pages.
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: linux-mm@kvack.org
--
v2: fixed VM_SEQ_READ handling (by Dev Jain)
---
mm/filemap.c | 42 ++++++++++++++++++++++--------------------
1 file changed, 22 insertions(+), 20 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index a52dd38d2b4a..446e591d57e5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3235,37 +3235,23 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
struct file *fpin = NULL;
vm_flags_t vm_flags = vmf->vma->vm_flags;
+ bool force_thp_readahead = false;
unsigned short mmap_miss;
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* Use the readahead code, even if readahead is disabled */
- if ((vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER) {
- fpin = maybe_unlock_mmap_for_io(vmf, fpin);
- ractl._index &= ~((unsigned long)HPAGE_PMD_NR - 1);
- ra->size = HPAGE_PMD_NR;
- /*
- * Fetch two PMD folios, so we get the chance to actually
- * readahead, unless we've been told not to.
- */
- if (!(vm_flags & VM_RAND_READ))
- ra->size *= 2;
- ra->async_size = HPAGE_PMD_NR;
- ra->order = HPAGE_PMD_ORDER;
- page_cache_ra_order(&ractl, ra);
- return fpin;
- }
-#endif
-
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+ (vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER)
+ force_thp_readahead = true;
/*
* If we don't want any read-ahead, don't bother. VM_EXEC case below is
* already intended for random access.
*/
if ((vm_flags & (VM_RAND_READ | VM_EXEC)) == VM_RAND_READ)
return fpin;
- if (!ra->ra_pages)
+ if (!ra->ra_pages && !force_thp_readahead)
return fpin;
- if (vm_flags & VM_SEQ_READ) {
+ if ((vm_flags & VM_SEQ_READ) && !force_thp_readahead) {
fpin = maybe_unlock_mmap_for_io(vmf, fpin);
page_cache_sync_ra(&ractl, ra->ra_pages);
return fpin;
@@ -3283,6 +3269,22 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
if (mmap_miss > MMAP_LOTSAMISS)
return fpin;
+ if (force_thp_readahead) {
+ fpin = maybe_unlock_mmap_for_io(vmf, fpin);
+ ractl._index &= ~((unsigned long)HPAGE_PMD_NR - 1);
+ ra->size = HPAGE_PMD_NR;
+ /*
+ * Fetch two PMD folios, so we get the chance to actually
+ * readahead, unless we've been told not to.
+ */
+ if (!(vm_flags & VM_RAND_READ))
+ ra->size *= 2;
+ ra->async_size = HPAGE_PMD_NR;
+ ra->order = HPAGE_PMD_ORDER;
+ page_cache_ra_order(&ractl, ra);
+ return fpin;
+ }
+
if (vm_flags & VM_EXEC) {
/*
* Allow arch to request a preferred minimum folio order for
--
2.51.0
next reply other threads:[~2025-10-06 1:54 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-06 1:54 Roman Gushchin [this message]
2025-10-06 4:48 ` [PATCH v2] mm: readahead: make thp readahead conditional to mmap_miss logic Dev Jain
2025-10-06 12:31 ` Jan Kara
2025-10-06 17:44 ` Roman Gushchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251006015409.342697-1-roman.gushchin@linux.dev \
--to=roman.gushchin@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=dev.jain@arm.com \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.