From: fujunjie <fujunjie1@qq.com>
To: Andrew Morton <akpm@linux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
David Hildenbrand <david@kernel.org>
Cc: Jan Kara <jack@suse.cz>, Vlastimil Babka <vbabka@kernel.org>,
Michal Hocko <mhocko@suse.com>, Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
Date: Wed, 6 May 2026 11:23:59 +0000
Message-ID: <tencent_DEDC345B4071ED3C9B293ACD437B9C8DBC0A@qq.com>

generic_fadvise(POSIX_FADV_DONTNEED) drains the local LRU batch and
then tries to invalidate the requested page-cache range. If any folio
could not be evicted, it assumes that a remote per-cpu LRU batch might
be pinning the folio, calls lru_add_drain_all(), and walks the mapping
again.

mapping_try_invalidate() currently treats every failed
mapping_evict_folio() as a possible remote-LRU-batch failure. But
mapping_evict_folio() also fails for dirty or writeback folios, mapped
folios, and folios whose filesystem-private state cannot be released. A
global drain cannot make those folios evictable, so the drain adds
latency and cross-CPU work without addressing those failure reasons.

Mapped folios are a common false positive. They may also have transient
references, but while any page in the folio is mapped, a remote LRU drain
cannot remove the page-table references that keep the folio unevictable.
POSIX_FADV_DONTNEED does not unmap userspace mappings.

Teach the folio eviction path to report whether a failure hit the
existing refcount check on a clean, unmapped folio. Only request the
global drain for that case. This preserves the existing fallback for
failures that a remote LRU drain can plausibly fix, while avoiding it
for failure reasons that a remote drain is not expected to resolve.

On a 4-vCPU, 8G QEMU/KVM guest, an mmap streaming workload scanned a
128 MiB MAP_SHARED file in 2 MiB chunks, calling POSIX_FADV_DONTNEED on
each chunk after reading it while keeping the mapping in place. Five
rounds of 256 fadvise calls per round showed:

  baseline: 112116 ns/call, 256 lru_add_drain_all() calls/round
  patched:   79012 ns/call,   0 lru_add_drain_all() calls/round

A separate cross-CPU fallback test exercised the case this fallback was
originally intended to protect: CPU 0 created and wrote an 8-page file,
then CPU 1 called fsync() and POSIX_FADV_DONTNEED on it. On the patched
kernel, 10/10 rounds still called lru_add_drain_all() once, and the
subsequent mincore() check saw 0 resident pages. This shows that the
patch does not remove the global drain path for this cross-CPU case.

These are controlled tests, not proof from a production workload. They
exercise the fadvise path and show that mapped-folio failures no longer
trigger global drains in this setup, while the cross-CPU fallback test
continues to pass.

Signed-off-by: fujunjie <fujunjie1@qq.com>
---
 mm/fadvise.c  | 16 ++++++++-----
 mm/internal.h |  2 +-
 mm/truncate.c | 64 +++++++++++++++++++++++++++++++++------------------
 3 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index b63fe21416ff2..ef26c23bf35c6 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -141,7 +141,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		}
 
 		if (end_index >= start_index) {
-			unsigned long nr_failed = 0;
+			unsigned long nr_lru_refs = 0;
 
 			/*
 			 * It's common to FADV_DONTNEED right after
@@ -155,14 +155,18 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 			lru_add_drain();
 
 			mapping_try_invalidate(mapping, start_index, end_index,
-					&nr_failed);
+					&nr_lru_refs);
 
 			/*
-			 * The failures may be due to the folio being
-			 * in the LRU cache of a remote CPU. Drain all
-			 * caches and try again.
+			 * Some clean, unmapped folios can fail invalidation
+			 * because they are still sitting in remote per-cpu LRU
+			 * batches. Failures caused by dirty/writeback state,
+			 * user mappings or filesystem-private release state are
+			 * not helped by a remote drain, so avoid it unless
+			 * mapping_try_invalidate() found a failure that could
+			 * plausibly be resolved by it.
 			 */
-			if (nr_failed) {
+			if (nr_lru_refs) {
 				lru_add_drain_all();
 				invalidate_mapping_pages(mapping, start_index,
 						end_index);
diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b6..e95e691fb4a01 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -562,7 +562,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
 		loff_t end);
 long mapping_evict_folio(struct address_space *mapping, struct folio *folio);
 unsigned long mapping_try_invalidate(struct address_space *mapping,
-		pgoff_t start, pgoff_t end, unsigned long *nr_failed);
+		pgoff_t start, pgoff_t end, unsigned long *nr_lru_refs);
 
 /**
  * folio_evictable - Test whether a folio is evictable.
diff --git a/mm/truncate.c b/mm/truncate.c
index 12cc89f89afcf..abd72f3d358eb 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -311,19 +311,11 @@ int generic_error_remove_folio(struct address_space *mapping,
 }
 EXPORT_SYMBOL(generic_error_remove_folio);
 
-/**
- * mapping_evict_folio() - Remove an unused folio from the page-cache.
- * @mapping: The mapping this folio belongs to.
- * @folio: The folio to remove.
- *
- * Safely remove one folio from the page cache.
- * It only drops clean, unused folios.
- *
- * Context: Folio must be locked.
- * Return: The number of pages successfully removed.
- */
-long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
+static long __mapping_evict_folio(struct address_space *mapping,
+				  struct folio *folio, bool *lru_refs)
 {
+	if (lru_refs)
+		*lru_refs = false;
 	/* The page may have been truncated before it was locked */
 	if (!mapping)
 		return 0;
@@ -331,14 +323,38 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
 		return 0;
 	/* The refcount will be elevated if any page in the folio is mapped */
 	if (folio_ref_count(folio) >
-			folio_nr_pages(folio) + folio_has_private(folio) + 1)
+			folio_nr_pages(folio) + folio_has_private(folio) + 1) {
+		/*
+		 * A remote LRU drain can only help with extra references on
+		 * otherwise evictable folios. Mapped folios also have an
+		 * elevated refcount, but draining LRU caches cannot unmap them.
+		 */
+		if (lru_refs && !folio_mapped(folio))
+			*lru_refs = true;
 		return 0;
+	}
 	if (!filemap_release_folio(folio, 0))
 		return 0;
 
 	return remove_mapping(mapping, folio);
 }
 
+/**
+ * mapping_evict_folio() - Remove an unused folio from the page-cache.
+ * @mapping: The mapping this folio belongs to.
+ * @folio: The folio to remove.
+ *
+ * Safely remove one folio from the page cache.
+ * It only drops clean, unused folios.
+ *
+ * Context: Folio must be locked.
+ * Return: The number of pages successfully removed.
+ */
+long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
+{
+	return __mapping_evict_folio(mapping, folio, NULL);
+}
+
 /**
  * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
@@ -526,13 +542,15 @@ EXPORT_SYMBOL(truncate_inode_pages_final);
  * @mapping: the address_space which holds the folios to invalidate
  * @start: the offset 'from' which to invalidate
  * @end: the offset 'to' which to invalidate (inclusive)
- * @nr_failed: How many folio invalidations failed
+ * @nr_lru_refs: Optional counter for failures which may be due to remote
+ *		per-cpu LRU refs
  *
- * This function is similar to invalidate_mapping_pages(), except that it
- * returns the number of folios which could not be evicted in @nr_failed.
+ * This function is similar to invalidate_mapping_pages(), except that callers
+ * may request the number of folio eviction failures that may be resolved by
+ * draining remote per-cpu LRU batches in @nr_lru_refs.
  */
 unsigned long mapping_try_invalidate(struct address_space *mapping,
-		pgoff_t start, pgoff_t end, unsigned long *nr_failed)
+		pgoff_t start, pgoff_t end, unsigned long *nr_lru_refs)
 {
 	pgoff_t indices[FOLIO_BATCH_SIZE];
 	struct folio_batch fbatch;
@@ -548,6 +566,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 		for (i = 0; i < nr; i++) {
 			struct folio *folio = fbatch.folios[i];
+			bool lru_refs = false;
 
 			/* We rely upon deletion not changing folio->index */
@@ -557,18 +576,17 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 				continue;
 			}
 
-			ret = mapping_evict_folio(mapping, folio);
+			ret = __mapping_evict_folio(mapping, folio,
+					nr_lru_refs ? &lru_refs : NULL);
+			if (!ret && lru_refs)
+				(*nr_lru_refs)++;
 			folio_unlock(folio);
 
 			/*
 			 * Invalidation is a hint that the folio is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
-			if (!ret) {
+			if (!ret)
 				deactivate_file_folio(folio);
-				/* Likely in the lru cache of a remote CPU */
-				if (nr_failed)
-					(*nr_failed)++;
-			}
 			count += ret;

base-commit: 1b55f8358e35a67bf3969339ea7b86988af92f66
--
2.34.1
Thread overview: 6+ messages
  2026-05-06 11:23 fujunjie [this message]
  2026-05-08 23:35 ` [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures Andrew Morton
  2026-05-10 18:14   ` Matthew Wilcox
  2026-05-11  6:52     ` Fujunjie
  2026-05-11  5:47   ` David Hildenbrand (Arm)
  2026-05-11  6:46     ` Fujunjie