Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
@ 2026-05-06 11:23 fujunjie
  2026-05-08 23:35 ` Andrew Morton
  2026-05-11  5:47 ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 6+ messages in thread
From: fujunjie @ 2026-05-06 11:23 UTC (permalink / raw)
  To: Andrew Morton, Matthew Wilcox (Oracle), David Hildenbrand
  Cc: Jan Kara, Vlastimil Babka, Michal Hocko, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, linux-mm,
	linux-fsdevel, linux-kernel

generic_fadvise(POSIX_FADV_DONTNEED) drains the local LRU batch and
then tries to invalidate the requested page-cache range. If any folio
could not be evicted, it assumes that a remote per-cpu LRU batch might
be pinning the folio, calls lru_add_drain_all(), and walks the mapping
again.

mapping_try_invalidate() currently treats every failed
mapping_evict_folio() as a possible remote-LRU-batch failure. But
mapping_evict_folio() also fails for dirty or writeback folios, mapped
folios, and folios whose filesystem-private state cannot be released. A
global drain cannot make those folios evictable, so the drain adds
latency and cross-CPU work without addressing those failure reasons.

Mapped folios are a common false positive. They may also have transient
references, but while any page in the folio is mapped, a remote LRU drain
cannot remove the page-table references that keep the folio unevictable.
POSIX_FADV_DONTNEED does not unmap userspace mappings.

Teach the folio eviction path to report whether a failure hit the
existing refcount check on a clean, unmapped folio. Only request the
global drain for that case. This preserves the existing fallback for
failures that a remote LRU drain can plausibly fix, while avoiding it
for failure reasons that a remote drain is not expected to resolve.

On a 4-vCPU, 8G QEMU/KVM guest, a mmap streaming workload scanned a
128 MiB MAP_SHARED file in 2 MiB chunks. It called POSIX_FADV_DONTNEED
on each chunk after reading it while keeping the mapping in place. 5
rounds with 256 fadvise calls per round showed:

  baseline: 112116 ns/call, 256 lru_add_drain_all() calls/round
  patched:   79012 ns/call,   0 lru_add_drain_all() calls/round

A separate cross-CPU fallback test exercised the case this fallback was
originally intended to protect: CPU 0 created and wrote an 8-page file,
then CPU 1 called fsync() and POSIX_FADV_DONTNEED on it. On the patched
kernel, 10/10 rounds still called lru_add_drain_all() once and the
subsequent mincore() check saw 0 resident pages. This shows that the
patch does not remove the global drain path from this cross-CPU case.

The workload is a controlled test, not production workload proof. It
exercises the fadvise path and shows that mapped folio failures no
longer trigger global drains in this setup while the cross-CPU fallback
test continues to pass.

Signed-off-by: fujunjie <fujunjie1@qq.com>
---
 mm/fadvise.c  | 16 ++++++++-----
 mm/internal.h |  2 +-
 mm/truncate.c | 64 +++++++++++++++++++++++++++++++++------------------
 3 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index b63fe21416ff2..ef26c23bf35c6 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -141,7 +141,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		}
 
 		if (end_index >= start_index) {
-			unsigned long nr_failed = 0;
+			unsigned long nr_lru_refs = 0;
 
 			/*
 			 * It's common to FADV_DONTNEED right after
@@ -155,14 +155,18 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 			lru_add_drain();
 
 			mapping_try_invalidate(mapping, start_index, end_index,
-					&nr_failed);
+					&nr_lru_refs);
 
 			/*
-			 * The failures may be due to the folio being
-			 * in the LRU cache of a remote CPU. Drain all
-			 * caches and try again.
+			 * Some clean, unmapped folios can fail invalidation
+			 * because they are still sitting in remote per-cpu LRU
+			 * batches.  Failures caused by dirty/writeback state,
+			 * user mappings or filesystem-private release state are
+			 * not helped by a remote drain, so avoid it unless
+			 * mapping_try_invalidate() found a failure that could
+			 * plausibly be resolved by it.
 			 */
-			if (nr_failed) {
+			if (nr_lru_refs) {
 				lru_add_drain_all();
 				invalidate_mapping_pages(mapping, start_index,
 						end_index);
diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b6..e95e691fb4a01 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -562,7 +562,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
 		loff_t end);
 long mapping_evict_folio(struct address_space *mapping, struct folio *folio);
 unsigned long mapping_try_invalidate(struct address_space *mapping,
-		pgoff_t start, pgoff_t end, unsigned long *nr_failed);
+		pgoff_t start, pgoff_t end, unsigned long *nr_lru_refs);
 
 /**
  * folio_evictable - Test whether a folio is evictable.
diff --git a/mm/truncate.c b/mm/truncate.c
index 12cc89f89afcf..abd72f3d358eb 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -311,19 +311,11 @@ int generic_error_remove_folio(struct address_space *mapping,
 }
 EXPORT_SYMBOL(generic_error_remove_folio);
 
-/**
- * mapping_evict_folio() - Remove an unused folio from the page-cache.
- * @mapping: The mapping this folio belongs to.
- * @folio: The folio to remove.
- *
- * Safely remove one folio from the page cache.
- * It only drops clean, unused folios.
- *
- * Context: Folio must be locked.
- * Return: The number of pages successfully removed.
- */
-long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
+static long __mapping_evict_folio(struct address_space *mapping,
+				  struct folio *folio, bool *lru_refs)
 {
+	if (lru_refs)
+		*lru_refs = false;
 	/* The page may have been truncated before it was locked */
 	if (!mapping)
 		return 0;
@@ -331,14 +323,38 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
 		return 0;
 	/* The refcount will be elevated if any page in the folio is mapped */
 	if (folio_ref_count(folio) >
-			folio_nr_pages(folio) + folio_has_private(folio) + 1)
+			folio_nr_pages(folio) + folio_has_private(folio) + 1) {
+		/*
+		 * A remote LRU drain can only help with extra references on
+		 * otherwise evictable folios.  Mapped folios also have an
+		 * elevated refcount, but draining LRU caches cannot unmap them.
+		 */
+		if (lru_refs && !folio_mapped(folio))
+			*lru_refs = true;
 		return 0;
+	}
 	if (!filemap_release_folio(folio, 0))
 		return 0;
 
 	return remove_mapping(mapping, folio);
 }
 
+/**
+ * mapping_evict_folio() - Remove an unused folio from the page-cache.
+ * @mapping: The mapping this folio belongs to.
+ * @folio: The folio to remove.
+ *
+ * Safely remove one folio from the page cache.
+ * It only drops clean, unused folios.
+ *
+ * Context: Folio must be locked.
+ * Return: The number of pages successfully removed.
+ */
+long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
+{
+	return __mapping_evict_folio(mapping, folio, NULL);
+}
+
 /**
  * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
@@ -526,13 +542,15 @@ EXPORT_SYMBOL(truncate_inode_pages_final);
  * @mapping: the address_space which holds the folios to invalidate
  * @start: the offset 'from' which to invalidate
  * @end: the offset 'to' which to invalidate (inclusive)
- * @nr_failed: How many folio invalidations failed
+ * @nr_lru_refs: Optional counter for failures which may be due to remote
+ *                per-cpu LRU refs
  *
- * This function is similar to invalidate_mapping_pages(), except that it
- * returns the number of folios which could not be evicted in @nr_failed.
+ * This function is similar to invalidate_mapping_pages(), except that callers
+ * may request the number of folio eviction failures that may be resolved by
+ * draining remote per-cpu LRU batches in @nr_lru_refs.
  */
 unsigned long mapping_try_invalidate(struct address_space *mapping,
-		pgoff_t start, pgoff_t end, unsigned long *nr_failed)
+		pgoff_t start, pgoff_t end, unsigned long *nr_lru_refs)
 {
 	pgoff_t indices[FOLIO_BATCH_SIZE];
 	struct folio_batch fbatch;
@@ -548,6 +566,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 
 		for (i = 0; i < nr; i++) {
 			struct folio *folio = fbatch.folios[i];
+			bool lru_refs = false;
 
 			/* We rely upon deletion not changing folio->index */
 
@@ -557,18 +576,17 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 				continue;
 			}
 
-			ret = mapping_evict_folio(mapping, folio);
+			ret = __mapping_evict_folio(mapping, folio,
+						    nr_lru_refs ? &lru_refs : NULL);
+			if (!ret && lru_refs)
+				(*nr_lru_refs)++;
 			folio_unlock(folio);
 			/*
 			 * Invalidation is a hint that the folio is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
-			if (!ret) {
+			if (!ret)
 				deactivate_file_folio(folio);
-				/* Likely in the lru cache of a remote CPU */
-				if (nr_failed)
-					(*nr_failed)++;
-			}
 			count += ret;
 		}
 

base-commit: 1b55f8358e35a67bf3969339ea7b86988af92f66
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
  2026-05-06 11:23 [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures fujunjie
@ 2026-05-08 23:35 ` Andrew Morton
  2026-05-10 18:14   ` Matthew Wilcox
  2026-05-11  5:47 ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2026-05-08 23:35 UTC (permalink / raw)
  To: fujunjie
  Cc: Matthew Wilcox (Oracle), David Hildenbrand, Jan Kara,
	Vlastimil Babka, Michal Hocko, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Suren Baghdasaryan, linux-mm, linux-fsdevel,
	linux-kernel

On Wed,  6 May 2026 11:23:59 +0000 fujunjie <fujunjie1@qq.com> wrote:

> generic_fadvise(POSIX_FADV_DONTNEED) drains the local LRU batch and
> then tries to invalidate the requested page-cache range. If any folio
> could not be evicted, it assumes that a remote per-cpu LRU batch might
> be pinning the folio, calls lru_add_drain_all(), and walks the mapping
> again.
> 
> ...
> 
> Teach the folio eviction path to report whether a failure hit the
> existing refcount check on a clean, unmapped folio. Only request the
> global drain for that case. This preserves the existing fallback for
> failures that a remote LRU drain can plausibly fix, while avoiding it
> for failure reasons that a remote drain is not expected to resolve.
> 

Thanks.  AI reviw asked one question:
	https://sashiko.dev/#/patchset/tencent_DEDC345B4071ED3C9B293ACD437B9C8DBC0A@qq.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
  2026-05-08 23:35 ` Andrew Morton
@ 2026-05-10 18:14   ` Matthew Wilcox
  2026-05-11  6:52     ` Fujunjie
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2026-05-10 18:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: fujunjie, David Hildenbrand, Jan Kara, Vlastimil Babka,
	Michal Hocko, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, linux-mm, linux-fsdevel, linux-kernel

On Fri, May 08, 2026 at 04:35:49PM -0700, Andrew Morton wrote:
> On Wed,  6 May 2026 11:23:59 +0000 fujunjie <fujunjie1@qq.com> wrote:
> 
> > generic_fadvise(POSIX_FADV_DONTNEED) drains the local LRU batch and
> > then tries to invalidate the requested page-cache range. If any folio
> > could not be evicted, it assumes that a remote per-cpu LRU batch might
> > be pinning the folio, calls lru_add_drain_all(), and walks the mapping
> > again.
> > 
> > ...
> > 
> > Teach the folio eviction path to report whether a failure hit the
> > existing refcount check on a clean, unmapped folio. Only request the
> > global drain for that case. This preserves the existing fallback for
> > failures that a remote LRU drain can plausibly fix, while avoiding it
> > for failure reasons that a remote drain is not expected to resolve.
> > 
> 
> Thanks.  AI reviw asked one question:
> 	https://sashiko.dev/#/patchset/tencent_DEDC345B4071ED3C9B293ACD437B9C8DBC0A@qq.com

That's a pretty poor quality review.

My question is why we do this bizarre thing of switching between an
unsigned long and a bool pointer.

I would also avoid creating __mapping_evict_folio() and do the test for
folio_mapped() in mapping_try_invalidate().  It all feels a bit weird
(both this patch and the original code), and probably needs a lot more
thought, which I'm not in a position to do right now.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
  2026-05-06 11:23 [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures fujunjie
  2026-05-08 23:35 ` Andrew Morton
@ 2026-05-11  5:47 ` David Hildenbrand (Arm)
  2026-05-11  6:46   ` Fujunjie
  1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11  5:47 UTC (permalink / raw)
  To: fujunjie, Andrew Morton, Matthew Wilcox (Oracle)
  Cc: Jan Kara, Vlastimil Babka, Michal Hocko, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, linux-mm,
	linux-fsdevel, linux-kernel

> -long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
> +static long __mapping_evict_folio(struct address_space *mapping,
> +				  struct folio *folio, bool *lru_refs)
>  {
> +	if (lru_refs)
> +		*lru_refs = false;
>  	/* The page may have been truncated before it was locked */
>  	if (!mapping)
>  		return 0;
> @@ -331,14 +323,38 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
>  		return 0;
>  	/* The refcount will be elevated if any page in the folio is mapped */
>  	if (folio_ref_count(folio) >
> -			folio_nr_pages(folio) + folio_has_private(folio) + 1)
> +			folio_nr_pages(folio) + folio_has_private(folio) + 1) {
> +		/*
> +		 * A remote LRU drain can only help with extra references on
> +		 * otherwise evictable folios.  Mapped folios also have an
> +		 * elevated refcount, but draining LRU caches cannot unmap them.
> +		 */
> +		if (lru_refs && !folio_mapped(folio))
> +			*lru_refs = true;

How can you conclude that it's a LRU refs? Could also just be something trivial
like a remaining GUP reference, or am I wrong?

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
  2026-05-11  5:47 ` David Hildenbrand (Arm)
@ 2026-05-11  6:46   ` Fujunjie
  0 siblings, 0 replies; 6+ messages in thread
From: Fujunjie @ 2026-05-11  6:46 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Jan Kara, Michal Hocko, Lorenzo Stoakes, linux-mm, linux-fsdevel,
	linux-kernel, Matthew Wilcox (Oracle), Andrew Morton



On 5/11/2026 1:47 PM, David Hildenbrand (Arm) wrote:
>> -long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
>> +static long __mapping_evict_folio(struct address_space *mapping,
>> +				  struct folio *folio, bool *lru_refs)
>>  {
>> +	if (lru_refs)
>> +		*lru_refs = false;
>>  	/* The page may have been truncated before it was locked */
>>  	if (!mapping)
>>  		return 0;
>> @@ -331,14 +323,38 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
>>  		return 0;
>>  	/* The refcount will be elevated if any page in the folio is mapped */
>>  	if (folio_ref_count(folio) >
>> -			folio_nr_pages(folio) + folio_has_private(folio) + 1)
>> +			folio_nr_pages(folio) + folio_has_private(folio) + 1) {
>> +		/*
>> +		 * A remote LRU drain can only help with extra references on
>> +		 * otherwise evictable folios.  Mapped folios also have an
>> +		 * elevated refcount, but draining LRU caches cannot unmap them.
>> +		 */
>> +		if (lru_refs && !folio_mapped(folio))
>> +			*lru_refs = true;
> 
> How can you conclude that it's a LRU refs? Could also just be something trivial
> like a remaining GUP reference, or am I wrong?
> 

You're right. The elevated refcount does not prove that the extra
reference is from a remote LRU batch; it could be GUP or another transient
reference.

The intent was to use this as an approximate filter: avoid the remote
drain for failures where it clearly cannot help, such as mapped folios,
rather than to prove the exact source of the extra reference.

"lru_refs" is too strong or misleading as a name. For v2 I will take
Matthew's suggestion into account as well: avoid adding
__mapping_evict_folio(), do the folio_mapped() test in
mapping_try_invalidate(), and think more carefully about where the
fallback boundary should be. I will also use naming that describes the
decision more accurately rather than implying that the extra reference
has been proven to be an LRU-batch reference.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures
  2026-05-10 18:14   ` Matthew Wilcox
@ 2026-05-11  6:52     ` Fujunjie
  0 siblings, 0 replies; 6+ messages in thread
From: Fujunjie @ 2026-05-11  6:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Hildenbrand, Jan Kara, Michal Hocko, Lorenzo Stoakes,
	Suren Baghdasaryan, linux-mm, linux-fsdevel, linux-kernel,
	Andrew Morton



On 5/11/2026 2:14 AM, Matthew Wilcox wrote:
> On Fri, May 08, 2026 at 04:35:49PM -0700, Andrew Morton wrote:
>> On Wed,  6 May 2026 11:23:59 +0000 fujunjie <fujunjie1@qq.com> wrote:
>>
>>> generic_fadvise(POSIX_FADV_DONTNEED) drains the local LRU batch and
>>> then tries to invalidate the requested page-cache range. If any folio
>>> could not be evicted, it assumes that a remote per-cpu LRU batch might
>>> be pinning the folio, calls lru_add_drain_all(), and walks the mapping
>>> again.
>>>
>>> ...
>>>
>>> Teach the folio eviction path to report whether a failure hit the
>>> existing refcount check on a clean, unmapped folio. Only request the
>>> global drain for that case. This preserves the existing fallback for
>>> failures that a remote LRU drain can plausibly fix, while avoiding it
>>> for failure reasons that a remote drain is not expected to resolve.
>>>
>>
>> Thanks.  AI reviw asked one question:
>> 	https://sashiko.dev/#/patchset/tencent_DEDC345B4071ED3C9B293ACD437B9C8DBC0A@qq.com
> 
> That's a pretty poor quality review.
> 
> My question is why we do this bizarre thing of switching between an
> unsigned long and a bool pointer.
> 
> I would also avoid creating __mapping_evict_folio() and do the test for
> folio_mapped() in mapping_try_invalidate().  It all feels a bit weird
> (both this patch and the original code), and probably needs a lot more
> thought, which I'm not in a position to do right now.

Thanks,agreed.

The count is not really used as a count by the fadvise path; it only gates
whether to do one remote drain and retry. Also, adding
__mapping_evict_folio() just to pass back this bit makes the interface more
awkward than it needs to be.

I will rework this for v2 to keep mapping_evict_folio() unchanged and do
the folio_mapped() check in mapping_try_invalidate(), as you suggested.
I will also think more carefully about the fallback boundary and use less
misleading naming, since an elevated refcount does not by itself prove a
remote LRU-batch reference.




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-11  6:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-06 11:23 [PATCH] mm/fadvise: avoid remote LRU drain for mapped folio failures fujunjie
2026-05-08 23:35 ` Andrew Morton
2026-05-10 18:14   ` Matthew Wilcox
2026-05-11  6:52     ` Fujunjie
2026-05-11  5:47 ` David Hildenbrand (Arm)
2026-05-11  6:46   ` Fujunjie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox