* [PATCH v3 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
2026-07-01 23:59 [PATCH v3 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
@ 2026-07-01 23:59 ` Barry Song (Xiaomi)
2026-07-02 8:05 ` David Hildenbrand (Arm)
2026-07-01 23:59 ` [PATCH v3 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Barry Song (Xiaomi) @ 2026-07-01 23:59 UTC
To: akpm, linux-mm
Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)
There is a case where `folio_ref_count(folio) == 3` and
`!folio_test_swapcache(folio)`. In that case,
`folio_ref_count(folio) > 3` evaluates false, so we perform a local
LRU drain, yet the subsequent
`folio_ref_count(folio) > 1 + folio_test_swapcache(folio)` check can
still reject the reuse afterwards, in which case the drain was
unnecessary.
During an Ubuntu boot, I observed over 5,000 redundant local LRU
drains. For a kernel build with a minimal configuration, I observed
more than 20,000 redundant drains.
Fix this by checking against `1 + in_swapcache + in_lru_cache`
instead of hardcoding `folio_ref_count(folio) > 3`.
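The effect of the new threshold can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct model_folio`, `model_lru_add_drain()`, `can_reuse_old()` and `can_reuse_new()` are hypothetical names, the model assumes the local LRU cache holds at most one reference, and it stops at the refcount checks (no folio lock, no large-folio path).

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a small anon folio; not struct folio. */
struct model_folio {
	int refcount;        /* total references held on the folio */
	bool on_lru;         /* false: may still sit in a local LRU cache */
	bool in_swapcache;   /* the swapcache holds one reference */
	bool local_lru_ref;  /* assumption: local LRU cache holds one reference */
	bool ksm;
};

static int drains; /* number of simulated lru_add_drain() calls */

/* Simulated drain: flush the local cache, dropping its reference if any. */
static void model_lru_add_drain(struct model_folio *f)
{
	drains++;
	if (f->local_lru_ref) {
		f->local_lru_ref = false;
		f->on_lru = true;
		f->refcount--;
	}
}

/* Refcount checks as they were before this patch (hardcoded "> 3"). */
static bool can_reuse_old(struct model_folio *f)
{
	if (f->ksm || f->refcount > 3)
		return false;
	if (!f->on_lru)
		model_lru_add_drain(f);
	return f->refcount <= 1 + f->in_swapcache;
}

/* Refcount checks as they are after this patch. */
static bool can_reuse_new(struct model_folio *f)
{
	const bool in_lru_cache = !f->on_lru;
	const bool in_swapcache = f->in_swapcache;

	if (f->ksm || f->refcount > 1 + in_lru_cache + in_swapcache)
		return false;
	if (in_lru_cache)
		model_lru_add_drain(f);
	return f->refcount <= 1 + in_swapcache;
}
```

For a model folio with `refcount == 3` that is not in the swapcache and not on the LRU, `can_reuse_old()` drains once and then rejects the reuse, while `can_reuse_new()` rejects it without draining.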
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Kairui Song <kasong@tencent.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/memory.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe92..87da78eb1abd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4181,6 +4181,9 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
static bool wp_can_reuse_anon_folio(struct folio *folio,
struct vm_area_struct *vma)
{
+ const bool in_lru_cache = !folio_test_lru(folio);
+ const bool in_swapcache = folio_test_swapcache(folio);
+
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
return __wp_can_reuse_large_anon_folio(folio, vma);
@@ -4191,15 +4194,16 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
*
* KSM doesn't necessarily raise the folio refcount.
*/
- if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
+ if (folio_test_ksm(folio) ||
+ folio_ref_count(folio) > 1 + in_lru_cache + in_swapcache)
return false;
- if (!folio_test_lru(folio))
+ if (in_lru_cache)
/*
* We cannot easily detect+handle references from
* remote LRU caches or references to LRU folios.
*/
lru_add_drain();
- if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
+ if (folio_ref_count(folio) > 1 + in_swapcache)
return false;
if (!folio_trylock(folio))
return false;
--
2.39.3 (Apple Git-146)
* Re: [PATCH v3 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
From: David Hildenbrand (Arm) @ 2026-07-02 8:05 UTC
To: Barry Song (Xiaomi), akpm, linux-mm
Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park
On 7/2/26 01:59, Barry Song (Xiaomi) wrote:
> There is a case where `folio_ref_count(folio) == 3` and
> `!folio_test_swapcache(folio)`. In that case,
> `folio_ref_count(folio) > 3` evaluates false, so we perform a local
> LRU drain, yet the subsequent
> `folio_ref_count(folio) > 1 + folio_test_swapcache(folio)` check can
> still reject the reuse afterwards, in which case the drain was
> unnecessary.
>
> During an Ubuntu boot, I observed over 5,000 redundant local LRU
> drains. For a kernel build with a minimal configuration, I observed
> more than 20,000 redundant drains.
>
> Fix this by checking against `1 + in_swapcache + in_lru_cache`
> instead of hardcoding `folio_ref_count(folio) > 3`.
>
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Reviewed-by: Kairui Song <kasong@tencent.com>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
The other folks should probably re-review this patch that changed quite a bit :)
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* [PATCH v3 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic
From: Barry Song (Xiaomi) @ 2026-07-01 23:59 UTC
To: akpm, linux-mm
Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)
The "we just allocated them without exposing them to the swapcache"
case no longer exists, as Kairui has routed synchronous I/O through
the swapcache as well in his series "unify swapin use swap cache and
cleanup flags"[1]. As a result, folio_ref_count() should never be 1
in this path, since at least two references are held (base ref plus
swapcache). Remove the folio_ref_count()==1 check and update the
comment accordingly.
[1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/
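The simplification can be sanity-checked by brute force in userspace. The sketch below is not kernel code, and its helper names are hypothetical. It rests on two assumptions that are stated here rather than verified: `exclusive` is never set for a KSM folio, and a non-KSM folio with a refcount of 1 is a fresh copy for which `exclusive` is already set.

```c
#include <stdbool.h>

/* Condition in do_swap_page() before this patch. */
static bool reuse_old(bool ksm, bool exclusive, int refcount)
{
	return !ksm && (exclusive || refcount == 1);
}

/* Condition in do_swap_page() after this patch. */
static bool reuse_new(bool exclusive)
{
	return exclusive;
}

/*
 * Assumed invariants of this sketch (not verified against the kernel):
 *  - exclusive is never set for a KSM folio;
 *  - a non-KSM folio with refcount 1 is a fresh copy, so exclusive is set.
 */
static bool state_is_possible(bool ksm, bool exclusive, int refcount)
{
	if (ksm && exclusive)
		return false;
	if (!ksm && refcount == 1 && !exclusive)
		return false;
	return refcount >= 1;
}

/* Returns true if both conditions agree on every state deemed possible. */
static bool conditions_agree(void)
{
	for (int ksm = 0; ksm <= 1; ksm++)
		for (int excl = 0; excl <= 1; excl++)
			for (int ref = 1; ref <= 4; ref++)
				if (state_is_possible(ksm, excl, ref) &&
				    reuse_old(ksm, excl, ref) != reuse_new(excl))
					return false;
	return true;
}
```

Under those assumptions `conditions_agree()` returns true. The only input on which the two conditions differ is a non-KSM, non-exclusive folio with a refcount of 1, which the second assumption rules out.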
Acked-by: Usama Arif <usama.arif@linux.dev>
Reviewed-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/memory.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 87da78eb1abd..71e9d394816b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5046,13 +5046,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
pte = pte_mkuffd_wp(pte);
/*
- * Same logic as in do_wp_page(); however, optimize for pages that are
- * certainly not shared either because we just allocated them without
- * exposing them to the swapcache or because the swap entry indicates
- * exclusivity.
+ * Similar logic as in do_wp_page(); however, optimize for pages that are
+ * certainly not because the swap entry indicates exclusivity.
*/
- if (!folio_test_ksm(folio) &&
- (exclusive || folio_ref_count(folio) == 1)) {
+ if (exclusive) {
if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
!pte_needs_soft_dirty_wp(vma, pte)) {
pte = pte_mkwrite(pte, vma);
--
2.39.3 (Apple Git-146)
* Re: [PATCH v3 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic
From: David Hildenbrand (Arm) @ 2026-07-02 8:07 UTC
To: Barry Song (Xiaomi), akpm, linux-mm
Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park
On 7/2/26 01:59, Barry Song (Xiaomi) wrote:
> The "we just allocated them without exposing them to the swapcache"
> case no longer exists, as Kairui has routed synchronous I/O through
> the swapcache as well in his series "unify swapin use swap cache and
> cleanup flags"[1]. As a result, folio_ref_count() should never be 1
> in this path, since at least two references are held (base ref plus
> swapcache). Remove the folio_ref_count()==1 check and update the
> comment accordingly.
>
> [1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/
>
> Acked-by: Usama Arif <usama.arif@linux.dev>
> Reviewed-by: Kairui Song <kasong@tencent.com>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH v3 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic
From: David Hildenbrand (Arm) @ 2026-07-02 8:14 UTC
To: Barry Song (Xiaomi), akpm, linux-mm
Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park
On 7/2/26 01:59, Barry Song (Xiaomi) wrote:
> The "we just allocated them without exposing them to the swapcache"
> case no longer exists, as Kairui has routed synchronous I/O through
> the swapcache as well in his series "unify swapin use swap cache and
> cleanup flags"[1]. As a result, folio_ref_count() should never be 1
> in this path, since at least two references are held (base ref plus
> swapcache). Remove the folio_ref_count()==1 check and update the
> comment accordingly.
Sashiko points out two minor things (one flagged as medium, lol, sure sure).
Here, you can clarify that folio_ref_count()==1 is true for freshly allocated
pages (due to the KSM check) in which case exclusive=true already.
>
> [1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/
>
> Acked-by: Usama Arif <usama.arif@linux.dev>
> Reviewed-by: Kairui Song <kasong@tencent.com>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
> mm/memory.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 87da78eb1abd..71e9d394816b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5046,13 +5046,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> pte = pte_mkuffd_wp(pte);
>
> /*
> - * Same logic as in do_wp_page(); however, optimize for pages that are
> - * certainly not shared either because we just allocated them without
> - * exposing them to the swapcache or because the swap entry indicates
> - * exclusivity.
> + * Similar logic as in do_wp_page(); however, optimize for pages that are
> + * certainly not because the swap entry indicates exclusivity.
s/not/not shared/
but likely you should just simplify to "... pages that are certainly exclusive."
> */
> - if (!folio_test_ksm(folio) &&
> - (exclusive || folio_ref_count(folio) == 1)) {
> + if (exclusive) {
> if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
> !pte_needs_soft_dirty_wp(vma, pte)) {
> pte = pte_mkwrite(pte, vma);
--
Cheers,
David
* [PATCH v3 3/4] mm: entirely remove lru_add_drain in do_swap_page
From: Barry Song (Xiaomi) @ 2026-07-01 23:59 UTC
To: akpm, linux-mm
Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)
We are doing a lot of redundant lru_add_drain() calls in
do_swap_page(), especially for synchronous I/O devices. For
example, the test program below currently ends up draining
lru_cache 100% of the time:
    #include <stddef.h>
    #include <sys/mman.h>

    #define SIZE (100UL * 1024 * 1024)

    int main(void)
    {
        while (1) {
            volatile int *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED)
                return 1;
            /* Populate the region, page it out, then write-fault it back in. */
            for (size_t i = 0; i < SIZE / sizeof(int); i++)
                p[i] = i % 64;
            madvise((void *)p, SIZE, MADV_PAGEOUT);
            for (size_t i = 0; i < SIZE / sizeof(int); i++)
                p[i] = i % 64;
            munmap((void *)p, SIZE);
        }
        return 0;
    }
Folio reuse now relies primarily on the exclusive hint, so draining
the LRU cache just to drop its reference on the folio is largely
irrelevant.
For a kernel build with a minimal configuration running in a 1 GB
memcg, this patch skips more than 43,000 redundant local LRU drains.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/memory.c | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 71e9d394816b..4665405ace5a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4901,16 +4901,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
} else if (folio != swapcache)
page = folio_page(folio, 0);
- /*
- * If we want to map a page that's in the swapcache writable, we
- * have to detect via the refcount if we're really the exclusive
- * owner. Try removing the extra reference from the local LRU
- * caches if required.
- */
- if ((vmf->flags & FAULT_FLAG_WRITE) &&
- !folio_test_ksm(folio) && !folio_test_lru(folio))
- lru_add_drain();
-
folio_throttle_swaprate(folio, GFP_KERNEL);
/*
--
2.39.3 (Apple Git-146)
* [PATCH v3 4/4] mm: clarify the folio_free_swap() for do_swap_page()
From: Barry Song (Xiaomi) @ 2026-07-01 23:59 UTC
To: akpm, linux-mm
Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)
Since commit 4b34f1d82c654 ("mm, swap: free the swap cache after
folio is mapped"), we have relied on do_wp_page() to handle the
non-exclusive case, where the folio may either be reused or require
CoW.
As a result, using the refcount in do_swap_page() to decide when to
free the swap cache is no longer necessary, since do_wp_page() can
handle this more cleanly and consistently.
We can now simply use FAULT_FLAG_WRITE together with exclusivity to
decide when to free the swap cache in do_swap_page().
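The difference between the two decision rules can be shown with a userspace sketch of the final return statement of should_try_to_free_swap(). It is an illustration only: `MODEL_FAULT_FLAG_WRITE`, `free_swap_old()` and `free_swap_new()` are hypothetical stand-ins, and the earlier checks in the function (swapcache test, memcg swap full, mlock) are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for FAULT_FLAG_WRITE; the value is arbitrary. */
#define MODEL_FAULT_FLAG_WRITE 0x1u

/* Old rule: infer exclusivity from the folio refcount. */
static bool free_swap_old(unsigned int fault_flags, bool ksm, int refcount,
			  int extra_refs, int nr_pages)
{
	return (fault_flags & MODEL_FAULT_FLAG_WRITE) && !ksm &&
	       refcount == extra_refs + nr_pages;
}

/* New rule: rely on the exclusive hint taken from the swap entry. */
static bool free_swap_new(unsigned int fault_flags, bool exclusive)
{
	return (fault_flags & MODEL_FAULT_FLAG_WRITE) && exclusive;
}
```

In this model, one additional transient reference (for example `refcount == 3` with `extra_refs == 1` and `nr_pages == 1`) makes `free_swap_old()` return false on a write fault, whereas `free_swap_new()` depends only on the write flag and the exclusive hint.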
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/memory.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 4665405ace5a..a29cd38ad547 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4516,7 +4516,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
static inline bool should_try_to_free_swap(struct swap_info_struct *si,
struct folio *folio,
struct vm_area_struct *vma,
- unsigned int extra_refs,
+ bool exclusive,
unsigned int fault_flags)
{
if (!folio_test_swapcache(folio))
@@ -4532,14 +4532,12 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
folio_test_mlocked(folio))
return true;
+
/*
- * If we want to map a page that's in the swapcache writable, we
- * have to detect via the refcount if we're really the exclusive
- * user. Try freeing the swapcache to get rid of the swapcache
- * reference only in case it's likely that we'll be the exclusive user.
+ * Free the swapcache only if we are the exclusive user and
+ * this is a write fault.
*/
- return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
- folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
+ return (fault_flags & FAULT_FLAG_WRITE) && exclusive;
}
static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
@@ -5043,10 +5041,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
!pte_needs_soft_dirty_wp(vma, pte)) {
pte = pte_mkwrite(pte, vma);
- if (vmf->flags & FAULT_FLAG_WRITE) {
+ if (vmf->flags & FAULT_FLAG_WRITE)
pte = pte_mkdirty(pte);
- vmf->flags &= ~FAULT_FLAG_WRITE;
- }
}
rmap_flags |= RMAP_EXCLUSIVE;
}
@@ -5086,7 +5082,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* Do it after mapping, so raced page faults will likely see the folio
* in swap cache and wait on the folio lock.
*/
- if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
+ if (should_try_to_free_swap(si, folio, vma, exclusive, vmf->flags))
folio_free_swap(folio);
folio_unlock(folio);
@@ -5103,7 +5099,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_put(swapcache);
}
- if (vmf->flags & FAULT_FLAG_WRITE) {
+ if ((vmf->flags & FAULT_FLAG_WRITE) && !pte_write(pte)) {
ret |= do_wp_page(vmf);
if (ret & VM_FAULT_ERROR)
ret &= VM_FAULT_ERROR;
--
2.39.3 (Apple Git-146)
* Re: [PATCH v3 4/4] mm: clarify the folio_free_swap() for do_swap_page()
From: David Hildenbrand (Arm) @ 2026-07-02 8:10 UTC
To: Barry Song (Xiaomi), akpm, linux-mm
Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
usama.arif, vbabka, youngjun.park
On 7/2/26 01:59, Barry Song (Xiaomi) wrote:
> Since commit 4b34f1d82c654 ("mm, swap: free the swap cache after
> folio is mapped"), we have relied on do_wp_page() to handle the
> non-exclusive case, where the folio may either be reused or require
> CoW.
>
> As a result, using the refcount in do_swap_page() to decide when to
> free the swap cache is no longer necessary, since do_wp_page() can
> handle this more cleanly and consistently.
>
> We can now simply use FAULT_FLAG_WRITE together with exclusivity to
> decide when to free the swap cache in do_swap_page().
>
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David