[PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths

* [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths
@ 2026-06-23 23:16 Barry Song (Xiaomi)
  2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-23 23:16 UTC
  To: akpm, linux-mm
  Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
	ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)

We are doing a large number of redundant lru_add_drain() calls in
both wp_can_reuse_anon_folio() and do_swap_page(), leading to LRU
lock contention and unnecessary overhead.

In wp_can_reuse_anon_folio(), we can check the refcount, allowing for
a possible lru_cache reference, before deciding to drain. In
do_swap_page(), the drain is now entirely redundant after Kairui's
work to route SYNC I/O through the swapcache in the same way as ASYNC
I/O.

When building the kernel within a 1GB memcg using 20 threads with zRAM
swap, the number of lru_add_drain() calls is reduced from 276,787 to
230,283, while sys time decreases slightly from 3m40.125s to
3m37.128s.

When building the kernel within an 800MB memcg using 20 threads with
zRAM swap, the number of lru_add_drain() calls is reduced from 796,661
to 537,262, while sys time decreases slightly from 6m25.981s to
6m22.678s.
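
For reference, the relative reductions implied by the figures above can
be worked out with a small helper. This is only an illustrative sketch
and not part of the series; percent_reduction(), to_seconds() and
print_reductions() are hypothetical names.

```c
#include <stdio.h>

/* Relative reduction in percent when a metric drops from before to after. */
static double percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Convert a time reading of the form "XmY.YYYs" into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Print the reductions for the two kernel-build runs quoted above. */
static void print_reductions(void)
{
	printf("1GB memcg:   drains -%.1f%%, sys time -%.1f%%\n",
	       percent_reduction(276787, 230283),
	       percent_reduction(to_seconds(3, 40.125), to_seconds(3, 37.128)));
	printf("800MB memcg: drains -%.1f%%, sys time -%.1f%%\n",
	       percent_reduction(796661, 537262),
	       percent_reduction(to_seconds(6, 25.981), to_seconds(6, 22.678)));
}
```

Calling print_reductions() from a main() shows the drain count falling
by roughly 17% and 33% in the two runs, while sys time moves by around
1%.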

-v2:
 * collect the reviewed-by and acked-by tags from Usama, Baoquan,
   Shakeel, Kairui, thanks!
 * add patch4 to free swapcache for non-LRU folios, as suggested
   by Kairui, thanks!

-RFC:
 https://lore.kernel.org/linux-mm/20260611105124.98668-1-baohua@kernel.org/ 

Barry Song (Xiaomi) (4):
  mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
  mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic
  mm: entirely remove lru_add_drain in do_swap_page
  mm: try to free swapcache for non-LRU folios

 mm/memory.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

-- 
2.39.3 (Apple Git-146)


* [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
  2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
@ 2026-06-23 23:16 ` Barry Song (Xiaomi)
  2026-06-24 10:14   ` Kairui Song
  2026-06-24 15:02   ` David Hildenbrand (Arm)
  2026-06-23 23:16 ` [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-23 23:16 UTC
  To: akpm, linux-mm
  Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
	ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)

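We unconditionally drain the local LRU cache before retrying anon folio
reuse in wp_can_reuse_anon_folio(). Instead, assume !LRU anon folios
are in lru_cache and hold one extra reference from it. If the refcount
is still too high under that assumption, the drain cannot make the
folio reusable, so skip it and avoid many unnecessary LRU drains.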
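
The decision flow can be modelled in plain C as below. This is a
userspace sketch under stated assumptions, not kernel code: struct
folio_model, enum reuse_step and classify() are hypothetical names,
and the fields merely stand in for folio_ref_count(), folio_test_lru(),
folio_test_swapcache() and folio_test_ksm().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the check reads. */
struct folio_model {
	int  ref_count;     /* models folio_ref_count() */
	bool on_lru;        /* models folio_test_lru() */
	bool in_swapcache;  /* models folio_test_swapcache() */
	bool is_ksm;        /* models folio_test_ksm() */
};

enum reuse_step {
	REUSE_REJECT,             /* return false without draining */
	REUSE_DRAIN_THEN_RECHECK, /* lru_add_drain(), then the final check */
	REUSE_RECHECK,            /* go straight to the final check */
};

static enum reuse_step classify(const struct folio_model *f)
{
	if (f->is_ksm || f->ref_count > 3)
		return REUSE_REJECT;
	if (!f->on_lru) {
		/*
		 * Allow one mapping reference, one assumed lru_cache
		 * reference and, if present, one swapcache reference.
		 */
		if (f->ref_count > 2 + f->in_swapcache)
			return REUSE_REJECT;
		return REUSE_DRAIN_THEN_RECHECK;
	}
	return REUSE_RECHECK;
}
```

In this model, the only state the new check rejects on top of the
existing "> 3" check is a non-LRU folio with a refcount of 3 that is
not in the swapcache.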

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe92..f6848f4234a6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4193,12 +4193,18 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
 	 */
 	if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
 		return false;
-	if (!folio_test_lru(folio))
+	if (!folio_test_lru(folio)) {
+		/*
+		 * Assume folio is on lru_cache and holds a cache reference.
+		 */
+		if (folio_ref_count(folio) > 2 + folio_test_swapcache(folio))
+			return false;
 		/*
 		 * We cannot easily detect+handle references from
 		 * remote LRU caches or references to LRU folios.
 		 */
 		lru_add_drain();
+	}
 	if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
 		return false;
 	if (!folio_trylock(folio))
-- 
2.39.3 (Apple Git-146)




* Re: [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
  2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
@ 2026-06-24 10:14   ` Kairui Song
  2026-06-24 15:02   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 11+ messages in thread
From: Kairui Song @ 2026-06-24 10:14 UTC
  To: Barry Song (Xiaomi)
  Cc: akpm, linux-mm, baoquan.he, chrisl, david, jp.kobryn, liam,
	linux-kernel, ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng,
	surenb, usama.arif, vbabka, youngjun.park

On Wed, Jun 24, 2026 at 7:16 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
>
> We always unconditionally drain the LRU before retrying anon folio
> reuse in wp_can_reuse_anon_folio(). Instead, assume !LRU anon folios
> are in lru_cache, and use the refcount to avoid many unnecessary LRU
> drains.
>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
>  mm/memory.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index ff338c2abe92..f6848f4234a6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4193,12 +4193,18 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
>          */
>         if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
>                 return false;
> -       if (!folio_test_lru(folio))
> +       if (!folio_test_lru(folio)) {
> +               /*
> +                * Assume folio is on lru_cache and holds a cache reference.
> +                */
> +               if (folio_ref_count(folio) > 2 + folio_test_swapcache(folio))
> +                       return false;
>                 /*
>                  * We cannot easily detect+handle references from
>                  * remote LRU caches or references to LRU folios.
>                  */
>                 lru_add_drain();
> +       }
>         if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
>                 return false;
>         if (!folio_trylock(folio))
> --
> 2.39.3 (Apple Git-146)
>

Reviewed-by: Kairui Song <kasong@tencent.com>



* Re: [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
  2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
  2026-06-24 10:14   ` Kairui Song
@ 2026-06-24 15:02   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-24 15:02 UTC
  To: Barry Song (Xiaomi), akpm, linux-mm
  Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
	mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park

On 6/24/26 01:16, Barry Song (Xiaomi) wrote:
> We always unconditionally drain the LRU before retrying anon folio
> reuse in wp_can_reuse_anon_folio(). Instead, assume !LRU anon folios
> are in lru_cache, and use the refcount to avoid many unnecessary LRU
> drains.
> 
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
>  mm/memory.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index ff338c2abe92..f6848f4234a6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4193,12 +4193,18 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
>  	 */
>  	if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
>  		return false;
> -	if (!folio_test_lru(folio))
> +	if (!folio_test_lru(folio)) {
> +		/*
> +		 * Assume folio is on lru_cache and holds a cache reference.
> +		 */
> +		if (folio_ref_count(folio) > 2 + folio_test_swapcache(folio))
> +			return false;

I'm not keen on making this function even uglier, so no, not like that.

We have the earlier "folio_ref_count(folio) > 3" check.

In which scenarios can you trigger this such that we would care?

If the answer is "I don't know" there is no reason for a change.

-- 
Cheers,

David



* [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic
  2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
  2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
@ 2026-06-23 23:16 ` Barry Song (Xiaomi)
  2026-06-24 15:07   ` David Hildenbrand (Arm)
  2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
  2026-06-23 23:16 ` [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios Barry Song (Xiaomi)
  3 siblings, 1 reply; 11+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-23 23:16 UTC
  To: akpm, linux-mm
  Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
	ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)

The "we just allocated them without exposing them to the swapcache"
case no longer exists, as Kairui has routed synchronous I/O through
the swapcache as well in his series "unify swapin use swap cache and
cleanup flags"[1]. As a result, folio_ref_count() should never be 1
in this path, since at least two references are held (base ref plus
swapcache). Remove the folio_ref_count()==1 check and update the
comment accordingly.

[1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/
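
The argument can be checked mechanically with a small userspace model.
It is a sketch only: reuse_check_old(), reuse_check_new() and
checks_agree() are hypothetical names, and the lower bound of two
references (base ref plus swapcache) is taken from the description
above rather than verified against the kernel.

```c
#include <stdbool.h>

/* Old condition: exclusive hint, or we hold the only reference. */
static bool reuse_check_old(bool ksm, bool exclusive, int ref_count)
{
	return !ksm && (exclusive || ref_count == 1);
}

/* New condition from this patch: rely on the exclusive hint alone. */
static bool reuse_check_new(bool ksm, bool exclusive)
{
	return !ksm && exclusive;
}

/*
 * Compare both conditions for every refcount in [min_refs, max_refs].
 * With min_refs == 2 (base ref plus swapcache ref) they never differ.
 */
static bool checks_agree(int min_refs, int max_refs)
{
	for (int ksm = 0; ksm <= 1; ksm++)
		for (int excl = 0; excl <= 1; excl++)
			for (int refs = min_refs; refs <= max_refs; refs++)
				if (reuse_check_old(ksm, excl, refs) !=
				    reuse_check_new(ksm, excl))
					return false;
	return true;
}
```

checks_agree(2, 64) holds, whereas checks_agree(1, 64) does not: the
two conditions differ only for a non-KSM, non-exclusive folio with a
refcount of 1, which is the case this patch argues can no longer occur.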

Acked-by: Usama Arif <usama.arif@linux.dev>
Reviewed-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f6848f4234a6..abd0adcf65f0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5049,12 +5049,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	/*
 	 * Same logic as in do_wp_page(); however, optimize for pages that are
-	 * certainly not shared either because we just allocated them without
-	 * exposing them to the swapcache or because the swap entry indicates
-	 * exclusivity.
+	 * certainly not shared because the swap entry indicates exclusivity.
 	 */
-	if (!folio_test_ksm(folio) &&
-	    (exclusive || folio_ref_count(folio) == 1)) {
+	if (!folio_test_ksm(folio) && exclusive) {
 		if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
 		    !pte_needs_soft_dirty_wp(vma, pte)) {
 			pte = pte_mkwrite(pte, vma);
-- 
2.39.3 (Apple Git-146)




* Re: [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic
  2026-06-23 23:16 ` [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
@ 2026-06-24 15:07   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-24 15:07 UTC
  To: Barry Song (Xiaomi), akpm, linux-mm
  Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
	mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park

On 6/24/26 01:16, Barry Song (Xiaomi) wrote:
> The "we just allocated them without exposing them to the swapcache"
> case no longer exists, as Kairui has routed synchronous I/O through
> the swapcache as well in his series "unify swapin use swap cache and
> cleanup flags"[1]. As a result, folio_ref_count() should never be 1
> in this path, since at least two references are held (base ref plus
> swapcache). Remove the folio_ref_count()==1 check and update the
> comment accordingly.
> 
> [1] https://lore.kernel.org/all/20251220-swap-table-p2-v5-0-8862a265a033@tencent.com/
> 
> Acked-by: Usama Arif <usama.arif@linux.dev>
> Reviewed-by: Kairui Song <kasong@tencent.com>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
>  mm/memory.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index f6848f4234a6..abd0adcf65f0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5049,12 +5049,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  
>  	/*
>  	 * Same logic as in do_wp_page(); however, optimize for pages that are

s/Same/Similar/ ?

> -	 * certainly not shared either because we just allocated them without
> -	 * exposing them to the swapcache or because the swap entry indicates
> -	 * exclusivity.
> +	 * certainly not because the swap entry indicates exclusivity.
>  	 */
> -	if (!folio_test_ksm(folio) &&
> -	    (exclusive || folio_ref_count(folio) == 1)) {
> +	if (!folio_test_ksm(folio) && exclusive) {

Hmm, but KSM folios should never have "exclusive" set. So I think you can drop
that as well (was only relevant with folio_ref_count==1 check IIRC).

-- 
Cheers,

David



* [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page
  2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
  2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
  2026-06-23 23:16 ` [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
@ 2026-06-23 23:16 ` Barry Song (Xiaomi)
  2026-06-24 10:16   ` Kairui Song
  2026-06-24 15:10   ` David Hildenbrand (Arm)
  2026-06-23 23:16 ` [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios Barry Song (Xiaomi)
  3 siblings, 2 replies; 11+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-23 23:16 UTC
  To: akpm, linux-mm
  Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
	ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi)

We are doing a lot of redundant lru_add_drain() calls in
do_swap_page(), especially for synchronous I/O devices. For
example, the test program below currently ends up draining
lru_cache 100% of the time:

 #include <sys/mman.h>

 #define SIZE (100 * 1024 * 1024)

int main(int argc, char *argv[])
{
	while (1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		/* populate, page out to swap, then write-fault everything back in */
		for (size_t i = 0; i < SIZE / sizeof(int); i++)
			p[i] = i % 64;
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		for (size_t i = 0; i < SIZE / sizeof(int); i++)
			p[i] = i % 64;
		munmap((void *)p, SIZE);
	}
	return 0;
}

Folio reuse now relies primarily on the exclusive hint, so draining
lru_cache to drop the reference it holds is largely irrelevant.

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index abd0adcf65f0..2983a6baf474 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4903,16 +4903,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	} else if (folio != swapcache)
 		page = folio_page(folio, 0);
 
-	/*
-	 * If we want to map a page that's in the swapcache writable, we
-	 * have to detect via the refcount if we're really the exclusive
-	 * owner. Try removing the extra reference from the local LRU
-	 * caches if required.
-	 */
-	if ((vmf->flags & FAULT_FLAG_WRITE) &&
-	    !folio_test_ksm(folio) && !folio_test_lru(folio))
-		lru_add_drain();
-
 	folio_throttle_swaprate(folio, GFP_KERNEL);
 
 	/*
-- 
2.39.3 (Apple Git-146)




* Re: [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page
  2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
@ 2026-06-24 10:16   ` Kairui Song
  2026-06-24 15:10   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 11+ messages in thread
From: Kairui Song @ 2026-06-24 10:16 UTC
  To: Barry Song (Xiaomi)
  Cc: akpm, linux-mm, baoquan.he, chrisl, david, jp.kobryn, liam,
	linux-kernel, ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng,
	surenb, usama.arif, vbabka, youngjun.park

On Wed, Jun 24, 2026 at 7:18 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
>
> We are doing a lot of redundant lru_add_drain() calls in
> do_swap_page(), especially for synchronous I/O devices. For
> example, the test program below currently ends up draining
> lru_cache 100% of the time:
>
> int main(int argc, char *argv[])
> {
>         int i;
>  #define SIZE 100*1024*1024
>         while(1) {
>                 volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
>                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
>                 for (int i = 0; i < SIZE/sizeof(int); i++)
>                         p[i] =  i%64;
>                 madvise((void *)p, SIZE, MADV_PAGEOUT);
>                 for (int i = 0; i < SIZE/sizeof(int); i++)
>                         p[i] =  i%64;
>                 munmap(p, SIZE);
>         }
>         return 0;
> }
>
> Folio reuse now relies primarily on the exclusive hint, making
> lru_cache draining to drop the refcount in lru_cache largely
> irrelevant.
>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Baoquan He <baoquan.he@linux.dev>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
>  mm/memory.c | 10 ----------
>  1 file changed, 10 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index abd0adcf65f0..2983a6baf474 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4903,16 +4903,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>         } else if (folio != swapcache)
>                 page = folio_page(folio, 0);
>
> -       /*
> -        * If we want to map a page that's in the swapcache writable, we
> -        * have to detect via the refcount if we're really the exclusive
> -        * owner. Try removing the extra reference from the local LRU
> -        * caches if required.
> -        */
> -       if ((vmf->flags & FAULT_FLAG_WRITE) &&
> -           !folio_test_ksm(folio) && !folio_test_lru(folio))
> -               lru_add_drain();
> -
>         folio_throttle_swaprate(folio, GFP_KERNEL);
>
>         /*
> --
> 2.39.3 (Apple Git-146)
>

Thanks, I see that the previously discussed problem is addressed in the
next patch, so this is totally fine:

Reviewed-by: Kairui Song <kasong@tencent.com>



* Re: [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page
  2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
  2026-06-24 10:16   ` Kairui Song
@ 2026-06-24 15:10   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-24 15:10 UTC
  To: Barry Song (Xiaomi), akpm, linux-mm
  Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
	mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park

On 6/24/26 01:16, Barry Song (Xiaomi) wrote:
> We are doing a lot of redundant lru_add_drain() calls in
> do_swap_page(), especially for synchronous I/O devices. For
> example, the test program below currently ends up draining
> lru_cache 100% of the time:
> 
> int main(int argc, char *argv[])
> {
>         int i;
>  #define SIZE 100*1024*1024
> 	while(1) {
> 		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
>                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
> 		for (int i = 0; i < SIZE/sizeof(int); i++)
> 			p[i] =  i%64;
> 		madvise((void *)p, SIZE, MADV_PAGEOUT);
> 		for (int i = 0; i < SIZE/sizeof(int); i++)
> 			p[i] =  i%64;
> 		munmap(p, SIZE);
> 	}
> 	return 0;
> }
> 
> Folio reuse now relies primarily on the exclusive hint, making
> lru_cache draining to drop the refcount in lru_cache largely
> irrelevant.

Makes sense, we'll fall back to do_wp_page(), where we handle the
non-exclusive case either way.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios
  2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
                   ` (2 preceding siblings ...)
  2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
@ 2026-06-23 23:16 ` Barry Song (Xiaomi)
  2026-06-24 15:20   ` David Hildenbrand (Arm)
  3 siblings, 1 reply; 11+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-23 23:16 UTC
  To: akpm, linux-mm
  Cc: baoquan.he, chrisl, david, jp.kobryn, kasong, liam, linux-kernel,
	ljs, mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park, Barry Song (Xiaomi),
	Kairui Song

Originally, we unconditionally called lru_add_drain() for write
swap-in page faults. This might drop the reference held by the per-CPU
LRU cache if the folio happened to reside there. However, there was no
guarantee that the folio was actually cached on the current CPU.

Now that lru_add_drain() has been removed, we have lost one
opportunity to drop a reference held by the LRU cache. We could
instead incorporate that possibility into the condition evaluated by
should_try_to_free_swap().
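
The effect of the adjusted argument can be sketched in userspace C.
Note the assumptions: the body of should_try_to_free_swap() is not part
of this patch, so the model below assumes that its final test compares
the folio refcount with the passed-in extra references plus the folio's
page count. struct swapin_folio_model and refs_look_exclusive() are
hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the folio state read by the check. */
struct swapin_folio_model {
	int  ref_count;  /* models folio_ref_count() */
	int  nr_pages;   /* models folio_nr_pages(): assumed swapcache refs */
	bool on_lru;     /* models folio_test_lru() */
};

/*
 * Models only the assumed final refcount comparison. nr_mapped is the
 * number of pages just mapped by the fault. When assume_lru_cache_ref
 * is set, a non-LRU folio is assumed to carry one extra lru_cache
 * reference, as in this patch.
 */
static bool refs_look_exclusive(const struct swapin_folio_model *f,
				int nr_mapped, bool assume_lru_cache_ref)
{
	int extra_refs = nr_mapped;

	if (assume_lru_cache_ref && !f->on_lru)
		extra_refs += 1;
	return f->ref_count == extra_refs + f->nr_pages;
}
```

Under these assumptions, a single-page non-LRU folio with three
references (mapping, swapcache, lru_cache) only passes the comparison
when the extra reference is accounted for. A non-LRU folio that is not
actually queued in lru_cache would no longer pass, which is the cost of
the "assume" in this model.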

Suggested-by: Kairui Song <ryncsn@gmail.com>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
 mm/memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2983a6baf474..14577c67c61a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5087,8 +5087,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * Remove the swap entry and conditionally try to free up the swapcache.
 	 * Do it after mapping, so raced page faults will likely see the folio
 	 * in swap cache and wait on the folio lock.
+	 * Assume non-LRU folios may be queued in the LRU cache, which contributes
+	 * an additional reference to the folio.
 	 */
-	if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
+	if (should_try_to_free_swap(si, folio, vma, nr_pages +
+			!folio_test_lru(folio), vmf->flags))
 		folio_free_swap(folio);
 
 	folio_unlock(folio);
-- 
2.39.3 (Apple Git-146)




* Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios
  2026-06-23 23:16 ` [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios Barry Song (Xiaomi)
@ 2026-06-24 15:20   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-24 15:20 UTC
  To: Barry Song (Xiaomi), akpm, linux-mm
  Cc: baoquan.he, chrisl, jp.kobryn, kasong, liam, linux-kernel, ljs,
	mhocko, nphamcs, rppt, shakeel.butt, shikemeng, surenb,
	usama.arif, vbabka, youngjun.park, Kairui Song

On 6/24/26 01:16, Barry Song (Xiaomi) wrote:
> Originally, we unconditionally called lru_add_drain() for write
> swap-in page faults. This might drop the reference held by the per-CPU
> LRU cache if the folio happened to reside there. However, there was no
> guarantee that the folio was actually cached on the current CPU.
> 
> Now that lru_add_drain() has been removed, we have lost one
> opportunity to drop a reference held by the LRU cache. We could
> instead incorporate that possibility into the condition evaluated by
> should_try_to_free_swap().
> 
> Suggested-by: Kairui Song <ryncsn@gmail.com>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
>  mm/memory.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 2983a6baf474..14577c67c61a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5087,8 +5087,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	 * Remove the swap entry and conditionally try to free up the swapcache.
>  	 * Do it after mapping, so raced page faults will likely see the folio
>  	 * in swap cache and wait on the folio lock.
> +	 * Assume non-LRU folios may be queued in the LRU cache, which contributes
> +	 * an additional reference to the folio.
>  	 */
> -	if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
> +	if (should_try_to_free_swap(si, folio, vma, nr_pages +
> +			!folio_test_lru(folio), vmf->flags))
>  		folio_free_swap(folio);
>  
>  	folio_unlock(folio);

Hm, in wp_can_reuse_anon_folio() we'll try dropping the swapcache ourselves.

So I wonder if we still need that handling ("If we want to map a page that's in
the swapcache writable, we ...") at all?


Ahh, I see the problem now:

commit 4b34f1d82c6549837b2061096dea249e881a4495
Author: Kairui Song <kasong@tencent.com>
Date:   Sat Dec 20 03:43:35 2025 +0800

    mm, swap: free the swap cache after folio is mapped

    Currently, we remove the folio from the swap cache and free the swap cache
    before mapping the PTE.  To reduce repeated faults due to parallel swapins
    of the same PTE, change it to remove the folio from the swap cache after
    it is mapped.  So new faults from the swap PTE will be much more likely to
    see the folio in the swap cache and wait on it.

    This does not eliminate all swapin races: an ongoing swapin fault may
    still see an empty swap cache.  That's harmless, as the PTE is changed
    before the swap cache is cleared, so it will just return and not trigger
    any repeated faults.  This does help to reduce the chance.


That commit changed the behavior such that we *must* now always fall back to do_wp_page().

What a mess (I didn't ack)

-- 
Cheers,

David


