Re: [DISCUSSION] Unconditionally lock folios when calling rmap_walk()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Lokesh Gidra <lokeshgidra@google.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Harry Yoo <harry.yoo@oracle.com>, Zi Yan <ziy@nvidia.com>,
	Barry Song <21cnbao@gmail.com>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	Peter Xu <peterx@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	android-mm <android-mm@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Jann Horn <jannh@google.com>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [DISCUSSION] Unconditionally lock folios when calling rmap_walk()
Date: Mon, 25 Aug 2025 17:19:05 +0200	[thread overview]
Message-ID: <dc92aef8-757f-4432-923e-70d92d13fb37@redhat.com> (raw)
In-Reply-To: <CA+EESO4Z6wtX7ZMdDHQRe5jAAS_bQ-POq5+4aDx5jh2DvY6UHg@mail.gmail.com>

On 22.08.25 19:29, Lokesh Gidra wrote:
> Hi all,
> 
> Currently, some callers of rmap_walk() conditionally avoid try-locking
> non-ksm anon folios. This necessitates serialization through anon_vma
> write-lock elsewhere when folio->mapping and/or folio->index (fields
> involved in rmap_walk()) are to be updated. This hurts scalability due
> to coarse granularity of the lock. For instance, when multiple threads
> invoke userfaultfd’s MOVE ioctl simultaneously to move distinct pages
> from the same src VMA, they all contend for the corresponding
> anon_vma’s lock. Field traces for arm64 android devices reveal over
> 30ms of uninterruptible sleep in the main UI thread, leading to janky
> user interactions.
> 
> Among all rmap_walk() callers that don’t lock anon folios,
> folio_referenced() is the most critical (others are
> page_idle_clear_pte_refs(), damon_folio_young(), and
> damon_folio_mkold()). The relevant code in folio_referenced() is:
> 
> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
>          we_locked = folio_trylock(folio);
>          if (!we_locked)
>                  return 1;
> }
> 
> It’s unclear why locking anon_vma exclusively (when updating
> folio->mapping, like in uffd MOVE) is beneficial over walking rmap
> with folio locked. It’s in the reclaim path, so should not be a
> critical path that necessitates some special treatment, unless I’m
> missing something.
> 
> Therefore, I propose simplifying the locking mechanism by ensuring the
> folio is locked before calling rmap_walk().

Essentially, what you mean is roughly:

diff --git a/mm/rmap.c b/mm/rmap.c
index 34333ae3bd80f..0800e73c0796e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1005,7 +1005,7 @@ int folio_referenced(struct folio *folio, int is_locked,
         if (!folio_raw_mapping(folio))
                 return 0;

-       if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
+       if (!is_locked) {
                 we_locked = folio_trylock(folio);
                 if (!we_locked)
                         return 1;

The downside of that change is that ordinary (!ksm) folios will observe being locked
when we are actually only trying to asses if they were referenced.

Does it matter?

I can only speculate that it might have been very relevant before
6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive").

Essentially any R/O fault would have resulted in us copying the page, simply because
there is concurrent folio_referenced() happening.

Before 09854ba94c6a ("mm: do_wp_page() simplification") that wasn't an issue, but
it would have meant that the write fault would be stuck until folio_referenced()
would be done, which is also suboptimal.

So likely, avoiding grabbing the folio lock was beneficial.

Today, this would only affect R/O pages after fork (PageAnonExclusive not set).

Staring at shrink_active_list()->folio_referenced(), we isolate the folio first
(grabbing reference+clearing LRU), so do_wp_page()->wp_can_reuse_anon_folio()
would already refuse to reuse immediately, because it would spot a raised reference.
The folio lock does not make a difference anymore.

Is there any other anon-specific (!ksm) folio locking? Nothing exciting comes to mind,
except maybe some folio splitting or khugepaged that maybe would have to wait.

But khugepaged would already also fail to isolate these folios, so probably it's not that
relevant anymore ...

-- 
Cheers

David / dhildenb

next prev parent reply	other threads:[~2025-08-25 15:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-22 17:29 [DISCUSSION] Unconditionally lock folios when calling rmap_walk() Lokesh Gidra
2025-08-22 17:44 ` Matthew Wilcox
2025-08-22 18:08   ` Lorenzo Stoakes
2025-08-24  4:18 ` Lokesh Gidra
2025-08-24  5:31   ` Harry Yoo
2025-08-24  6:45     ` Lokesh Gidra
2025-08-25 10:52       ` Harry Yoo
2025-08-28 11:32         ` Lorenzo Stoakes
2025-08-25 15:19 ` David Hildenbrand [this message]
2025-08-25 18:46   ` Lokesh Gidra
2025-08-28 12:04   ` Lorenzo Stoakes
2025-08-29  0:23     ` Lokesh Gidra
2025-08-29  8:42       ` David Hildenbrand
2025-08-29  9:04         ` Lorenzo Stoakes
2025-09-02 18:59           ` Lokesh Gidra
2025-09-02 19:01             ` David Hildenbrand
2025-09-02 19:04               ` Lokesh Gidra
2025-08-29  9:01       ` Lorenzo Stoakes
2025-08-26 15:52 ` Lorenzo Stoakes
2025-08-26 22:23   ` Lokesh Gidra
2025-08-28 11:31     ` Lorenzo Stoakes

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:34333ae3bd80 dfblob:0800e73c0796 )
 OR (
bs:"Re: [DISCUSSION] Unconditionally lock folios when calling rmap_walk()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dc92aef8-757f-4432-923e-70d92d13fb37@redhat.com \
    --to=david@redhat.com \
    --cc=21cnbao@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=android-mm@google.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).