Re: [RFC] Unconditionally lock folios when calling rmap

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
       [not found] <CA+EESO6dR5=4zaecmYqQqOX4702wwGSTX=4+Ani_Q9+o+WUnQA@mail.gmail.com>
@ 2025-08-21  4:29 ` Lokesh Gidra
  2025-08-21 12:01   ` Barry Song
  0 siblings, 1 reply; 7+ messages in thread
From: Lokesh Gidra @ 2025-08-21  4:29 UTC (permalink / raw)
  To: open list:MEMORY MANAGEMENT
  Cc: Peter Xu, Barry Song, David Hildenbrand, Suren Baghdasaryan,
	Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn

Adding linux-mm mailing list. Mistakenly used the wrong email address.

On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
>
> Hi all,
>
> Currently, some callers of rmap_walk() conditionally avoid try-locking
> non-ksm anon folios. This necessitates serialization through anon_vma
> write-lock when folio->mapping and/or folio->index (fields involved in
> rmap_walk()) are to be updated. This hurts scalability due to coarse
> granularity of the lock. For instance, when multiple threads invoke
> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from
> the same src VMA, they all contend for the corresponding anon_vma’s
> lock. Field traces for arm64 android devices reveal over 30ms of
> uninterruptible sleep in the main UI thread, leading to janky user
> interactions.
>
> Among all rmap_walk() callers that don’t lock anon folios,
> folio_referenced() is the most critical (others are
> page_idle_clear_pte_refs(), damon_folio_young(), and
> damon_folio_mkold()). The relevant code in folio_referenced() is:
>
> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
>         we_locked = folio_trylock(folio);
>         if (!we_locked)
>                 return 1;
> }
>
> It’s unclear why locking anon_vma (when updating folio->mapping) is
> beneficial over locking the folio here. It’s in the reclaim path, so
> should not be a critical path that necessitates some special
> treatment, unless I’m missing something.
>
> Therefore, I propose simplifying the locking mechanism by
> unconditionally try-locking the folio in such cases. This helps avoid
> locking anon_vma when updating folio->mapping, which, for instance,
> will help eliminate the uninterruptible sleep observed in the field
> traces mentioned earlier. Furthermore, it enables us to simplify the
> code in folio_lock_anon_vma_read() by removing the re-check to ensure
> that the field hasn’t changed under us.
>
> Thanks,
> Lokesh


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
  2025-08-21  4:29 ` [RFC] Unconditionally lock folios when calling rmap_walk() Lokesh Gidra
@ 2025-08-21 12:01   ` Barry Song
  2025-08-21 16:13     ` Zi Yan
  0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2025-08-21 12:01 UTC (permalink / raw)
  To: Lokesh Gidra
  Cc: open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand,
	Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm,
	linux-kernel, Jann Horn

On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
>
> Adding linux-mm mailing list. Mistakenly used the wrong email address.
>
> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
> >
> > Hi all,
> >
> > Currently, some callers of rmap_walk() conditionally avoid try-locking
> > non-ksm anon folios. This necessitates serialization through anon_vma
> > write-lock when folio->mapping and/or folio->index (fields involved in
> > rmap_walk()) are to be updated. This hurts scalability due to coarse
> > granularity of the lock. For instance, when multiple threads invoke
> > userfaultfd’s MOVE ioctl simultaneously to move distinct pages from
> > the same src VMA, they all contend for the corresponding anon_vma’s
> > lock. Field traces for arm64 android devices reveal over 30ms of
> > uninterruptible sleep in the main UI thread, leading to janky user
> > interactions.
> >
> > Among all rmap_walk() callers that don’t lock anon folios,
> > folio_referenced() is the most critical (others are
> > page_idle_clear_pte_refs(), damon_folio_young(), and
> > damon_folio_mkold()). The relevant code in folio_referenced() is:
> >
> > if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> >         we_locked = folio_trylock(folio);
> >         if (!we_locked)
> >                 return 1;
> > }
> >
> > It’s unclear why locking anon_vma (when updating folio->mapping) is
> > beneficial over locking the folio here. It’s in the reclaim path, so
> > should not be a critical path that necessitates some special
> > treatment, unless I’m missing something.
> >
> > Therefore, I propose simplifying the locking mechanism by
> > unconditionally try-locking the folio in such cases. This helps avoid
> > locking anon_vma when updating folio->mapping, which, for instance,
> > will help eliminate the uninterruptible sleep observed in the field
> > traces mentioned earlier. Furthermore, it enables us to simplify the
> > code in folio_lock_anon_vma_read() by removing the re-check to ensure
> > that the field hasn’t changed under us.

Thanks, I’m personally quite interested in this topic and will take a
closer look as well. Beyond this one userfaultfd move, we’ve observed
severe anon_vma lock contention between fork, unmap (process exit), and
memory reclamation. This has caused noticeable UI stutters, especially
when many VMAs share the same anon_vma root.

Thanks
Barry


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
  2025-08-21 12:01   ` Barry Song
@ 2025-08-21 16:13     ` Zi Yan
  2025-08-21 17:56       ` Lokesh Gidra
  0 siblings, 1 reply; 7+ messages in thread
From: Zi Yan @ 2025-08-21 16:13 UTC (permalink / raw)
  To: Lokesh Gidra, Barry Song
  Cc: open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand,
	Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm,
	linux-kernel, Jann Horn

On 21 Aug 2025, at 8:01, Barry Song wrote:

> On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
>>
>> Adding linux-mm mailing list. Mistakenly used the wrong email address.
>>
>> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
>>>
>>> Hi all,
>>>
>>> Currently, some callers of rmap_walk() conditionally avoid try-locking
>>> non-ksm anon folios. This necessitates serialization through anon_vma
>>> write-lock when folio->mapping and/or folio->index (fields involved in
>>> rmap_walk()) are to be updated. This hurts scalability due to coarse
>>> granularity of the lock. For instance, when multiple threads invoke
>>> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from
>>> the same src VMA, they all contend for the corresponding anon_vma’s
>>> lock. Field traces for arm64 android devices reveal over 30ms of
>>> uninterruptible sleep in the main UI thread, leading to janky user
>>> interactions.
>>>
>>> Among all rmap_walk() callers that don’t lock anon folios,
>>> folio_referenced() is the most critical (others are
>>> page_idle_clear_pte_refs(), damon_folio_young(), and
>>> damon_folio_mkold()). The relevant code in folio_referenced() is:
>>>
>>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
>>>         we_locked = folio_trylock(folio);
>>>         if (!we_locked)
>>>                 return 1;
>>> }

This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be
swappable"). From the commit log, the lock is used to protect KSM stable
tree from concurrent modification.

>>>
>>> It’s unclear why locking anon_vma (when updating folio->mapping) is
>>> beneficial over locking the folio here. It’s in the reclaim path, so
>>> should not be a critical path that necessitates some special
>>> treatment, unless I’m missing something.

The decision was made before the first git commit 1da177e4c3f4 based on
git history. Maybe it is time to revisit it and improve it.


>>>
>>> Therefore, I propose simplifying the locking mechanism by
>>> unconditionally try-locking the folio in such cases. This helps avoid
>>> locking anon_vma when updating folio->mapping, which, for instance,
>>> will help eliminate the uninterruptible sleep observed in the field
>>> traces mentioned earlier. Furthermore, it enables us to simplify the
>>> code in folio_lock_anon_vma_read() by removing the re-check to ensure
>>> that the field hasn’t changed under us.
>
> Thanks, I’m personally quite interested in this topic and will take a
> closer look as well. Beyond this one userfaultfd move, we’ve observed
> severe anon_vma lock contention between fork, unmap (process exit), and
> memory reclamation. This has caused noticeable UI stutters, especially
> when many VMAs share the same anon_vma root.
>
> Thanks
> Barry


--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
  2025-08-21 16:13     ` Zi Yan
@ 2025-08-21 17:56       ` Lokesh Gidra
  2025-08-22 10:36         ` Harry Yoo
  0 siblings, 1 reply; 7+ messages in thread
From: Lokesh Gidra @ 2025-08-21 17:56 UTC (permalink / raw)
  To: Zi Yan
  Cc: Barry Song, open list:MEMORY MANAGEMENT, Peter Xu,
	David Hildenbrand, Suren Baghdasaryan, Kalesh Singh,
	Andrew Morton, android-mm, linux-kernel, Jann Horn

 (

On Thu, Aug 21, 2025 at 9:14 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 21 Aug 2025, at 8:01, Barry Song wrote:
>
> > On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
> >>
> >> Adding linux-mm mailing list. Mistakenly used the wrong email address.
> >>
> >> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Currently, some callers of rmap_walk() conditionally avoid try-locking
> >>> non-ksm anon folios. This necessitates serialization through anon_vma
> >>> write-lock when folio->mapping and/or folio->index (fields involved in
> >>> rmap_walk()) are to be updated. This hurts scalability due to coarse
> >>> granularity of the lock. For instance, when multiple threads invoke
> >>> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from
> >>> the same src VMA, they all contend for the corresponding anon_vma’s
> >>> lock. Field traces for arm64 android devices reveal over 30ms of
> >>> uninterruptible sleep in the main UI thread, leading to janky user
> >>> interactions.
> >>>
> >>> Among all rmap_walk() callers that don’t lock anon folios,
> >>> folio_referenced() is the most critical (others are
> >>> page_idle_clear_pte_refs(), damon_folio_young(), and
> >>> damon_folio_mkold()). The relevant code in folio_referenced() is:
> >>>
> >>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> >>>         we_locked = folio_trylock(folio);
> >>>         if (!we_locked)
> >>>                 return 1;
> >>> }
>
> This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be
> swappable"). From the commit log, the lock is used to protect KSM stable
> tree from concurrent modification.
>
It seems like the conditional locking of file page/folio was added in
a 2004 commit  edcc56dc6a7c758c ("maplock: kill page_map_lock"). Later
in the commit you mentioned locking was also added for KSM, and now
only non-KSM anon folios are left :-)

> >>>
> >>> It’s unclear why locking anon_vma (when updating folio->mapping) is
> >>> beneficial over locking the folio here. It’s in the reclaim path, so
> >>> should not be a critical path that necessitates some special
> >>> treatment, unless I’m missing something.
>
> The decision was made before the first git commit 1da177e4c3f4 based on
> git history. Maybe it is time to revisit it and improve it.
>
>
> >>>
> >>> Therefore, I propose simplifying the locking mechanism by
> >>> unconditionally try-locking the folio in such cases. This helps avoid
> >>> locking anon_vma when updating folio->mapping, which, for instance,
> >>> will help eliminate the uninterruptible sleep observed in the field
> >>> traces mentioned earlier. Furthermore, it enables us to simplify the
> >>> code in folio_lock_anon_vma_read() by removing the re-check to ensure
> >>> that the field hasn’t changed under us.
> >
> > Thanks, I’m personally quite interested in this topic and will take a
> > closer look as well. Beyond this one userfaultfd move, we’ve observed
> > severe anon_vma lock contention between fork, unmap (process exit), and
> > memory reclamation. This has caused noticeable UI stutters, especially
> > when many VMAs share the same anon_vma root.
> >
> > Thanks
> > Barry
>
>
> --
> Best Regards,
> Yan, Zi


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
  2025-08-21 17:56       ` Lokesh Gidra
@ 2025-08-22 10:36         ` Harry Yoo
  2025-08-22 10:50           ` Lorenzo Stoakes
  0 siblings, 1 reply; 7+ messages in thread
From: Harry Yoo @ 2025-08-22 10:36 UTC (permalink / raw)
  To: Lokesh Gidra
  Cc: Zi Yan, Barry Song, open list:MEMORY MANAGEMENT, Peter Xu,
	David Hildenbrand, Suren Baghdasaryan, Kalesh Singh,
	Andrew Morton, android-mm, linux-kernel, Jann Horn,
	Lorenzo Stoakes, Rik van Riel, Vlastimil Babka, Liam R. Howlett

On Thu, Aug 21, 2025 at 10:56:02AM -0700, Lokesh Gidra wrote:
> On Thu, Aug 21, 2025 at 9:14 AM Zi Yan <ziy@nvidia.com> wrote:
> >
> > On 21 Aug 2025, at 8:01, Barry Song wrote:
> >
> > > On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
> > >>
> > >> Adding linux-mm mailing list. Mistakenly used the wrong email address.

[Cc'ing MEMORY MANAGEMENT - RMAP folks]

> > >> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> Currently, some callers of rmap_walk() conditionally avoid try-locking
> > >>> non-ksm anon folios. This necessitates serialization through anon_vma
> > >>> write-lock when folio->mapping and/or folio->index (fields involved in
> > >>> rmap_walk()) are to be updated.

For non-ksm anon folios, rmap needs to take anon_vma lock.

Simply acquiring the folio lock instead of anon_vma lock isn't enough
1) because the kernel can't stablize anon_vma without anon_vma lock
(an anon_vma cannot be freed while someone's holding anon_vma lock,
see anon_vma_free()).

2) without anon_vma lock the kernel can't reliably unmap folios because
they can be mapped to other processes (by fork()) while the kernel is
iterating list of VMAs that can possibly map the folio. fork() doens't
and shouldn't acquire folio lock.

3) Holding anon_vma lock also prevents anon_vma_chains from
   being freed while holding the lock.

[Are there more things to worry about that I missed?
 Please add them if so]

Any idea to relax locking requirements while addressing these
requirements?

If some users don't care about missing some PTE A bits due to race
against fork() (perhaps folio_referenced()?), a crazy idea might be to
RCU-protect anon_vma_chains (but then we need to check if the VMA is
still alive) and use refcount to stablize anon_vmas?

> > >>> This hurts scalability due to coarse
> > >>> granularity of the lock. For instance, when multiple threads invoke
> > >>> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from
> > >>> the same src VMA, they all contend for the corresponding anon_vma’s
> > >>> lock. Field traces for arm64 android devices reveal over 30ms of
> > >>> uninterruptible sleep in the main UI thread, leading to janky user
> > >>> interactions.
> > >>>
> > >>> Among all rmap_walk() callers that don’t lock anon folios,
> > >>> folio_referenced() is the most critical (others are
> > >>> page_idle_clear_pte_refs(), damon_folio_young(), and
> > >>> damon_folio_mkold()). The relevant code in folio_referenced() is:
> > >>>
> > >>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) {
> > >>>         we_locked = folio_trylock(folio);
> > >>>         if (!we_locked)
> > >>>                 return 1;
> > >>> }
> >
> > This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be
> > swappable"). From the commit log, the lock is used to protect KSM stable
> > tree from concurrent modification.
> >
> It seems like the conditional locking of file page/folio was added in
> a 2004 commit  edcc56dc6a7c758c ("maplock: kill page_map_lock"). Later
> in the commit you mentioned locking was also added for KSM, and now
> only non-KSM anon folios are left :-)
>
> > >>> It’s unclear why locking anon_vma (when updating folio->mapping) is
> > >>> beneficial over locking the folio here. It’s in the reclaim path, so
> > >>> should not be a critical path that necessitates some special
> > >>> treatment, unless I’m missing something.
> >
> > The decision was made before the first git commit 1da177e4c3f4 based on
> > git history. Maybe it is time to revisit it and improve it.
> >
> >
> > >>> Therefore, I propose simplifying the locking mechanism by
> > >>> unconditionally try-locking the folio in such cases. This helps avoid
> > >>> locking anon_vma when updating folio->mapping, which, for instance,
> > >>> will help eliminate the uninterruptible sleep observed in the field
> > >>> traces mentioned earlier.

As mentioned above, that isn't enough.

-- 
Cheers,
Harry / Hyeonggon

> > >>> Furthermore, it enables us to simplify the
> > >>> code in folio_lock_anon_vma_read() by removing the re-check to ensure
> > >>> that the field hasn’t changed under us.
> > >
> > > Thanks, I’m personally quite interested in this topic and will take a
> > > closer look as well. Beyond this one userfaultfd move, we’ve observed
> > > severe anon_vma lock contention between fork, unmap (process exit), and
> > > memory reclamation. This has caused noticeable UI stutters, especially
> > > when many VMAs share the same anon_vma root.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
  2025-08-22 10:36         ` Harry Yoo
@ 2025-08-22 10:50           ` Lorenzo Stoakes
  2025-08-22 17:16             ` Lokesh Gidra
  0 siblings, 1 reply; 7+ messages in thread
From: Lorenzo Stoakes @ 2025-08-22 10:50 UTC (permalink / raw)
  To: Lokesh Gidra
  Cc: Harry Yoo, Zi Yan, Barry Song, open list:MEMORY MANAGEMENT,
	Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh,
	Andrew Morton, android-mm, linux-kernel, Jann Horn, Rik van Riel,
	Vlastimil Babka, Liam R. Howlett

Lokesh, this is a mess :/

I've been a reverse mapping co-maintainer for 4 months now, please check
MAINTAINERS before sending this stuff out. It's not really impressive to
learn about this 2nd hand...

Also I cannot get lei to get this mail to me, no matter how hard I try. So
it's _really hard_ for me to respond to this.

And you label this '[RFC]' but I can't find any code (unless it's lost in
the thread somehow). It should be '[DISCUSSION]'.

rmap locking is _extremely_ sensitive and the discussion needs careful
attention.

It's hard for me to even track what's going on here or join the discussion,
could you just please resend, correctly cc'ing the maintainers/reviewers of
rmap, and prefix with '[DISCUSSION]'?

You've already got responses here so we're inevitably going to fork the
conversation, but unfortunately I don't see any way around this, because
I'm going to miss all the conversation that I'm not cc'd on.

Anwyay I simply can't engage on this as-is, and I really _want to_, because
rmap and the locking around it are issues of PRIMARY importance to me.

Please try to make my life easier :)

Thanks, Lorenzo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Unconditionally lock folios when calling rmap_walk()
  2025-08-22 10:50           ` Lorenzo Stoakes
@ 2025-08-22 17:16             ` Lokesh Gidra
  0 siblings, 0 replies; 7+ messages in thread
From: Lokesh Gidra @ 2025-08-22 17:16 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Harry Yoo, Zi Yan, Barry Song, open list:MEMORY MANAGEMENT,
	Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh,
	Andrew Morton, android-mm, linux-kernel, Jann Horn, Rik van Riel,
	Vlastimil Babka, Liam R. Howlett

On Fri, Aug 22, 2025 at 3:50 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> Lokesh, this is a mess :/
>
> I've been a reverse mapping co-maintainer for 4 months now, please check
> MAINTAINERS before sending this stuff out. It's not really impressive to
> learn about this 2nd hand...
>
> Also I cannot get lei to get this mail to me, no matter how hard I try. So
> it's _really hard_ for me to respond to this.
>
> And you label this '[RFC]' but I can't find any code (unless it's lost in
> the thread somehow). It should be '[DISCUSSION]'.
>
> rmap locking is _extremely_ sensitive and the discussion needs careful
> attention.
>
> It's hard for me to even track what's going on here or join the discussion,
> could you just please resend, correctly cc'ing the maintainers/reviewers of
> rmap, and prefix with '[DISCUSSION]'?

Sincere apologies. I'll resend correcting both the mistakes.
>
> You've already got responses here so we're inevitably going to fork the
> conversation, but unfortunately I don't see any way around this, because
> I'm going to miss all the conversation that I'm not cc'd on.
>
> Anwyay I simply can't engage on this as-is, and I really _want to_, because
> rmap and the locking around it are issues of PRIMARY importance to me.
>
> Please try to make my life easier :)
>
> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-22 17:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CA+EESO6dR5=4zaecmYqQqOX4702wwGSTX=4+Ani_Q9+o+WUnQA@mail.gmail.com>
2025-08-21  4:29 ` [RFC] Unconditionally lock folios when calling rmap_walk() Lokesh Gidra
2025-08-21 12:01   ` Barry Song
2025-08-21 16:13     ` Zi Yan
2025-08-21 17:56       ` Lokesh Gidra
2025-08-22 10:36         ` Harry Yoo
2025-08-22 10:50           ` Lorenzo Stoakes
2025-08-22 17:16             ` Lokesh Gidra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).