* Re: [RFC] Unconditionally lock folios when calling rmap_walk() [not found] <CA+EESO6dR5=4zaecmYqQqOX4702wwGSTX=4+Ani_Q9+o+WUnQA@mail.gmail.com> @ 2025-08-21 4:29 ` Lokesh Gidra 2025-08-21 12:01 ` Barry Song 0 siblings, 1 reply; 7+ messages in thread From: Lokesh Gidra @ 2025-08-21 4:29 UTC (permalink / raw) To: open list:MEMORY MANAGEMENT Cc: Peter Xu, Barry Song, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn Adding linux-mm mailing list. Mistakenly used the wrong email address. On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > > Hi all, > > Currently, some callers of rmap_walk() conditionally avoid try-locking > non-ksm anon folios. This necessitates serialization through anon_vma > write-lock when folio->mapping and/or folio->index (fields involved in > rmap_walk()) are to be updated. This hurts scalability due to coarse > granularity of the lock. For instance, when multiple threads invoke > userfaultfd’s MOVE ioctl simultaneously to move distinct pages from > the same src VMA, they all contend for the corresponding anon_vma’s > lock. Field traces for arm64 android devices reveal over 30ms of > uninterruptible sleep in the main UI thread, leading to janky user > interactions. > > Among all rmap_walk() callers that don’t lock anon folios, > folio_referenced() is the most critical (others are > page_idle_clear_pte_refs(), damon_folio_young(), and > damon_folio_mkold()). The relevant code in folio_referenced() is: > > if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) { > we_locked = folio_trylock(folio); > if (!we_locked) > return 1; > } > > It’s unclear why locking anon_vma (when updating folio->mapping) is > beneficial over locking the folio here. It’s in the reclaim path, so > should not be a critical path that necessitates some special > treatment, unless I’m missing something. > > Therefore, I propose simplifying the locking mechanism by > unconditionally try-locking the folio in such cases. This helps avoid > locking anon_vma when updating folio->mapping, which, for instance, > will help eliminate the uninterruptible sleep observed in the field > traces mentioned earlier. Furthermore, it enables us to simplify the > code in folio_lock_anon_vma_read() by removing the re-check to ensure > that the field hasn’t changed under us. > > Thanks, > Lokesh ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] Unconditionally lock folios when calling rmap_walk() 2025-08-21 4:29 ` [RFC] Unconditionally lock folios when calling rmap_walk() Lokesh Gidra @ 2025-08-21 12:01 ` Barry Song 2025-08-21 16:13 ` Zi Yan 0 siblings, 1 reply; 7+ messages in thread From: Barry Song @ 2025-08-21 12:01 UTC (permalink / raw) To: Lokesh Gidra Cc: open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > > Adding linux-mm mailing list. Mistakenly used the wrong email address. > > On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > > > > Hi all, > > > > Currently, some callers of rmap_walk() conditionally avoid try-locking > > non-ksm anon folios. This necessitates serialization through anon_vma > > write-lock when folio->mapping and/or folio->index (fields involved in > > rmap_walk()) are to be updated. This hurts scalability due to coarse > > granularity of the lock. For instance, when multiple threads invoke > > userfaultfd’s MOVE ioctl simultaneously to move distinct pages from > > the same src VMA, they all contend for the corresponding anon_vma’s > > lock. Field traces for arm64 android devices reveal over 30ms of > > uninterruptible sleep in the main UI thread, leading to janky user > > interactions. > > > > Among all rmap_walk() callers that don’t lock anon folios, > > folio_referenced() is the most critical (others are > > page_idle_clear_pte_refs(), damon_folio_young(), and > > damon_folio_mkold()). The relevant code in folio_referenced() is: > > > > if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) { > > we_locked = folio_trylock(folio); > > if (!we_locked) > > return 1; > > } > > > > It’s unclear why locking anon_vma (when updating folio->mapping) is > > beneficial over locking the folio here. It’s in the reclaim path, so > > should not be a critical path that necessitates some special > > treatment, unless I’m missing something. > > > > Therefore, I propose simplifying the locking mechanism by > > unconditionally try-locking the folio in such cases. This helps avoid > > locking anon_vma when updating folio->mapping, which, for instance, > > will help eliminate the uninterruptible sleep observed in the field > > traces mentioned earlier. Furthermore, it enables us to simplify the > > code in folio_lock_anon_vma_read() by removing the re-check to ensure > > that the field hasn’t changed under us. Thanks, I’m personally quite interested in this topic and will take a closer look as well. Beyond this one userfaultfd move, we’ve observed severe anon_vma lock contention between fork, unmap (process exit), and memory reclamation. This has caused noticeable UI stutters, especially when many VMAs share the same anon_vma root. Thanks Barry ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] Unconditionally lock folios when calling rmap_walk() 2025-08-21 12:01 ` Barry Song @ 2025-08-21 16:13 ` Zi Yan 2025-08-21 17:56 ` Lokesh Gidra 0 siblings, 1 reply; 7+ messages in thread From: Zi Yan @ 2025-08-21 16:13 UTC (permalink / raw) To: Lokesh Gidra, Barry Song Cc: open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn On 21 Aug 2025, at 8:01, Barry Song wrote: > On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote: >> >> Adding linux-mm mailing list. Mistakenly used the wrong email address. >> >> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote: >>> >>> Hi all, >>> >>> Currently, some callers of rmap_walk() conditionally avoid try-locking >>> non-ksm anon folios. This necessitates serialization through anon_vma >>> write-lock when folio->mapping and/or folio->index (fields involved in >>> rmap_walk()) are to be updated. This hurts scalability due to coarse >>> granularity of the lock. For instance, when multiple threads invoke >>> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from >>> the same src VMA, they all contend for the corresponding anon_vma’s >>> lock. Field traces for arm64 android devices reveal over 30ms of >>> uninterruptible sleep in the main UI thread, leading to janky user >>> interactions. >>> >>> Among all rmap_walk() callers that don’t lock anon folios, >>> folio_referenced() is the most critical (others are >>> page_idle_clear_pte_refs(), damon_folio_young(), and >>> damon_folio_mkold()). The relevant code in folio_referenced() is: >>> >>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) { >>> we_locked = folio_trylock(folio); >>> if (!we_locked) >>> return 1; >>> } This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be swappable"). From the commit log, the lock is used to protect KSM stable tree from concurrent modification. >>> >>> It’s unclear why locking anon_vma (when updating folio->mapping) is >>> beneficial over locking the folio here. It’s in the reclaim path, so >>> should not be a critical path that necessitates some special >>> treatment, unless I’m missing something. The decision was made before the first git commit 1da177e4c3f4 based on git history. Maybe it is time to revisit it and improve it. >>> >>> Therefore, I propose simplifying the locking mechanism by >>> unconditionally try-locking the folio in such cases. This helps avoid >>> locking anon_vma when updating folio->mapping, which, for instance, >>> will help eliminate the uninterruptible sleep observed in the field >>> traces mentioned earlier. Furthermore, it enables us to simplify the >>> code in folio_lock_anon_vma_read() by removing the re-check to ensure >>> that the field hasn’t changed under us. > > Thanks, I’m personally quite interested in this topic and will take a > closer look as well. Beyond this one userfaultfd move, we’ve observed > severe anon_vma lock contention between fork, unmap (process exit), and > memory reclamation. This has caused noticeable UI stutters, especially > when many VMAs share the same anon_vma root. > > Thanks > Barry -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] Unconditionally lock folios when calling rmap_walk() 2025-08-21 16:13 ` Zi Yan @ 2025-08-21 17:56 ` Lokesh Gidra 2025-08-22 10:36 ` Harry Yoo 0 siblings, 1 reply; 7+ messages in thread From: Lokesh Gidra @ 2025-08-21 17:56 UTC (permalink / raw) To: Zi Yan Cc: Barry Song, open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn ( On Thu, Aug 21, 2025 at 9:14 AM Zi Yan <ziy@nvidia.com> wrote: > > On 21 Aug 2025, at 8:01, Barry Song wrote: > > > On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > >> > >> Adding linux-mm mailing list. Mistakenly used the wrong email address. > >> > >> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > >>> > >>> Hi all, > >>> > >>> Currently, some callers of rmap_walk() conditionally avoid try-locking > >>> non-ksm anon folios. This necessitates serialization through anon_vma > >>> write-lock when folio->mapping and/or folio->index (fields involved in > >>> rmap_walk()) are to be updated. This hurts scalability due to coarse > >>> granularity of the lock. For instance, when multiple threads invoke > >>> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from > >>> the same src VMA, they all contend for the corresponding anon_vma’s > >>> lock. Field traces for arm64 android devices reveal over 30ms of > >>> uninterruptible sleep in the main UI thread, leading to janky user > >>> interactions. > >>> > >>> Among all rmap_walk() callers that don’t lock anon folios, > >>> folio_referenced() is the most critical (others are > >>> page_idle_clear_pte_refs(), damon_folio_young(), and > >>> damon_folio_mkold()). The relevant code in folio_referenced() is: > >>> > >>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) { > >>> we_locked = folio_trylock(folio); > >>> if (!we_locked) > >>> return 1; > >>> } > > This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be > swappable"). From the commit log, the lock is used to protect KSM stable > tree from concurrent modification. > It seems like the conditional locking of file page/folio was added in a 2004 commit edcc56dc6a7c758c ("maplock: kill page_map_lock"). Later in the commit you mentioned locking was also added for KSM, and now only non-KSM anon folios are left :-) > >>> > >>> It’s unclear why locking anon_vma (when updating folio->mapping) is > >>> beneficial over locking the folio here. It’s in the reclaim path, so > >>> should not be a critical path that necessitates some special > >>> treatment, unless I’m missing something. > > The decision was made before the first git commit 1da177e4c3f4 based on > git history. Maybe it is time to revisit it and improve it. > > > >>> > >>> Therefore, I propose simplifying the locking mechanism by > >>> unconditionally try-locking the folio in such cases. This helps avoid > >>> locking anon_vma when updating folio->mapping, which, for instance, > >>> will help eliminate the uninterruptible sleep observed in the field > >>> traces mentioned earlier. Furthermore, it enables us to simplify the > >>> code in folio_lock_anon_vma_read() by removing the re-check to ensure > >>> that the field hasn’t changed under us. > > > > Thanks, I’m personally quite interested in this topic and will take a > > closer look as well. Beyond this one userfaultfd move, we’ve observed > > severe anon_vma lock contention between fork, unmap (process exit), and > > memory reclamation. This has caused noticeable UI stutters, especially > > when many VMAs share the same anon_vma root. > > > > Thanks > > Barry > > > -- > Best Regards, > Yan, Zi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] Unconditionally lock folios when calling rmap_walk() 2025-08-21 17:56 ` Lokesh Gidra @ 2025-08-22 10:36 ` Harry Yoo 2025-08-22 10:50 ` Lorenzo Stoakes 0 siblings, 1 reply; 7+ messages in thread From: Harry Yoo @ 2025-08-22 10:36 UTC (permalink / raw) To: Lokesh Gidra Cc: Zi Yan, Barry Song, open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn, Lorenzo Stoakes, Rik van Riel, Vlastimil Babka, Liam R. Howlett On Thu, Aug 21, 2025 at 10:56:02AM -0700, Lokesh Gidra wrote: > On Thu, Aug 21, 2025 at 9:14 AM Zi Yan <ziy@nvidia.com> wrote: > > > > On 21 Aug 2025, at 8:01, Barry Song wrote: > > > > > On Thu, Aug 21, 2025 at 12:29 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > > >> > > >> Adding linux-mm mailing list. Mistakenly used the wrong email address. [Cc'ing MEMORY MANAGEMENT - RMAP folks] > > >> On Wed, Aug 20, 2025 at 9:23 PM Lokesh Gidra <lokeshgidra@google.com> wrote: > > >>> > > >>> Hi all, > > >>> > > >>> Currently, some callers of rmap_walk() conditionally avoid try-locking > > >>> non-ksm anon folios. This necessitates serialization through anon_vma > > >>> write-lock when folio->mapping and/or folio->index (fields involved in > > >>> rmap_walk()) are to be updated. For non-ksm anon folios, rmap needs to take anon_vma lock. Simply acquiring the folio lock instead of anon_vma lock isn't enough 1) because the kernel can't stablize anon_vma without anon_vma lock (an anon_vma cannot be freed while someone's holding anon_vma lock, see anon_vma_free()). 2) without anon_vma lock the kernel can't reliably unmap folios because they can be mapped to other processes (by fork()) while the kernel is iterating list of VMAs that can possibly map the folio. fork() doens't and shouldn't acquire folio lock. 3) Holding anon_vma lock also prevents anon_vma_chains from being freed while holding the lock. [Are there more things to worry about that I missed? Please add them if so] Any idea to relax locking requirements while addressing these requirements? If some users don't care about missing some PTE A bits due to race against fork() (perhaps folio_referenced()?), a crazy idea might be to RCU-protect anon_vma_chains (but then we need to check if the VMA is still alive) and use refcount to stablize anon_vmas? > > >>> This hurts scalability due to coarse > > >>> granularity of the lock. For instance, when multiple threads invoke > > >>> userfaultfd’s MOVE ioctl simultaneously to move distinct pages from > > >>> the same src VMA, they all contend for the corresponding anon_vma’s > > >>> lock. Field traces for arm64 android devices reveal over 30ms of > > >>> uninterruptible sleep in the main UI thread, leading to janky user > > >>> interactions. > > >>> > > >>> Among all rmap_walk() callers that don’t lock anon folios, > > >>> folio_referenced() is the most critical (others are > > >>> page_idle_clear_pte_refs(), damon_folio_young(), and > > >>> damon_folio_mkold()). The relevant code in folio_referenced() is: > > >>> > > >>> if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) { > > >>> we_locked = folio_trylock(folio); > > >>> if (!we_locked) > > >>> return 1; > > >>> } > > > > This seems to be legacy code from commit 5ad6468801d2 ("ksm: let shared pages be > > swappable"). From the commit log, the lock is used to protect KSM stable > > tree from concurrent modification. > > > It seems like the conditional locking of file page/folio was added in > a 2004 commit edcc56dc6a7c758c ("maplock: kill page_map_lock"). Later > in the commit you mentioned locking was also added for KSM, and now > only non-KSM anon folios are left :-) > > > >>> It’s unclear why locking anon_vma (when updating folio->mapping) is > > >>> beneficial over locking the folio here. It’s in the reclaim path, so > > >>> should not be a critical path that necessitates some special > > >>> treatment, unless I’m missing something. > > > > The decision was made before the first git commit 1da177e4c3f4 based on > > git history. Maybe it is time to revisit it and improve it. > > > > > > >>> Therefore, I propose simplifying the locking mechanism by > > >>> unconditionally try-locking the folio in such cases. This helps avoid > > >>> locking anon_vma when updating folio->mapping, which, for instance, > > >>> will help eliminate the uninterruptible sleep observed in the field > > >>> traces mentioned earlier. As mentioned above, that isn't enough. -- Cheers, Harry / Hyeonggon > > >>> Furthermore, it enables us to simplify the > > >>> code in folio_lock_anon_vma_read() by removing the re-check to ensure > > >>> that the field hasn’t changed under us. > > > > > > Thanks, I’m personally quite interested in this topic and will take a > > > closer look as well. Beyond this one userfaultfd move, we’ve observed > > > severe anon_vma lock contention between fork, unmap (process exit), and > > > memory reclamation. This has caused noticeable UI stutters, especially > > > when many VMAs share the same anon_vma root. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] Unconditionally lock folios when calling rmap_walk() 2025-08-22 10:36 ` Harry Yoo @ 2025-08-22 10:50 ` Lorenzo Stoakes 2025-08-22 17:16 ` Lokesh Gidra 0 siblings, 1 reply; 7+ messages in thread From: Lorenzo Stoakes @ 2025-08-22 10:50 UTC (permalink / raw) To: Lokesh Gidra Cc: Harry Yoo, Zi Yan, Barry Song, open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn, Rik van Riel, Vlastimil Babka, Liam R. Howlett Lokesh, this is a mess :/ I've been a reverse mapping co-maintainer for 4 months now, please check MAINTAINERS before sending this stuff out. It's not really impressive to learn about this 2nd hand... Also I cannot get lei to get this mail to me, no matter how hard I try. So it's _really hard_ for me to respond to this. And you label this '[RFC]' but I can't find any code (unless it's lost in the thread somehow). It should be '[DISCUSSION]'. rmap locking is _extremely_ sensitive and the discussion needs careful attention. It's hard for me to even track what's going on here or join the discussion, could you just please resend, correctly cc'ing the maintainers/reviewers of rmap, and prefix with '[DISCUSSION]'? You've already got responses here so we're inevitably going to fork the conversation, but unfortunately I don't see any way around this, because I'm going to miss all the conversation that I'm not cc'd on. Anwyay I simply can't engage on this as-is, and I really _want to_, because rmap and the locking around it are issues of PRIMARY importance to me. Please try to make my life easier :) Thanks, Lorenzo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] Unconditionally lock folios when calling rmap_walk() 2025-08-22 10:50 ` Lorenzo Stoakes @ 2025-08-22 17:16 ` Lokesh Gidra 0 siblings, 0 replies; 7+ messages in thread From: Lokesh Gidra @ 2025-08-22 17:16 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Harry Yoo, Zi Yan, Barry Song, open list:MEMORY MANAGEMENT, Peter Xu, David Hildenbrand, Suren Baghdasaryan, Kalesh Singh, Andrew Morton, android-mm, linux-kernel, Jann Horn, Rik van Riel, Vlastimil Babka, Liam R. Howlett On Fri, Aug 22, 2025 at 3:50 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > Lokesh, this is a mess :/ > > I've been a reverse mapping co-maintainer for 4 months now, please check > MAINTAINERS before sending this stuff out. It's not really impressive to > learn about this 2nd hand... > > Also I cannot get lei to get this mail to me, no matter how hard I try. So > it's _really hard_ for me to respond to this. > > And you label this '[RFC]' but I can't find any code (unless it's lost in > the thread somehow). It should be '[DISCUSSION]'. > > rmap locking is _extremely_ sensitive and the discussion needs careful > attention. > > It's hard for me to even track what's going on here or join the discussion, > could you just please resend, correctly cc'ing the maintainers/reviewers of > rmap, and prefix with '[DISCUSSION]'? Sincere apologies. I'll resend correcting both the mistakes. > > You've already got responses here so we're inevitably going to fork the > conversation, but unfortunately I don't see any way around this, because > I'm going to miss all the conversation that I'm not cc'd on. > > Anwyay I simply can't engage on this as-is, and I really _want to_, because > rmap and the locking around it are issues of PRIMARY importance to me. > > Please try to make my life easier :) > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-08-22 17:16 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <CA+EESO6dR5=4zaecmYqQqOX4702wwGSTX=4+Ani_Q9+o+WUnQA@mail.gmail.com> 2025-08-21 4:29 ` [RFC] Unconditionally lock folios when calling rmap_walk() Lokesh Gidra 2025-08-21 12:01 ` Barry Song 2025-08-21 16:13 ` Zi Yan 2025-08-21 17:56 ` Lokesh Gidra 2025-08-22 10:36 ` Harry Yoo 2025-08-22 10:50 ` Lorenzo Stoakes 2025-08-22 17:16 ` Lokesh Gidra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).