From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Muchun Song <muchun.song@linux.dev>,
James Houghton <jthoughton@google.com>,
Peter Xu <peterx@redhat.com>, Gavin Guo <gavinguo@igalia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in the faulting path
Date: Tue, 17 Jun 2025 12:03:13 +0200
Message-ID: <aFE9YTNcCHAGBtKi@localhost.localdomain>
In-Reply-To: <1297fdd5-3de2-45bc-b146-e14061643fee@redhat.com>
On Mon, Jun 16, 2025 at 04:41:20PM +0200, David Hildenbrand wrote:
> On 16.06.25 16:10, Oscar Salvador wrote:
> > What do you mean by stable?
>
> The same "stable" you used in the doc, that I complained about ;)
Touche :-D
> > In the generic faulting path, we're not worried about the page going away
> > because we hold a reference, so I guess the lock must be to keep content stable?
>
> What you want to avoid, IIRC, is someone doing a truncation/reclaim on the
> folio while you are mapping it.
Ok, I see. I thought it was more about holding writes, but this makes sense.
> Take a look at truncate_inode_pages_range() where we do a folio_lock()
> around truncate_inode_folio().
>
> In other words, while you hold the folio lock (and verified that the folio
> was not truncated yet: for example, that folio->mapping is still set), you
> know that it cannot get truncated concurrently -- without holding other
> expensive locks.
>
> Observe how truncate_cleanup_folio() calls
>
> if (folio_mapped(folio))
> unmap_mapping_folio(folio);
>
> To remove all page table mappings.
>
> So while holding the folio lock, new page table mappings are not expected to
> appear (IIRC).
Ah ok, so it's more that we don't end up mapping something that's not there
anymore (or something completely different).
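To make the rule above concrete, here is a minimal userspace sketch of "lock, check that the folio is still attached, then map". It is only a model: struct toy_folio, toy_truncate() and toy_fault_map() are hypothetical names, a pthread mutex stands in for the folio lock, and nothing here is actual kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; not the kernel struct. */
struct toy_folio {
	pthread_mutex_t lock;	/* models the folio lock */
	void *mapping;		/* models folio->mapping; NULL once truncated */
	int mapcount;		/* models the number of page table mappings */
};

/* Models truncation: lock the folio, unmap it, then detach it. */
static void toy_truncate(struct toy_folio *f)
{
	assert(f != NULL);
	pthread_mutex_lock(&f->lock);
	f->mapcount = 0;	/* stands in for unmap_mapping_folio() */
	f->mapping = NULL;	/* the folio no longer belongs to the file */
	pthread_mutex_unlock(&f->lock);
}

/* Models the fault path: map the folio only if it was not truncated. */
static bool toy_fault_map(struct toy_folio *f)
{
	bool mapped = false;

	assert(f != NULL);
	pthread_mutex_lock(&f->lock);
	if (f->mapping != NULL) {
		/* toy_truncate() cannot run until we drop the lock. */
		f->mapcount++;
		mapped = true;
	}
	pthread_mutex_unlock(&f->lock);
	return mapped;
}
```

In this model a fault that runs after toy_truncate() finds mapping == NULL and backs out instead of mapping a folio that is no longer part of the file.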
> > I mean, yes, after we have mapped the page privately into the pagetables,
> > we don't have business about content-integrity anymore, so given this rule, yes,
> > I guess hugetlb_wp() wouldn't need the lock (for !anonymous) because we already
> > have mapped it privately at that point.
>
> That's my understanding. And while holding the PTL it cannot get unmapped.
> Whenever you temporarily drop the PTL, you have to do a pte_same() check to
> make sure concurrent truncation didn't happen.
Yap, hugetlb_wp() drops the locks temporarily when it needs to unmap the private
page from other processes, but then does the pte_same() check.
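The "drop the PTL, then recheck with pte_same()" pattern can be sketched in userspace as follows. This is an illustration under assumptions, not the hugetlb code: struct toy_pt, toy_wp_slow_step() and toy_concurrent_unmap() are made-up names, a pthread mutex models the PTL, and a plain integer models the PTE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one page table entry protected by one lock (the PTL). */
struct toy_pt {
	pthread_mutex_t ptl;
	uint64_t pte;
};

/*
 * Called with pt->ptl held and returns with it held. The lock is dropped
 * around slow_work(), so the entry may change in between. The return value
 * models the pte_same() check: true only if the entry still equals the
 * snapshot taken before the lock was dropped.
 */
static bool toy_wp_slow_step(struct toy_pt *pt, uint64_t orig_pte,
			     void (*slow_work)(struct toy_pt *))
{
	assert(pt != NULL);
	pthread_mutex_unlock(&pt->ptl);
	if (slow_work)
		slow_work(pt);	/* runs without the lock held */
	pthread_mutex_lock(&pt->ptl);

	return pt->pte == orig_pte;
}

/* Models a concurrent unmap (e.g. from truncation) while the lock is dropped. */
static void toy_concurrent_unmap(struct toy_pt *pt)
{
	pthread_mutex_lock(&pt->ptl);
	pt->pte = 0;
	pthread_mutex_unlock(&pt->ptl);
}
```

A caller that gets false back has to give up on the old snapshot and retry or bail out, since the entry it was working on is gone.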
> That is my understanding so far, at least of the common filemap code.
>
> >
> > But there's something I don't fully understand and makes me feel uneasy.
> > If the lock in the generic faulting path is to keep content stable till we
> > have mapped it privately, wouldn't it be more correct to also hold it
> > during the copy in hugetlb_wp, to kinda emulate that?
>
> As long as there is a page table mapping, it cannot get truncated. So if you
> find a PTE under PTL that maps that folio, truncation could not have
> happened.
I see, this makes a lot of sense. Thanks for walking me through it, David!

Alright, then, with all this clear now, we should:

- Not take the folio lock in hugetlb_fault()->hugetlb_wp(); hugetlb_wp() will
  take it itself if it's an anonymous folio (re-use check)
- Drop the lock in hugetlb_no_page() after we have mapped the page in
  the pagetables
- hugetlb_wp() will take the lock IFF the folio is anonymous

This will lead to something like the following:
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfa09fc3b2c6..4d48cda8a56d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6198,6 +6198,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
* in scenarios that used to work. As a side effect, there can still
* be leaks between processes, for example, with FOLL_GET users.
*/
+ if (folio_test_anon(old_folio))
+ folio_lock(old_folio);
if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) {
if (!PageAnonExclusive(&old_folio->page)) {
folio_move_anon_rmap(old_folio, vma);
@@ -6212,6 +6214,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
}
VM_BUG_ON_PAGE(folio_test_anon(old_folio) &&
PageAnonExclusive(&old_folio->page), &old_folio->page);
+ if (folio_test_anon(old_folio))
+ folio_unlock(old_folio);
/*
* If the process that created a MAP_PRIVATE mapping is about to perform
@@ -6537,11 +6541,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
}
new_pagecache_folio = true;
} else {
- /*
- * hugetlb_wp() expects the folio to be locked in order to
- * check whether we can re-use this page exclusively for us.
- */
- folio_lock(folio);
anon_rmap = 1;
}
} else {
@@ -6558,7 +6557,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
/* Check for page in userfault range. */
if (userfaultfd_minor(vma)) {
- folio_unlock(folio);
+ if (!anon_rmap)
+ folio_unlock(folio);
folio_put(folio);
/* See comment in userfaultfd_missing() block above */
if (!hugetlb_pte_stable(h, mm, vmf->address, vmf->pte, vmf->orig_pte)) {
@@ -6604,6 +6604,13 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
new_pte = huge_pte_mkuffd_wp(new_pte);
set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h));
+ /*
+ * This folio cannot have been truncated since we were holding the lock,
+ * and we just mapped it into the pagetables. Drop the lock now.
+ */
+ if (!anon_rmap)
+ folio_unlock(folio);
+
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
@@ -6619,8 +6626,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
*/
if (new_folio)
folio_set_hugetlb_migratable(folio);
-
- folio_unlock(folio);
out:
hugetlb_vma_unlock_read(vma);
@@ -6639,8 +6644,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
backout_unlocked:
if (new_folio && !new_pagecache_folio)
restore_reserve_on_error(h, vma, vmf->address, folio);
-
- folio_unlock(folio);
+ if (!anon_rmap)
+ folio_unlock(folio);
folio_put(folio);
goto out;
}
@@ -6805,21 +6810,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* Fallthrough to CoW */
}
- /*
- * We need to lock the folio before calling hugetlb_wp().
- * Either the folio is in the pagecache and we need to copy it over
- * to another file, so it must remain stable throughout the operation,
- * or the folio is anonymous and we need to lock it in order to check
- * whether we can re-use it and mark it exclusive for this process.
- * The timespan for the lock differs depending on the type, since
- * anonymous folios only need to hold the lock while checking whether we
- * can re-use it, while we need to hold it throughout the copy in case
- * we are dealing with a folio from a pagecache.
- * Representing this difference would be tricky with the current code,
- * so just hold the lock for the duration of hugetlb_wp().
- */
folio = page_folio(pte_page(vmf.orig_pte));
- folio_lock(folio);
folio_get(folio);
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
@@ -6835,7 +6826,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, vmf.address, vmf.pte);
out_put_page:
- folio_unlock(folio);
folio_put(folio);
out_ptl:
spin_unlock(vmf.ptl);
This should be patch#2, with something like "Sorting out locking" as the
title, and maybe explaining a bit more why hugetlb_wp() takes the lock for
anonymous folios.
What do you think?
--
Oscar Salvador
SUSE Labs