Re: [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in the faulting path

From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	James Houghton <jthoughton@google.com>,
	Peter Xu <peterx@redhat.com>, Gavin Guo <gavinguo@igalia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in the faulting path
Date: Tue, 17 Jun 2025 12:03:13 +0200
Message-ID: <aFE9YTNcCHAGBtKi@localhost.localdomain>
In-Reply-To: <1297fdd5-3de2-45bc-b146-e14061643fee@redhat.com>

On Mon, Jun 16, 2025 at 04:41:20PM +0200, David Hildenbrand wrote:
> On 16.06.25 16:10, Oscar Salvador wrote:
> > What do you mean by stable?
> 
> The same "stable" you used in the doc, that I complained about ;)

Touche :-D

> > In the generic faulting path, we're not worried about the page going away
> > because we hold a reference, so I guess the lock must be to keep content stable?
> 
> What you want to avoid, IIRC, is someone doing a truncation/reclaim on the
> folio while you are mapping it.

Ok, I see. I thought it was more about holding writes, but this makes sense.

> Take a look at truncate_inode_pages_range() where we do a folio_lock()
> around truncate_inode_folio().
> 
> In other words, while you hold the folio lock (and verified that the folio
> was not truncated yet: for example, that folio->mapping is still set), you
> know that it cannot get truncated concurrently -- without holding other
> expensive locks.
> 
> Observe how truncate_cleanup_folio() calls
> 
> 	if (folio_mapped(folio))
> 		unmap_mapping_folio(folio);
> 
> To remove all page table mappings.
> 
> So while holding the folio lock, new page table mappings are not expected to
> appear (IIRC).

Ah ok, so it's more that we don't end up mapping something that's not there
anymore (or something completely different).
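
To make the rule above concrete, here is a minimal userspace sketch of the
"lock, then check that the folio still belongs to the mapping" pattern. It is
only a single-threaded model: struct model_folio and every model_* function
are hypothetical stand-ins invented for illustration, not the kernel's
struct folio or its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; not the kernel's struct folio. */
struct model_folio {
	bool locked;		/* models the folio lock */
	const void *mapping;	/* NULL once truncated, like folio->mapping */
	int mapcount;		/* number of page table mappings */
};

static void model_folio_lock(struct model_folio *f)
{
	assert(!f->locked);	/* single-threaded model: no contention */
	f->locked = true;
}

static void model_folio_unlock(struct model_folio *f)
{
	assert(f->locked);
	f->locked = false;
}

/* Truncation side: under the lock, unmap and detach from the mapping. */
static void model_truncate(struct model_folio *f)
{
	model_folio_lock(f);
	f->mapcount = 0;
	f->mapping = NULL;
	model_folio_unlock(f);
}

/* Fault side: map the folio only if it still belongs to 'mapping'. */
static bool model_fault_map(struct model_folio *f, const void *mapping)
{
	bool mapped = false;

	model_folio_lock(f);
	if (f->mapping == mapping) {	/* not truncated before we locked */
		f->mapcount++;
		mapped = true;
	}
	model_folio_unlock(f);
	return mapped;
}
```

In this model a fault that runs before model_truncate() maps the folio, and a
fault that runs afterwards sees mapping == NULL and backs off, so it never
maps something that is not there anymore.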

> > I mean, yes, after we have mapped the page privately into the pagetables,
> > content integrity is no longer our business, so given this rule, yes,
> > I guess hugetlb_wp() wouldn't need the lock (for !anonymous) because we already
> > have mapped it privately at that point.
> 
> That's my understanding. And while holding the PTL it cannot get unmapped.
> Whenever you temporarily drop the PTL, you have to do a pte_same() check to
> make sure concurrent truncation didn't happen.

Yap, hugetlb_wp() drops the locks temporarily when it needs to unmap the private
page from other processes, but then does the pte_same() check.
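
The drop-and-recheck pattern discussed here can be sketched in userspace as
follows. This is a single-threaded model under stated assumptions: struct
model_pt, model_pte_t and the model_* helpers are hypothetical names made up
for illustration, and the "concurrent" change is simulated by a callback that
runs while the modelled PTL is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a PTE value and a PTL-protected PTE slot. */
typedef uint64_t model_pte_t;

struct model_pt {
	bool ptl_held;		/* models the page table lock */
	model_pte_t pte;	/* the PTE protected by that lock */
};

/* Work done while the lock is dropped; may simulate a concurrent change. */
typedef void (*model_unlocked_fn)(struct model_pt *pt);

static bool model_pte_same(model_pte_t a, model_pte_t b)
{
	return a == b;
}

/*
 * Snapshot the PTE, drop the lock for the unlocked work, re-take the lock
 * and report whether the PTE is unchanged. The caller must bail out when
 * this returns false.
 */
static bool model_drop_and_recheck(struct model_pt *pt, model_unlocked_fn work)
{
	model_pte_t orig;

	assert(pt->ptl_held);
	orig = pt->pte;

	pt->ptl_held = false;	/* models spin_unlock(ptl) */
	work(pt);
	pt->ptl_held = true;	/* models spin_lock(ptl) */

	return model_pte_same(pt->pte, orig);
}

/* Nothing happens while the lock is dropped. */
static void model_noop(struct model_pt *pt)
{
	(void)pt;
}

/* Simulates truncation clearing the PTE while the lock is dropped. */
static void model_concurrent_unmap(struct model_pt *pt)
{
	pt->pte = 0;
}
```

With model_noop the recheck passes and the caller may continue; with
model_concurrent_unmap the recheck fails, which is the case where the fault
handler has to back out instead of trusting its earlier snapshot.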

> That's my understanding so far, at least of the common filemap code.
> 
> > 
> > But there's something I don't fully understand and that makes me feel uneasy.
> > If the lock in the generic faulting path is to keep content stable till we
> > have mapped it privately, wouldn't it be more correct to also hold it
> > during the copy in hugetlb_wp, to kinda emulate that?
> 
> As long as there is a page table mapping, it cannot get truncated. So if you
> find a PTE under PTL that maps that folio, truncation could not have
> happened.

I see, this makes a lot of sense, thanks for walking me through it, David!
Alright, then, with all this clear now, we should:

- Not take the folio lock in hugetlb_fault() before calling hugetlb_wp()
- Drop the lock in hugetlb_no_page() after we have mapped the page in
  the pagetables
- Have hugetlb_wp() take the lock IFF the folio is anonymous (for the
  re-use check)

This will lead to something like the following:

 diff --git a/mm/hugetlb.c b/mm/hugetlb.c
 index dfa09fc3b2c6..4d48cda8a56d 100644
 --- a/mm/hugetlb.c
 +++ b/mm/hugetlb.c
 @@ -6198,6 +6198,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
  	 * in scenarios that used to work. As a side effect, there can still
  	 * be leaks between processes, for example, with FOLL_GET users.
  	 */
 +	if (folio_test_anon(old_folio))
 +		folio_lock(old_folio);
  	if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) {
  		if (!PageAnonExclusive(&old_folio->page)) {
  			folio_move_anon_rmap(old_folio, vma);
 @@ -6212,6 +6214,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
  	}
  	VM_BUG_ON_PAGE(folio_test_anon(old_folio) &&
  		       PageAnonExclusive(&old_folio->page), &old_folio->page);
 +	if (folio_test_anon(old_folio))
 +		folio_unlock(old_folio);
 
  	/*
  	 * If the process that created a MAP_PRIVATE mapping is about to perform
 @@ -6537,11 +6541,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  			}
  			new_pagecache_folio = true;
  		} else {
 -			/*
 -			 * hugetlb_wp() expects the folio to be locked in order to
 -			 * check whether we can re-use this page exclusively for us.
 -			 */
 -			folio_lock(folio);
  			anon_rmap = 1;
  		}
  	} else {
 @@ -6558,7 +6557,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 
  		/* Check for page in userfault range. */
  		if (userfaultfd_minor(vma)) {
 -			folio_unlock(folio);
 +			if (!anon_rmap)
 +				folio_unlock(folio);
  			folio_put(folio);
  			/* See comment in userfaultfd_missing() block above */
  			if (!hugetlb_pte_stable(h, mm, vmf->address, vmf->pte, vmf->orig_pte)) {
 @@ -6604,6 +6604,13 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  		new_pte = huge_pte_mkuffd_wp(new_pte);
  	set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h));
 
 +	/*
 +	 * This folio cannot have been truncated since we were holding the lock,
 +	 * and we just mapped it into the pagetables. Drop the lock now.
 +	 */
 +	if (!anon_rmap)
 +		folio_unlock(folio);
 +
  	hugetlb_count_add(pages_per_huge_page(h), mm);
  	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
  		/* Optimization, do the COW without a second fault */
 @@ -6619,8 +6626,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  	 */
  	if (new_folio)
  		folio_set_hugetlb_migratable(folio);
 -
 -	folio_unlock(folio);
  out:
  	hugetlb_vma_unlock_read(vma);
 
 @@ -6639,8 +6644,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  backout_unlocked:
  	if (new_folio && !new_pagecache_folio)
  		restore_reserve_on_error(h, vma, vmf->address, folio);
 -
 -	folio_unlock(folio);
 +	if (!anon_rmap)
 +		folio_unlock(folio);
  	folio_put(folio);
  	goto out;
  }
 @@ -6805,21 +6810,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  		/* Fallthrough to CoW */
  	}
 
 -	/*
 -	 * We need to lock the folio before calling hugetlb_wp().
 -	 * Either the folio is in the pagecache and we need to copy it over
 -	 * to another file, so it must remain stable throughout the operation,
 -	 * or the folio is anonymous and we need to lock it in order to check
 -	 * whether we can re-use it and mark it exclusive for this process.
 -	 * The timespan for the lock differs depending on the type, since
 -	 * anonymous folios only need to hold the lock while checking whether we
 -	 * can re-use it, while we need to hold it throughout the copy in case
 -	 * we are dealing with a folio from a pagecache.
 -	 * Representing this difference would be tricky with the current code,
 -	 * so just hold the lock for the duration of hugetlb_wp().
 -	 */
  	folio = page_folio(pte_page(vmf.orig_pte));
 -	folio_lock(folio);
  	folio_get(folio);
 
  	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 @@ -6835,7 +6826,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  						flags & FAULT_FLAG_WRITE))
  		update_mmu_cache(vma, vmf.address, vmf.pte);
  out_put_page:
 -	folio_unlock(folio);
  	folio_put(folio);
  out_ptl:
  	spin_unlock(vmf.ptl);
  
This should be patch#2, titled something like "Sorting out locking",
and maybe explain a bit more why hugetlb_wp() takes the lock for
anonymous folios.

What do you think?

 

-- 
Oscar Salvador
SUSE Labs
