Re: [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in the faulting path

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	James Houghton <jthoughton@google.com>,
	Peter Xu <peterx@redhat.com>, Gavin Guo <gavinguo@igalia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in the faulting path
Date: Tue, 17 Jun 2025 12:03:13 +0200	[thread overview]
Message-ID: <aFE9YTNcCHAGBtKi@localhost.localdomain> (raw)
In-Reply-To: <1297fdd5-3de2-45bc-b146-e14061643fee@redhat.com>

On Mon, Jun 16, 2025 at 04:41:20PM +0200, David Hildenbrand wrote:
> On 16.06.25 16:10, Oscar Salvador wrote:
> > What do you mean by stable?
> 
> The same "stable" you used in the doc, that I complained about ;)

Touche :-D

> > In the generic faulting path, we're not worried about the page going away
> > because we hold a reference, so I guess the lock must be to keep content stable?
> 
> What you want to avoid is IIRC, is someone doing a truncation/reclaim on the
> folio while you are mapping it.

Ok, I see. I thought it was more about holding writes, but this makes sense.

> Take a look at truncate_inode_pages_range() where we do a folio_lock()
> around truncate_inode_folio().
> 
> In other words, while you hold the folio lock (and verified that the folio
> was not truncated yet: for example, that folio->mapping is still set), you
> know that it cannot get truncated concurrently -- without holding other
> expensive locks.
> 
> Observe how truncate_cleanup_folio() calls
> 
> 	if (folio_mapped(folio))
> 		unmap_mapping_folio(folio);
> 
> To remove all page table mappings.
> 
> So while holding the folio lock, new page table mappings are not expected to
> appear (IIRC).

Ah ok, so it's more that we don't end up mapping something that's not there
anymore (or something completely different).

> > I mean, yes, after we have mapped the page privately into the pagetables,
> > we don't have business about content-integrity anymore, so given this rule, yes,
> > I guess hugetlb_wp() wouldn't need the lock (for !anonymous) because we already
> > have mapped it privately at that point.
> 
> That's my understanding. And while holding the PTL it cannot get unmapped.
> Whenever you temporarily drop the PTL, you have to do a pte_same() check to
> make sure concurrent truncation didn't happen.

Yap, hugetlb_wp() drops the locks temporarily when it needs to unmap the private
page from other processes, but then does the pte_same() check.

> So far my understanding at least of common filemap code.
> 
> > 
> > But there's something I don't fully understand and makes me feel uneasy.
> > If the lock in the generic faultin path is to keep content stable till we
> > have mapped it privately, wouldn't be more correct to also hold it
> > during the copy in hugetlb_wp, to kinda emulate that?
> As long there us a page table mapping, it cannot get truncated. So if you
> find a PTE under PTL that maps that folio, truncation could not have
> happened.

I see, this makes a lot of sense, thanks for walking me through David!
Alright, then, with all this clear now we should:

- Not take any locks on hugetlb_fault()->hugetlb_wp(), hugetlb_wp() will take it
  if it's an anonymous folio (re-use check)
- Drop the lock in hugetlb_no_page() after we have mapped the page in
  the pagetables
- hugetlb_wp() will take the lock IFF the folio is anonymous

This will lead to something like the following:

 diff --git a/mm/hugetlb.c b/mm/hugetlb.c
 index dfa09fc3b2c6..4d48cda8a56d 100644
 --- a/mm/hugetlb.c
 +++ b/mm/hugetlb.c
 @@ -6198,6 +6198,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
  	 * in scenarios that used to work. As a side effect, there can still
  	 * be leaks between processes, for example, with FOLL_GET users.
  	 */
 +	if (folio_test_anon(old_folio))
 +		folio_lock(old_folio);
  	if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) {
  		if (!PageAnonExclusive(&old_folio->page)) {
  			folio_move_anon_rmap(old_folio, vma);
 @@ -6212,6 +6214,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
  	}
  	VM_BUG_ON_PAGE(folio_test_anon(old_folio) &&
  		       PageAnonExclusive(&old_folio->page), &old_folio->page);
 +	if (folio_test_anon(old_folio))
 +		folio_unlock(old_folio);
 
  	/*
  	 * If the process that created a MAP_PRIVATE mapping is about to perform
 @@ -6537,11 +6541,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  			}
  			new_pagecache_folio = true;
  		} else {
 -			/*
 -			 * hugetlb_wp() expects the folio to be locked in order to
 -			 * check whether we can re-use this page exclusively for us.
 -			 */
 -			folio_lock(folio);
  			anon_rmap = 1;
  		}
  	} else {
 @@ -6558,7 +6557,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 
  		/* Check for page in userfault range. */
  		if (userfaultfd_minor(vma)) {
 -			folio_unlock(folio);
 +			if (!anon_rmap)
 +				folio_unlock(folio);
  			folio_put(folio);
  			/* See comment in userfaultfd_missing() block above */
  			if (!hugetlb_pte_stable(h, mm, vmf->address, vmf->pte, vmf->orig_pte)) {
 @@ -6604,6 +6604,13 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  		new_pte = huge_pte_mkuffd_wp(new_pte);
  	set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h));
 
 +	/*
 +	 * This folio cannot have been truncated since we were holding the lock,
 +	 * and we just mapped it into the pagetables. Drop the lock now.
 +	 */
 +	if (!anon_rmap)
 +		folio_unlock(folio);
 +
  	hugetlb_count_add(pages_per_huge_page(h), mm);
  	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
  		/* Optimization, do the COW without a second fault */
 @@ -6619,8 +6626,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  	 */
  	if (new_folio)
  		folio_set_hugetlb_migratable(folio);
 -
 -	folio_unlock(folio);
  out:
  	hugetlb_vma_unlock_read(vma);
 
 @@ -6639,8 +6644,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
  backout_unlocked:
  	if (new_folio && !new_pagecache_folio)
  		restore_reserve_on_error(h, vma, vmf->address, folio);
 -
 -	folio_unlock(folio);
 +	if (!anon_rmap)
 +		folio_unlock(folio);
  	folio_put(folio);
  	goto out;
  }
 @@ -6805,21 +6810,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  		/* Fallthrough to CoW */
  	}
 
 -	/*
 -	 * We need to lock the folio before calling hugetlb_wp().
 -	 * Either the folio is in the pagecache and we need to copy it over
 -	 * to another file, so it must remain stable throughout the operation,
 -	 * or the folio is anonymous and we need to lock it in order to check
 -	 * whether we can re-use it and mark it exclusive for this process.
 -	 * The timespan for the lock differs depending on the type, since
 -	 * anonymous folios only need to hold the lock while checking whether we
 -	 * can re-use it, while we need to hold it throughout the copy in case
 -	 * we are dealing with a folio from a pagecache.
 -	 * Representing this difference would be tricky with the current code,
 -	 * so just hold the lock for the duration of hugetlb_wp().
 -	 */
  	folio = page_folio(pte_page(vmf.orig_pte));
 -	folio_lock(folio);
  	folio_get(folio);
 
  	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 @@ -6835,7 +6826,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  						flags & FAULT_FLAG_WRITE))
  		update_mmu_cache(vma, vmf.address, vmf.pte);
  out_put_page:
 -	folio_unlock(folio);
  	folio_put(folio);
  out_ptl:
  	spin_unlock(vmf.ptl);
  
This should be patch#2 with something like "Sorting out locking" per
title, and maybe explaining a bit more why the lock in hugelb_wp for
anonymous folios.

What do you think?

 

-- 
Oscar Salvador
SUSE Labs

next prev parent reply	other threads:[~2025-06-17 10:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-12 13:46 [PATCH 0/5] Misc rework on hugetlb_fault Oscar Salvador
2025-06-12 13:46 ` [PATCH 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping Oscar Salvador
2025-06-13 13:52   ` David Hildenbrand
2025-06-12 13:46 ` [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in the faulting path Oscar Salvador
2025-06-13 13:56   ` David Hildenbrand
2025-06-13 14:23     ` Oscar Salvador
2025-06-13 19:57       ` David Hildenbrand
2025-06-13 21:47         ` Oscar Salvador
2025-06-14  9:07           ` Oscar Salvador
2025-06-16  9:22             ` David Hildenbrand
2025-06-16 14:10               ` Oscar Salvador
2025-06-16 14:41                 ` David Hildenbrand
2025-06-17 10:03                   ` Oscar Salvador [this message]
2025-06-17 11:27                     ` David Hildenbrand
2025-06-17 12:04                       ` Oscar Salvador
2025-06-17 12:08                         ` David Hildenbrand
2025-06-17 12:10                           ` Oscar Salvador
2025-06-17 12:50                             ` Oscar Salvador
2025-06-17 13:42                               ` David Hildenbrand
2025-06-17 14:00                                 ` Oscar Salvador
2025-06-19 11:52                                 ` Oscar Salvador
2025-06-12 13:46 ` [PATCH 3/5] mm,hugetlb: Conver anon_rmap into boolean Oscar Salvador
2025-06-13 13:48   ` David Hildenbrand
2025-06-12 13:47 ` [PATCH 4/5] mm,hugetlb: Drop obsolete comment about non-present pte and second faults Oscar Salvador
2025-06-12 13:47 ` [PATCH 5/5] mm,hugetlb: Drop unlikelys from hugetlb_fault Oscar Salvador
2025-06-13  8:55 ` [PATCH 0/5] Misc rework on hugetlb_fault Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aFE9YTNcCHAGBtKi@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gavinguo@igalia.com \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.