linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] Misc rework on hugetlb faulting path
@ 2025-06-20 12:30 Oscar Salvador
  2025-06-20 12:30 ` [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping Oscar Salvador
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 12:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador

 v1 -> v2:
   - Addressed feedback from David
   - Settle ideas wrt. locking in faulting path after
     discussion with David
   - Add Acked-by tags

 RFC -> v1:
   - Stop looking up folio in the pagecache for detecting a COW
     on a private mapping.
   - Document the locks

This patchset aims to give some love to the hugetlb faulting path, doing so by
removing obsolete comments that are no longer true, sorting out the folio lock,
and changing the mechanism we use to determine whether we are already COWing on
a private mapping.

The most important patch of the series is #1, as it fixes the deadlock described
in [1], where two processes ended up deadlocked between the lock of the pagecache
folio and the hugetlb fault mutex.
Looking up and locking the folio in the pagecache was done to check whether that
folio was the same folio we had mapped in our pagetables: if it differed, we knew
that we had already mapped that folio privately, so any further CoW would be done
on a private mapping, which led us to the question:
 __Was the reservation for that address consumed?__
That is all we care about, because if it was indeed consumed, we are the owner,
and we cannot allocate more folios, then we need to unmap the folio from the other
processes' pagetables and make it exclusive for us.

We figured we do not need to look up the folio at all: it is enough to check
whether the folio we have mapped is anonymous, because if it is, we mapped it
privately, meaning the reservation was indeed consumed.
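
In code terms, the check in hugetlb_wp() boils down to roughly this (a sketch,
see patch #1 for the actual change):

	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
	    folio_test_anon(old_folio))
		cow_from_owner = true;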

Patch#2 sorts out folio locking in the faulting path, reducing its scope: the folio
lock is only taken when we are dealing with an anonymous folio, and the locking rules
are documented.
More details in the patch.

Patch#3-5 are cleanups.

[1] https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/

Oscar Salvador (5):
  mm,hugetlb: Change mechanism to detect a COW on private mapping
  mm,hugetlb: Sort out folio locking in the faulting path
  mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean
  mm,hugetlb: Drop obsolete comment about non-present pte and second
    faults
  mm,hugetlb: Drop unlikelys from hugetlb_fault

 mm/hugetlb.c | 130 ++++++++++++++++++++++-----------------------------
 1 file changed, 55 insertions(+), 75 deletions(-)

-- 
2.50.0



* [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping
  2025-06-20 12:30 [PATCH v2 0/5] Misc rework on hugetlb faulting path Oscar Salvador
@ 2025-06-20 12:30 ` Oscar Salvador
  2025-06-23 14:09   ` David Hildenbrand
  2025-06-20 12:30 ` [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path Oscar Salvador
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 12:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador

hugetlb_wp() checks whether the process is trying to COW on a private mapping
in order to know whether the reservation for that address was already consumed
or not.
If it was consumed and we are the owner of the mapping, the folio will have to
be unmapped from the other processes.

Currently, that check is done by looking up the folio in the pagecache and
comparing it to the folio which is mapped in our pagetables.
If it differs, it means we already mapped it privately before, consuming a
reservation on the way.
All we are interested in is whether the mapped folio is anonymous, so we can
simplify and check for that instead.

Also, we transition from a trylock to a folio_lock, since the former was only
needed when hugetlb_fault() had to lock both folios, in order to avoid deadlock.

Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Reported-by: Gavin Guo <gavinguo@igalia.com>
Closes: https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/hugetlb.c | 70 +++++++++++++---------------------------------------
 1 file changed, 17 insertions(+), 53 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8746ed2fec13..175edafeec67 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6152,8 +6152,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
  * cannot race with other handlers or page migration.
  * Keep the pte_same checks anyway to make transition from the mutex easier.
  */
-static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
-		       struct vm_fault *vmf)
+static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
@@ -6215,16 +6214,17 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 		       PageAnonExclusive(&old_folio->page), &old_folio->page);
 
 	/*
-	 * If the process that created a MAP_PRIVATE mapping is about to
-	 * perform a COW due to a shared page count, attempt to satisfy
-	 * the allocation without using the existing reserves. The pagecache
-	 * page is used to determine if the reserve at this address was
-	 * consumed or not. If reserves were used, a partial faulted mapping
-	 * at the time of fork() could consume its reserves on COW instead
-	 * of the full address range.
+	 * If the process that created a MAP_PRIVATE mapping is about to perform
+	 * a COW due to a shared page count, attempt to satisfy the allocation
+	 * without using the existing reserves.
+	 * In order to determine whether this is a COW on a MAP_PRIVATE mapping it
+	 * is enough to check whether the old_folio is anonymous. This means that
+	 * the reserve for this address was consumed. If reserves were used, a
+	 * partial faulted mapping at the time of fork() could consume its reserves
+	 * on COW instead of the full address range.
 	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
-			old_folio != pagecache_folio)
+	    folio_test_anon(old_folio))
 		cow_from_owner = true;
 
 	folio_get(old_folio);
@@ -6603,7 +6603,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(folio, vmf);
+		ret = hugetlb_wp(vmf);
 	}
 
 	spin_unlock(vmf->ptl);
@@ -6670,11 +6670,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	vm_fault_t ret;
 	u32 hash;
-	struct folio *folio = NULL;
-	struct folio *pagecache_folio = NULL;
+	struct folio *folio;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
-	int need_wait_lock = 0;
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address & huge_page_mask(h),
@@ -6769,8 +6767,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * If we are going to COW/unshare the mapping later, we examine the
 	 * pending reservations for this page now. This will ensure that any
 	 * allocations necessary to record that reservation occur outside the
-	 * spinlock. Also lookup the pagecache page now as it is used to
-	 * determine if a reservation has been consumed.
+	 * spinlock.
 	 */
 	if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
 	    !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(vmf.orig_pte)) {
@@ -6780,11 +6777,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 		/* Just decrements count, does not deallocate */
 		vma_end_reservation(h, vma, vmf.address);
-
-		pagecache_folio = filemap_lock_hugetlb_folio(h, mapping,
-							     vmf.pgoff);
-		if (IS_ERR(pagecache_folio))
-			pagecache_folio = NULL;
 	}
 
 	vmf.ptl = huge_pte_lock(h, mm, vmf.pte);
@@ -6798,10 +6790,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(vmf.orig_pte)) {
 		if (!userfaultfd_wp_async(vma)) {
 			spin_unlock(vmf.ptl);
-			if (pagecache_folio) {
-				folio_unlock(pagecache_folio);
-				folio_put(pagecache_folio);
-			}
 			hugetlb_vma_unlock_read(vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			return handle_userfault(&vmf, VM_UFFD_WP);
@@ -6813,23 +6801,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Fallthrough to CoW */
 	}
 
-	/*
-	 * hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) and
-	 * pagecache_folio, so here we need take the former one
-	 * when folio != pagecache_folio or !pagecache_folio.
-	 */
+	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
 	folio = page_folio(pte_page(vmf.orig_pte));
-	if (folio != pagecache_folio)
-		if (!folio_trylock(folio)) {
-			need_wait_lock = 1;
-			goto out_ptl;
-		}
-
+	folio_lock(folio);
 	folio_get(folio);
 
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(vmf.orig_pte)) {
-			ret = hugetlb_wp(pagecache_folio, &vmf);
+			ret = hugetlb_wp(&vmf);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
 			vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
@@ -6840,16 +6819,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 						flags & FAULT_FLAG_WRITE))
 		update_mmu_cache(vma, vmf.address, vmf.pte);
 out_put_page:
-	if (folio != pagecache_folio)
-		folio_unlock(folio);
+	folio_unlock(folio);
 	folio_put(folio);
 out_ptl:
 	spin_unlock(vmf.ptl);
-
-	if (pagecache_folio) {
-		folio_unlock(pagecache_folio);
-		folio_put(pagecache_folio);
-	}
 out_mutex:
 	hugetlb_vma_unlock_read(vma);
 
@@ -6861,15 +6834,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		vma_end_read(vma);
 
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	/*
-	 * Generally it's safe to hold refcount during waiting page lock. But
-	 * here we just wait to defer the next page fault to avoid busy loop and
-	 * the page is not used after unlocked before returning from the current
-	 * page fault. So we are safe from accessing freed page, even if we wait
-	 * here without taking refcount.
-	 */
-	if (need_wait_lock)
-		folio_wait_locked(folio);
 	return ret;
 }
 
-- 
2.50.0



* [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path
  2025-06-20 12:30 [PATCH v2 0/5] Misc rework on hugetlb faulting path Oscar Salvador
  2025-06-20 12:30 ` [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping Oscar Salvador
@ 2025-06-20 12:30 ` Oscar Salvador
  2025-06-23 14:11   ` David Hildenbrand
  2025-06-20 12:30 ` [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 12:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador

Recent conversations showed that there was a misunderstanding about why we
were locking the folio prior to calling hugetlb_wp().
In fact, as soon as we have the folio mapped into the pagetables, we no longer
need to hold it locked, because we know that no concurrent truncation could have
happened.
There is only one case where the folio needs to be locked, and that is when we
are handling an anonymous folio, because hugetlb_wp() will check whether it can
re-use it exclusively for the process that is faulting it in.

So, pass the folio locked to hugetlb_wp() when that is the case.
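
Since folio_lock() may sleep, it cannot be taken while holding the PTL, so the fault
path now does roughly the following dance (a simplified sketch of the change below):

	folio = page_folio(pte_page(vmf.orig_pte));
	folio_get(folio);
	if (folio_test_anon(folio)) {
		spin_unlock(vmf.ptl);
		folio_lock(folio);
		spin_lock(vmf.ptl);
		/* the pte might have changed while the PTL was dropped */
		if (!pte_same(vmf.orig_pte, huge_ptep_get(mm, vmf.address, vmf.pte)))
			goto out_put_page;
	}
	ret = hugetlb_wp(&vmf);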

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 175edafeec67..1a5f713c1e4c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6437,6 +6437,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	pte_t new_pte;
 	bool new_folio, new_pagecache_folio = false;
 	u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
+	bool folio_locked = true;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -6602,6 +6603,11 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 
 	hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
+		/* No need to lock file folios. See comment in hugetlb_fault() */
+		if (!anon_rmap) {
+			folio_locked = false;
+			folio_unlock(folio);
+		}
 		/* Optimization, do the COW without a second fault */
 		ret = hugetlb_wp(vmf);
 	}
@@ -6616,7 +6622,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	if (new_folio)
 		folio_set_hugetlb_migratable(folio);
 
-	folio_unlock(folio);
+	if (folio_locked)
+		folio_unlock(folio);
 out:
 	hugetlb_vma_unlock_read(vma);
 
@@ -6636,7 +6643,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	if (new_folio && !new_pagecache_folio)
 		restore_reserve_on_error(h, vma, vmf->address, folio);
 
-	folio_unlock(folio);
+	if (folio_locked)
+		folio_unlock(folio);
 	folio_put(folio);
 	goto out;
 }
@@ -6670,7 +6678,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	vm_fault_t ret;
 	u32 hash;
-	struct folio *folio;
+	struct folio *folio = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	struct vm_fault vmf = {
@@ -6687,6 +6695,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * be hard to debug if called functions make assumptions
 		 */
 	};
+	bool folio_locked = false;
 
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
@@ -6801,13 +6810,24 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Fallthrough to CoW */
 	}
 
-	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
-	folio = page_folio(pte_page(vmf.orig_pte));
-	folio_lock(folio);
-	folio_get(folio);
-
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(vmf.orig_pte)) {
+			/*
+			 * Anonymous folios need to be locked since hugetlb_wp()
+			 * checks whether we can re-use the folio exclusively
+			 * for us in case we are the only user of it.
+			 */
+			folio = page_folio(pte_page(vmf.orig_pte));
+			folio_get(folio);
+			if (folio_test_anon(folio)) {
+				spin_unlock(vmf.ptl);
+				folio_lock(folio);
+				folio_locked = true;
+				spin_lock(vmf.ptl);
+				if (unlikely(!pte_same(vmf.orig_pte, huge_ptep_get(mm,
+						   vmf.address, vmf.pte))))
+					goto out_put_page;
+			}
 			ret = hugetlb_wp(&vmf);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
@@ -6819,8 +6839,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 						flags & FAULT_FLAG_WRITE))
 		update_mmu_cache(vma, vmf.address, vmf.pte);
 out_put_page:
-	folio_unlock(folio);
-	folio_put(folio);
+	if (folio) {
+		if (folio_locked)
+			folio_unlock(folio);
+		folio_put(folio);
+	}
 out_ptl:
 	spin_unlock(vmf.ptl);
 out_mutex:
-- 
2.50.0



* [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean
  2025-06-20 12:30 [PATCH v2 0/5] Misc rework on hugetlb faulting path Oscar Salvador
  2025-06-20 12:30 ` [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping Oscar Salvador
  2025-06-20 12:30 ` [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path Oscar Salvador
@ 2025-06-20 12:30 ` Oscar Salvador
  2025-06-20 14:28   ` Oscar Salvador
  2025-06-23 14:13   ` David Hildenbrand
  2025-06-20 12:30 ` [PATCH v2 4/5] mm,hugetlb: Drop obsolete comment about non-present pte and second faults Oscar Salvador
  2025-06-20 12:30 ` [PATCH v2 5/5] mm,hugetlb: Drop unlikelys from hugetlb_fault Oscar Salvador
  4 siblings, 2 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 12:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador

anon_rmap is used to determine whether the newly allocated folio is anonymous.
Rename it to something more meaningful like new_anon_folio and make it boolean,
as we use it like that.
While we are at it, drop 'new_pagecache_folio' as 'new_anon_folio' is enough to
check whether we need to restore the consumed reservation.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a5f713c1e4c..57bb8b2dce21 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6427,17 +6427,16 @@ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, unsigned
 static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 			struct vm_fault *vmf)
 {
+	u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
+	bool new_folio, new_anon_folio = false;
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
-	int anon_rmap = 0;
-	unsigned long size;
+	bool folio_locked = true;
 	struct folio *folio;
+	unsigned long size;
 	pte_t new_pte;
-	bool new_folio, new_pagecache_folio = false;
-	u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
-	bool folio_locked = true;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -6518,6 +6517,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 		folio_zero_user(folio, vmf->real_address);
 		__folio_mark_uptodate(folio);
 		new_folio = true;
+		new_anon_folio = !(vma->vm_flags & VM_MAYSHARE);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = hugetlb_add_to_page_cache(folio, mapping,
@@ -6536,10 +6536,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 				ret = VM_FAULT_SIGBUS;
 				goto out;
 			}
-			new_pagecache_folio = true;
 		} else {
 			folio_lock(folio);
-			anon_rmap = 1;
 		}
 	} else {
 		/*
@@ -6588,7 +6586,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	if (!pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), vmf->orig_pte))
 		goto backout;
 
-	if (anon_rmap)
+	if (new_anon_folio)
 		hugetlb_add_new_anon_rmap(folio, vma, vmf->address);
 	else
 		hugetlb_add_file_rmap(folio);
@@ -6604,7 +6602,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* No need to lock file folios. See comment in hugetlb_fault() */
-		if (!anon_rmap) {
+		if (!new_anon_folio) {
 			folio_locked = false;
 			folio_unlock(folio);
 		}
@@ -6640,7 +6638,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 backout:
 	spin_unlock(vmf->ptl);
 backout_unlocked:
-	if (new_folio && !new_pagecache_folio)
+	/* We only need to restore reservations for private mappings */
+	if (new_folio && new_anon_folio)
 		restore_reserve_on_error(h, vma, vmf->address, folio);
 
 	if (folio_locked)
-- 
2.50.0



* [PATCH v2 4/5] mm,hugetlb: Drop obsolete comment about non-present pte and second faults
  2025-06-20 12:30 [PATCH v2 0/5] Misc rework on hugetlb faulting path Oscar Salvador
                   ` (2 preceding siblings ...)
  2025-06-20 12:30 ` [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
@ 2025-06-20 12:30 ` Oscar Salvador
  2025-06-20 12:30 ` [PATCH v2 5/5] mm,hugetlb: Drop unlikelys from hugetlb_fault Oscar Salvador
  4 siblings, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 12:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador

There is a comment in hugetlb_fault() that does not hold anymore.
This one:

 /*
  * vmf.orig_pte could be a migration/hwpoison vmf.orig_pte at this
  * point, so this check prevents the kernel from going below assuming
  * that we have an active hugepage in pagecache. This goto expects
  * the 2nd page fault, and is_hugetlb_entry_(migration|hwpoisoned)
  * check will properly handle it.
  */

This was written because back in the day we used to do:

 hugetlb_fault () {
  ptep = huge_pte_offset(...)
  if (ptep) {
    entry = huge_ptep_get(ptep)
    if (unlikely(is_hugetlb_entry_migration(entry))
        ...
    else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
        ...
  }

  ...
  ...

  /*
   * entry could be a migration/hwpoison entry at this point, so this
   * check prevents the kernel from going below assuming that we have
   * a active hugepage in pagecache. This goto expects the 2nd page fault,
   * and is_hugetlb_entry_(migration|hwpoisoned) check will properly
   * handle it.
   */
  if (!pte_present(entry))
          goto out_mutex;
  ...
 }

The code was designed to check for hwpoisoned/migration entries upfront, and
then bail out if, further down, the pte was no longer present, relying on the
second fault to properly handle migration/hwpoison entries that time around.

The way we handle this is different nowadays, so drop the misleading comment.
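
Nowadays the non-present case is handled right there in hugetlb_fault(), roughly
like this (a sketch of the current code, see also the hunk touched by patch #5):

	if (!pte_present(vmf.orig_pte)) {
		if (is_hugetlb_entry_migration(vmf.orig_pte)) {
			/* drop the fault mutex, wait for the migration, retry */
			...
			migration_entry_wait_huge(vma, vmf.address, vmf.pte);
			return 0;
		} else if (is_hugetlb_entry_hwpoisoned(vmf.orig_pte))
			ret = VM_FAULT_HWPOISON_LARGE |
			      VM_FAULT_SET_HINDEX(hstate_index(h));
		goto out_mutex;
	}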

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57bb8b2dce21..868ee8ed45c0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6746,13 +6746,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	ret = 0;
 
-	/*
-	 * vmf.orig_pte could be a migration/hwpoison vmf.orig_pte at this
-	 * point, so this check prevents the kernel from going below assuming
-	 * that we have an active hugepage in pagecache. This goto expects
-	 * the 2nd page fault, and is_hugetlb_entry_(migration|hwpoisoned)
-	 * check will properly handle it.
-	 */
+	/* Not present, either a migration or a hwpoisoned entry */
 	if (!pte_present(vmf.orig_pte)) {
 		if (unlikely(is_hugetlb_entry_migration(vmf.orig_pte))) {
 			/*
-- 
2.50.0



* [PATCH v2 5/5] mm,hugetlb: Drop unlikelys from hugetlb_fault
  2025-06-20 12:30 [PATCH v2 0/5] Misc rework on hugetlb faulting path Oscar Salvador
                   ` (3 preceding siblings ...)
  2025-06-20 12:30 ` [PATCH v2 4/5] mm,hugetlb: Drop obsolete comment about non-present pte and second faults Oscar Salvador
@ 2025-06-20 12:30 ` Oscar Salvador
  4 siblings, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 12:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador

The unlikely annotations predate an era where we were checking for hwpoisoned/migration
entries prior to checking whether the pte was present.

Currently, we only check whether the pte is a migration/hwpoison entry after having
checked that it is not present, so it must be either one or the other.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 868ee8ed45c0..1ef554b6934a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6748,7 +6748,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	/* Not present, either a migration or a hwpoisoned entry */
 	if (!pte_present(vmf.orig_pte)) {
-		if (unlikely(is_hugetlb_entry_migration(vmf.orig_pte))) {
+		if (is_hugetlb_entry_migration(vmf.orig_pte)) {
 			/*
 			 * Release the hugetlb fault lock now, but retain
 			 * the vma lock, because it is needed to guard the
@@ -6759,7 +6759,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			migration_entry_wait_huge(vma, vmf.address, vmf.pte);
 			return 0;
-		} else if (unlikely(is_hugetlb_entry_hwpoisoned(vmf.orig_pte)))
+		} else if (is_hugetlb_entry_hwpoisoned(vmf.orig_pte))
 			ret = VM_FAULT_HWPOISON_LARGE |
 			    VM_FAULT_SET_HINDEX(hstate_index(h));
 		goto out_mutex;
-- 
2.50.0



* Re: [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean
  2025-06-20 12:30 ` [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
@ 2025-06-20 14:28   ` Oscar Salvador
  2025-06-23 14:13   ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-20 14:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel

On Fri, Jun 20, 2025 at 02:30:12PM +0200, Oscar Salvador wrote:
> anon_rmap is used to determine whether the new allocated folio is anonymous.
> Rename it to something more meaningul like new_anon_folio and make it boolean,
> as we use it like that.
> While we are at it, drop 'new_pagecache_folio' as 'new_anon_folio' is enough to
> check whether we need to restore the consumed reservation.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/hugetlb.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
[...]
> @@ -6518,6 +6517,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>  		folio_zero_user(folio, vmf->real_address);
>  		__folio_mark_uptodate(folio);
>  		new_folio = true;
> +		new_anon_folio = !(vma->vm_flags & VM_MAYSHARE);

>  
>  		if (vma->vm_flags & VM_MAYSHARE) {
>  			int err = hugetlb_add_to_page_cache(folio, mapping,
> @@ -6536,10 +6536,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>  				ret = VM_FAULT_SIGBUS;
>  				goto out;
>  			}
> -			new_pagecache_folio = true;
>  		} else {
>  			folio_lock(folio);
> -			anon_rmap = 1;

Let's just do this on top:

 diff --git a/mm/hugetlb.c b/mm/hugetlb.c
 index 57bb8b2dce21..f6ea1864ce5c 100644
 --- a/mm/hugetlb.c
 +++ b/mm/hugetlb.c
 @@ -6517,7 +6517,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
                 folio_zero_user(folio, vmf->real_address);
                 __folio_mark_uptodate(folio);
                 new_folio = true;
 -               new_anon_folio = !(vma->vm_flags & VM_MAYSHARE);
 
                 if (vma->vm_flags & VM_MAYSHARE) {
                         int err = hugetlb_add_to_page_cache(folio, mapping,
 @@ -6537,6 +6536,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
                                 goto out;
                         }
                 } else {
 +                       new_anon_folio = true;
                         folio_lock(folio);
                 }
         } else {

which is more explicit :-)

 

-- 
Oscar Salvador
SUSE Labs


* Re: [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping
  2025-06-20 12:30 ` [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping Oscar Salvador
@ 2025-06-23 14:09   ` David Hildenbrand
  2025-06-25  7:49     ` Oscar Salvador
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-06-23 14:09 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel


>   
> -	/*
> -	 * hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) and
> -	 * pagecache_folio, so here we need take the former one
> -	 * when folio != pagecache_folio or !pagecache_folio.
> -	 */
> +	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
>   	folio = page_folio(pte_page(vmf.orig_pte));
> -	if (folio != pagecache_folio)
> -		if (!folio_trylock(folio)) {
> -			need_wait_lock = 1;
> -			goto out_ptl;
> -		}
> -
> +	folio_lock(folio);
>   	folio_get(folio);

Just realized that this won't work for this patch here, as we are 
holding the PTL.

In patch #2 you do the right thing.

-- 
Cheers,

David / dhildenb



* Re: [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path
  2025-06-20 12:30 ` [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path Oscar Salvador
@ 2025-06-23 14:11   ` David Hildenbrand
  2025-06-25  7:47     ` Oscar Salvador
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-06-23 14:11 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel

On 20.06.25 14:30, Oscar Salvador wrote:
> Recent conversations showed that there was a misunderstanding about why we
> were locking the folio prior to call in hugetlb_wp().
> In fact, as soon as we have the folio mapped into the pagetables, we no longer
> need to hold it locked, because we know that no concurrent truncation could have
> happened.
> There is only one case where the folio needs to be locked, and that is when we
> are handling an anonymous folio, because hugetlb_wp() will check whether it can
> re-use it exclusively for the process that is faulting it in.
> 
> So, pass the folio locked to hugetlb_wp() when that is the case.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++----------
>   1 file changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 175edafeec67..1a5f713c1e4c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6437,6 +6437,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>   	pte_t new_pte;
>   	bool new_folio, new_pagecache_folio = false;
>   	u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
> +	bool folio_locked = true;
>   
>   	/*
>   	 * Currently, we are forced to kill the process in the event the
> @@ -6602,6 +6603,11 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>   
>   	hugetlb_count_add(pages_per_huge_page(h), mm);
>   	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
> +		/* No need to lock file folios. See comment in hugetlb_fault() */
> +		if (!anon_rmap) {
> +			folio_locked = false;
> +			folio_unlock(folio);
> +		}
>   		/* Optimization, do the COW without a second fault */
>   		ret = hugetlb_wp(vmf);
>   	}
> @@ -6616,7 +6622,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>   	if (new_folio)
>   		folio_set_hugetlb_migratable(folio);
>   
> -	folio_unlock(folio);
> +	if (folio_locked)
> +		folio_unlock(folio);
>   out:
>   	hugetlb_vma_unlock_read(vma);
>   
> @@ -6636,7 +6643,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>   	if (new_folio && !new_pagecache_folio)
>   		restore_reserve_on_error(h, vma, vmf->address, folio);
>   
> -	folio_unlock(folio);
> +	if (folio_locked)
> +		folio_unlock(folio);
>   	folio_put(folio);
>   	goto out;
>   }
> @@ -6670,7 +6678,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   {
>   	vm_fault_t ret;
>   	u32 hash;
> -	struct folio *folio;
> +	struct folio *folio = NULL;
>   	struct hstate *h = hstate_vma(vma);
>   	struct address_space *mapping;
>   	struct vm_fault vmf = {
> @@ -6687,6 +6695,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   		 * be hard to debug if called functions make assumptions
>   		 */
>   	};
> +	bool folio_locked = false;
>   
>   	/*
>   	 * Serialize hugepage allocation and instantiation, so that we don't
> @@ -6801,13 +6810,24 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   		/* Fallthrough to CoW */
>   	}
>   
> -	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
> -	folio = page_folio(pte_page(vmf.orig_pte));
> -	folio_lock(folio);
> -	folio_get(folio);
> -
>   	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
>   		if (!huge_pte_write(vmf.orig_pte)) {
> +			/*
> +			 * Anonymous folios need to be locked since hugetlb_wp()
> +			 * checks whether we can re-use the folio exclusively
> +			 * for us in case we are the only user of it.
> +			 */

Should we move that comment to hugetlb_wp() instead? And if we are 
already doing this PTL unlock dance now, why not do it in hugetlb_wp() 
instead so we can simplify this code?

-- 
Cheers,

David / dhildenb



* Re: [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean
  2025-06-20 12:30 ` [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
  2025-06-20 14:28   ` Oscar Salvador
@ 2025-06-23 14:13   ` David Hildenbrand
  2025-06-25  7:43     ` Oscar Salvador
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-06-23 14:13 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel

On 20.06.25 14:30, Oscar Salvador wrote:
> anon_rmap is used to determine whether the new allocated folio is anonymous.
> Rename it to something more meaningul like new_anon_folio and make it boolean,
> as we use it like that.
> While we are at it, drop 'new_pagecache_folio' as 'new_anon_folio' is enough to
> check whether we need to restore the consumed reservation.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---


}
> @@ -6640,7 +6638,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>   backout:
>   	spin_unlock(vmf->ptl);
>   backout_unlocked:
> -	if (new_folio && !new_pagecache_folio)
> +	/* We only need to restore reservations for private mappings */
> +	if (new_folio && new_anon_folio)

Could this be simplified to

if (new_anon_folio) {

...

-- 
Cheers,

David / dhildenb



* Re: [PATCH v2 3/5] mm,hugetlb: Rename anon_rmap to new_anon_folio and make it boolean
  2025-06-23 14:13   ` David Hildenbrand
@ 2025-06-25  7:43     ` Oscar Salvador
  0 siblings, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-25  7:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel

On Mon, Jun 23, 2025 at 04:13:17PM +0200, David Hildenbrand wrote:
> > -	if (new_folio && !new_pagecache_folio)
> > +	/* We only need to restore reservations for private mappings */
> > +	if (new_folio && new_anon_folio)
> 
> Could this be simplified to
> 
> if (new_anon_folio) {

Yes, definitely, will do in the next version.

 

-- 
Oscar Salvador
SUSE Labs


* Re: [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path
  2025-06-23 14:11   ` David Hildenbrand
@ 2025-06-25  7:47     ` Oscar Salvador
  2025-06-25 20:47       ` Oscar Salvador
  0 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2025-06-25  7:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel

On Mon, Jun 23, 2025 at 04:11:38PM +0200, David Hildenbrand wrote:
> > @@ -6801,13 +6810,24 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >   		/* Fallthrough to CoW */
> >   	}
> > -	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
> > -	folio = page_folio(pte_page(vmf.orig_pte));
> > -	folio_lock(folio);
> > -	folio_get(folio);
> > -
> >   	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
> >   		if (!huge_pte_write(vmf.orig_pte)) {
> > +			/*
> > +			 * Anonymous folios need to be lock since hugetlb_wp()
> > +			 * checks whether we can re-use the folio exclusively
> > +			 * for us in case we are the only user of it.
> > +			 */
> 
> Should we move that comment to hugetlb_wp() instead? And if we are already
> doing this PTL unlock dance now, why not do it in hugetlb_wp() instead so we
> can simplify this code?

Yes, probably we can move it further up.
Let me see how it would look.

thanks! 

-- 
Oscar Salvador
SUSE Labs


* Re: [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping
  2025-06-23 14:09   ` David Hildenbrand
@ 2025-06-25  7:49     ` Oscar Salvador
  2025-06-25 10:13       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2025-06-25  7:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel

On Mon, Jun 23, 2025 at 04:09:51PM +0200, David Hildenbrand wrote:
> 
> > -	/*
> > -	 * hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) and
> > -	 * pagecache_folio, so here we need take the former one
> > -	 * when folio != pagecache_folio or !pagecache_folio.
> > -	 */
> > +	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
> >   	folio = page_folio(pte_page(vmf.orig_pte));
> > -	if (folio != pagecache_folio)
> > -		if (!folio_trylock(folio)) {
> > -			need_wait_lock = 1;
> > -			goto out_ptl;
> > -		}
> > -
> > +	folio_lock(folio);
> >   	folio_get(folio);
> 
> Just realized that this won't work for this patch here, as we are holding
> the PTL.
> 
> In patch #2 you do the right thing.

Yap, missed that.
I might have to do the lock-unlock-dance here, and then in patch#2 move
it to hugetlb_wp.
Sounds reasonable?

 

-- 
Oscar Salvador
SUSE Labs


* Re: [PATCH v2 1/5] mm,hugetlb: Change mechanism to detect a COW on private mapping
  2025-06-25  7:49     ` Oscar Salvador
@ 2025-06-25 10:13       ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-06-25 10:13 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel

On 25.06.25 09:49, Oscar Salvador wrote:
> On Mon, Jun 23, 2025 at 04:09:51PM +0200, David Hildenbrand wrote:
>>
>>> -	/*
>>> -	 * hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) and
>>> -	 * pagecache_folio, so here we need take the former one
>>> -	 * when folio != pagecache_folio or !pagecache_folio.
>>> -	 */
>>> +	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
>>>    	folio = page_folio(pte_page(vmf.orig_pte));
>>> -	if (folio != pagecache_folio)
>>> -		if (!folio_trylock(folio)) {
>>> -			need_wait_lock = 1;
>>> -			goto out_ptl;
>>> -		}
>>> -
>>> +	folio_lock(folio);
>>>    	folio_get(folio);
>>
>> Just realized that this won't work for this patch here, as we are holding
>> the PTL.
>>
>> In patch #2 you do the right thing.
> 
> Yap, missed that.
> I might have to do the lock-unlock-dance here, and then in patch#2 move
> it to hugetlb_wp.
> Sounds reasonable?

Yes

-- 
Cheers,

David / dhildenb



* Re: [PATCH v2 2/5] mm,hugetlb: Sort out folio locking in the faulting path
  2025-06-25  7:47     ` Oscar Salvador
@ 2025-06-25 20:47       ` Oscar Salvador
  0 siblings, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2025-06-25 20:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel

On Wed, Jun 25, 2025 at 09:47:00AM +0200, Oscar Salvador wrote:
> On Mon, Jun 23, 2025 at 04:11:38PM +0200, David Hildenbrand wrote:
> > > @@ -6801,13 +6810,24 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > >   		/* Fallthrough to CoW */
> > >   	}
> > > -	/* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
> > > -	folio = page_folio(pte_page(vmf.orig_pte));
> > > -	folio_lock(folio);
> > > -	folio_get(folio);
> > > -
> > >   	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
> > >   		if (!huge_pte_write(vmf.orig_pte)) {
> > > +			/*
> > > +			 * Anonymous folios need to be locked since hugetlb_wp()
> > > +			 * checks whether we can re-use the folio exclusively
> > > +			 * for us in case we are the only user of it.
> > > +			 */
> > 
> > Should we move that comment to hugetlb_wp() instead? And if we are already
> > doing this PTL unlock dance now, why not do it in hugetlb_wp() instead so we
> > can simplify this code?
> 
> Yes, probably we can move it further up.
> Let me see how it would look.

So, I've been thinking about this, and I'm not so sure.
By default, the state of the folio in hugetlb_no_page and hugetlb_fault is
different.

hugetlb_no_page() has the folio locked already, and hugetlb_fault() doesn't, which
means that if we want to move this further up, either 1) hugetlb_no_page() would have
to unlock the folio so that hugetlb_wp() can lock it again in case it's anonymous, or
2) we'd have to pass a parameter to hugetlb_wp() to let it know whether the folio is
already locked.

I don't really like either of them. Case 1) seems suboptimal, as right now (with this
patch) we only unlock the folio in the !anon case in hugetlb_no_page(). If we want to
move the 'dance' from hugetlb_fault() to hugetlb_wp(), we'd have to unlock and then
lock it again.
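
For reference, option 2) would look roughly like this (just a hypothetical sketch to
illustrate the idea; the extra 'folio_locked' parameter is made up, not an actual
proposal):

	static vm_fault_t hugetlb_wp(struct vm_fault *vmf, bool folio_locked)
	{
		...
		old_folio = page_folio(pte_page(vmf->orig_pte));
		if (folio_test_anon(old_folio) && !folio_locked) {
			/* folio_lock() may sleep, so drop the PTL first */
			spin_unlock(vmf->ptl);
			folio_lock(old_folio);
			spin_lock(vmf->ptl);
			/* recheck pte_same() before going on */
		}
		...
	}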

-- 
Oscar Salvador
SUSE Labs

