* [PATCH v4 1/5] mm,hugetlb: change mechanism to detect a COW on private mapping
2025-06-30 14:42 [PATCH v4 0/5] Misc rework on hugetlb faulting path Oscar Salvador
@ 2025-06-30 14:42 ` Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 2/5] mm,hugetlb: sort out folio locking in the faulting path Oscar Salvador
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Oscar Salvador @ 2025-06-30 14:42 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
linux-kernel, Oscar Salvador
hugetlb_wp() checks whether the process is trying to COW on a private
mapping in order to know whether the reservation for that address was
already consumed. If it was consumed and we are the owner of the mapping,
the folio will have to be unmapped from the other processes.
Currently, that check is done by looking up the folio in the pagecache and
comparing it to the folio mapped in our pagetables. If they differ, it means
we already mapped it privately before, consuming a reservation on the way.
All we are interested in is whether the mapped folio is anonymous, so we can
simplify and check for that instead.
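To make the intent concrete, here is a minimal, self-contained sketch of the
old and new checks. The types and helpers below are stand-ins for the kernel
state, not the kernel's APIs; the names are purely illustrative.
#include <stdbool.h>
#include <stdio.h>
/* Stand-ins for the kernel's folio/vma state; names are illustrative. */
struct folio_model { bool anon; };		/* models folio_test_anon() */
struct vma_model { bool resv_owner; };		/* models is_vma_resv_set(vma, HPAGE_RESV_OWNER) */
/* Old scheme: compare the mapped folio against the pagecache folio. */
static bool cow_from_owner_old(const struct vma_model *vma,
			       const struct folio_model *old_folio,
			       const struct folio_model *pagecache_folio)
{
	return vma->resv_owner && old_folio != pagecache_folio;
}
/* New scheme: an anonymous mapped folio already implies the private COW case. */
static bool cow_from_owner_new(const struct vma_model *vma,
			       const struct folio_model *old_folio)
{
	return vma->resv_owner && old_folio->anon;
}
int main(void)
{
	struct vma_model vma = { .resv_owner = true };
	struct folio_model anon = { .anon = true }, file = { .anon = false };
	/* Both schemes agree: only the anonymous folio triggers cow_from_owner. */
	printf("anon: old=%d new=%d\n", cow_from_owner_old(&vma, &anon, &file),
	       cow_from_owner_new(&vma, &anon));
	printf("file: old=%d new=%d\n", cow_from_owner_old(&vma, &file, &file),
	       cow_from_owner_new(&vma, &file));
	return 0;
}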
Link: https://lkml.kernel.org/r/20250627102904.107202-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20250627102904.107202-2-osalvador@suse.de
Link: https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/ [1]
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-by: Gavin Guo <gavinguo@igalia.com>
Closes: https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 88 ++++++++++++++++++++--------------------------------
1 file changed, 34 insertions(+), 54 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fa7faf38c99e..14274a02dd14 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6149,8 +6149,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* cannot race with other handlers or page migration.
* Keep the pte_same checks anyway to make transition from the mutex easier.
*/
-static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
- struct vm_fault *vmf)
+static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct mm_struct *mm = vma->vm_mm;
@@ -6212,16 +6211,17 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
PageAnonExclusive(&old_folio->page), &old_folio->page);
/*
- * If the process that created a MAP_PRIVATE mapping is about to
- * perform a COW due to a shared page count, attempt to satisfy
- * the allocation without using the existing reserves. The pagecache
- * page is used to determine if the reserve at this address was
- * consumed or not. If reserves were used, a partial faulted mapping
- * at the time of fork() could consume its reserves on COW instead
- * of the full address range.
+ * If the process that created a MAP_PRIVATE mapping is about to perform
+ * a COW due to a shared page count, attempt to satisfy the allocation
+ * without using the existing reserves.
+ * In order to determine whether this is a COW on a MAP_PRIVATE mapping it
+ * is enough to check whether the old_folio is anonymous. This means that
+ * the reserve for this address was consumed. If reserves were used, a
+ * partial faulted mapping at the time of fork() could consume its reserves
+ * on COW instead of the full address range.
*/
if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
- old_folio != pagecache_folio)
+ folio_test_anon(old_folio))
cow_from_owner = true;
folio_get(old_folio);
@@ -6600,7 +6600,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
- ret = hugetlb_wp(folio, vmf);
+ ret = hugetlb_wp(vmf);
}
spin_unlock(vmf->ptl);
@@ -6668,10 +6668,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
vm_fault_t ret;
u32 hash;
struct folio *folio = NULL;
- struct folio *pagecache_folio = NULL;
struct hstate *h = hstate_vma(vma);
struct address_space *mapping;
- int need_wait_lock = 0;
+ bool need_wait_lock = false;
struct vm_fault vmf = {
.vma = vma,
.address = address & huge_page_mask(h),
@@ -6766,8 +6765,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
* If we are going to COW/unshare the mapping later, we examine the
* pending reservations for this page now. This will ensure that any
* allocations necessary to record that reservation occur outside the
- * spinlock. Also lookup the pagecache page now as it is used to
- * determine if a reservation has been consumed.
+ * spinlock.
*/
if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
!(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(vmf.orig_pte)) {
@@ -6777,11 +6775,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
}
/* Just decrements count, does not deallocate */
vma_end_reservation(h, vma, vmf.address);
-
- pagecache_folio = filemap_lock_hugetlb_folio(h, mapping,
- vmf.pgoff);
- if (IS_ERR(pagecache_folio))
- pagecache_folio = NULL;
}
vmf.ptl = huge_pte_lock(h, mm, vmf.pte);
@@ -6795,10 +6788,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
(flags & FAULT_FLAG_WRITE) && !huge_pte_write(vmf.orig_pte)) {
if (!userfaultfd_wp_async(vma)) {
spin_unlock(vmf.ptl);
- if (pagecache_folio) {
- folio_unlock(pagecache_folio);
- folio_put(pagecache_folio);
- }
hugetlb_vma_unlock_read(vma);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
return handle_userfault(&vmf, VM_UFFD_WP);
@@ -6810,24 +6799,19 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* Fallthrough to CoW */
}
- /*
- * hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) and
- * pagecache_folio, so here we need take the former one
- * when folio != pagecache_folio or !pagecache_folio.
- */
- folio = page_folio(pte_page(vmf.orig_pte));
- if (folio != pagecache_folio)
- if (!folio_trylock(folio)) {
- need_wait_lock = 1;
- goto out_ptl;
- }
-
- folio_get(folio);
-
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!huge_pte_write(vmf.orig_pte)) {
- ret = hugetlb_wp(pagecache_folio, &vmf);
- goto out_put_page;
+ /* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
+ folio = page_folio(pte_page(vmf.orig_pte));
+ if (!folio_trylock(folio)) {
+ need_wait_lock = true;
+ goto out_ptl;
+ }
+ folio_get(folio);
+ ret = hugetlb_wp(&vmf);
+ folio_unlock(folio);
+ folio_put(folio);
+ goto out_ptl;
} else if (likely(flags & FAULT_FLAG_WRITE)) {
vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
}
@@ -6836,17 +6820,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (huge_ptep_set_access_flags(vma, vmf.address, vmf.pte, vmf.orig_pte,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, vmf.address, vmf.pte);
-out_put_page:
- if (folio != pagecache_folio)
- folio_unlock(folio);
- folio_put(folio);
out_ptl:
spin_unlock(vmf.ptl);
-
- if (pagecache_folio) {
- folio_unlock(pagecache_folio);
- folio_put(pagecache_folio);
- }
out_mutex:
hugetlb_vma_unlock_read(vma);
@@ -6859,11 +6834,16 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
/*
- * Generally it's safe to hold refcount during waiting page lock. But
- * here we just wait to defer the next page fault to avoid busy loop and
- * the page is not used after unlocked before returning from the current
- * page fault. So we are safe from accessing freed page, even if we wait
- * here without taking refcount.
+ * hugetlb_wp drops all the locks, but the folio lock, before trying to
+ * unmap the folio from other processes. During that window, if another
+ * process mapping that folio faults in, it will take the mutex and then
+ * it will wait on folio_lock, causing an ABBA deadlock.
+ * Use trylock instead and bail out if we fail.
+ *
+ * Ideally, we should hold a refcount on the folio we wait for, but we do
+ * not want to use the folio after it becomes unlocked, but rather just
+ * wait for it to become unlocked, so hopefully the next fault succeeds on
+ * the trylock.
*/
if (need_wait_lock)
folio_wait_locked(folio);
--
2.50.0
* [PATCH v4 2/5] mm,hugetlb: sort out folio locking in the faulting path
2025-06-30 14:42 [PATCH v4 0/5] Misc rework on hugetlb faulting path Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 1/5] mm,hugetlb: change mechanism to detect a COW on private mapping Oscar Salvador
@ 2025-06-30 14:42 ` Oscar Salvador
2025-07-04 10:45 ` David Hildenbrand
2025-06-30 14:42 ` [PATCH v4 3/5] mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2025-06-30 14:42 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
linux-kernel, Oscar Salvador
Recent conversations showed that there was a misunderstanding about why we
were locking the folio prior to calling hugetlb_wp(). In fact, as soon as
we have the folio mapped into the pagetables, we no longer need to hold it
locked, because we know that no concurrent truncation could have happened.
There is only one case where the folio needs to be locked, and that is
when we are handling an anonymous folio, because hugetlb_wp() will check
whether it can re-use it exclusively for the process that is faulting it
in.
So, pass the folio locked to hugetlb_wp() when that is the case.
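A minimal standalone sketch of the resulting locking rule follows. The
helpers are stubs standing in for folio_trylock()/folio_unlock() and
hugetlb_wp(); none of the names below are kernel APIs.
#include <stdbool.h>
#include <stdio.h>
struct folio_model { bool anon; bool locked; };
/* Stubs standing in for the real locking and wp handling. */
static bool try_lock(struct folio_model *f) { if (f->locked) return false; f->locked = true; return true; }
static void unlock(struct folio_model *f) { f->locked = false; }
static void do_wp(struct folio_model *f) { (void)f; /* stands in for hugetlb_wp() */ }
/* Only anonymous folios are held locked across the wp handler, because only
 * they are candidates for exclusive re-use; file folios stay unlocked. */
static int handle_wp_model(struct folio_model *folio)
{
	bool locked = false;
	if (folio->anon) {
		if (!try_lock(folio))
			return -1;	/* would set need_wait_lock and retry the fault */
		locked = true;
	}
	do_wp(folio);
	if (locked)
		unlock(folio);
	return 0;
}
int main(void)
{
	struct folio_model anon = { .anon = true }, file = { .anon = false };
	printf("anon: %d, file: %d\n", handle_wp_model(&anon), handle_wp_model(&file));
	return 0;
}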
Link: https://lkml.kernel.org/r/20250627102904.107202-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Gavin Guo <gavinguo@igalia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 14274a02dd14..31d39e2a0879 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6434,6 +6434,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
pte_t new_pte;
bool new_folio, new_pagecache_folio = false;
u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
+ bool folio_locked = true;
/*
* Currently, we are forced to kill the process in the event the
@@ -6599,6 +6600,14 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
+ /*
+ * No need to keep file folios locked. See comment in
+ * hugetlb_fault().
+ */
+ if (!anon_rmap) {
+ folio_locked = false;
+ folio_unlock(folio);
+ }
/* Optimization, do the COW without a second fault */
ret = hugetlb_wp(vmf);
}
@@ -6613,7 +6622,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
if (new_folio)
folio_set_hugetlb_migratable(folio);
- folio_unlock(folio);
+ if (folio_locked)
+ folio_unlock(folio);
out:
hugetlb_vma_unlock_read(vma);
@@ -6801,15 +6811,20 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!huge_pte_write(vmf.orig_pte)) {
- /* hugetlb_wp() requires page locks of pte_page(vmf.orig_pte) */
+ /*
+ * Anonymous folios need to be locked since hugetlb_wp()
+ * checks whether we can re-use the folio exclusively
+ * for us in case we are the only user of it.
+ */
folio = page_folio(pte_page(vmf.orig_pte));
- if (!folio_trylock(folio)) {
+ if (folio_test_anon(folio) && !folio_trylock(folio)) {
need_wait_lock = true;
goto out_ptl;
}
folio_get(folio);
ret = hugetlb_wp(&vmf);
- folio_unlock(folio);
+ if (folio_test_anon(folio))
+ folio_unlock(folio);
folio_put(folio);
goto out_ptl;
} else if (likely(flags & FAULT_FLAG_WRITE)) {
--
2.50.0
* Re: [PATCH v4 2/5] mm,hugetlb: sort out folio locking in the faulting path
2025-06-30 14:42 ` [PATCH v4 2/5] mm,hugetlb: sort out folio locking in the faulting path Oscar Salvador
@ 2025-07-04 10:45 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-07-04 10:45 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel
On 30.06.25 16:42, Oscar Salvador wrote:
> Recent conversations showed that there was a misunderstanding about why we
> were locking the folio prior to calling hugetlb_wp(). In fact, as soon as
> we have the folio mapped into the pagetables, we no longer need to hold it
> locked, because we know that no concurrent truncation could have happened.
>
> There is only one case where the folio needs to be locked, and that is
> when we are handling an anonymous folio, because hugetlb_wp() will check
> whether it can re-use it exclusively for the process that is faulting it
> in.
>
> So, pass the folio locked to hugetlb_wp() when that is the case.
>
> Link: https://lkml.kernel.org/r/20250627102904.107202-3-osalvador@suse.de
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Cc: Gavin Guo <gavinguo@igalia.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* [PATCH v4 3/5] mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean
2025-06-30 14:42 [PATCH v4 0/5] Misc rework on hugetlb faulting path Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 1/5] mm,hugetlb: change mechanism to detect a COW on private mapping Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 2/5] mm,hugetlb: sort out folio locking in the faulting path Oscar Salvador
@ 2025-06-30 14:42 ` Oscar Salvador
2025-07-04 10:46 ` David Hildenbrand
2025-06-30 14:42 ` [PATCH v4 4/5] mm,hugetlb: drop obsolete comment about non-present pte and second faults Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 5/5] mm,hugetlb: drop unlikelys from hugetlb_fault Oscar Salvador
4 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2025-06-30 14:42 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
linux-kernel, Oscar Salvador
anon_rmap is used to determine whether the newly allocated folio is
anonymous. Rename it to something more meaningful, like new_anon_folio, and
make it a boolean, as that is how we use it.
While we are at it, drop 'new_pagecache_folio' as 'new_anon_folio' is
enough to check whether we need to restore the consumed reservation.
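A small sketch of why 'new_anon_folio' alone is enough on the backout path.
This is a plain C model of the reasoning, not kernel code; the variable names
mirror the ones in the diff but the truth table is only illustrative.
#include <stdbool.h>
#include <stdio.h>
int main(void)
{
	/* In hugetlb_no_page(), a newly allocated folio is either inserted
	 * into the page cache (shared mapping) or kept anonymous (private
	 * mapping), so for new folios the two conditions coincide. When no
	 * new folio was allocated, both conditions are false as well. */
	bool new_folio = true;
	for (int anon = 0; anon <= 1; anon++) {
		bool new_anon_folio = anon;
		bool new_pagecache_folio = !anon;	/* the complementary case */
		bool old_check = new_folio && !new_pagecache_folio;
		printf("anon=%d: old check=%d, new check=%d\n",
		       anon, old_check, new_anon_folio);
	}
	return 0;
}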
Link: https://lkml.kernel.org/r/20250627102904.107202-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Guo <gavinguo@igalia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 31d39e2a0879..67f3c9c16348 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6424,17 +6424,16 @@ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, unsigned
static vm_fault_t hugetlb_no_page(struct address_space *mapping,
struct vm_fault *vmf)
{
+ u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
+ bool new_folio, new_anon_folio = false;
struct vm_area_struct *vma = vmf->vma;
struct mm_struct *mm = vma->vm_mm;
struct hstate *h = hstate_vma(vma);
vm_fault_t ret = VM_FAULT_SIGBUS;
- int anon_rmap = 0;
- unsigned long size;
+ bool folio_locked = true;
struct folio *folio;
+ unsigned long size;
pte_t new_pte;
- bool new_folio, new_pagecache_folio = false;
- u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
- bool folio_locked = true;
/*
* Currently, we are forced to kill the process in the event the
@@ -6533,10 +6532,9 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
ret = VM_FAULT_SIGBUS;
goto out;
}
- new_pagecache_folio = true;
} else {
+ new_anon_folio = true;
folio_lock(folio);
- anon_rmap = 1;
}
} else {
/*
@@ -6585,7 +6583,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
if (!pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), vmf->orig_pte))
goto backout;
- if (anon_rmap)
+ if (new_anon_folio)
hugetlb_add_new_anon_rmap(folio, vma, vmf->address);
else
hugetlb_add_file_rmap(folio);
@@ -6604,7 +6602,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
* No need to keep file folios locked. See comment in
* hugetlb_fault().
*/
- if (!anon_rmap) {
+ if (!new_anon_folio) {
folio_locked = false;
folio_unlock(folio);
}
@@ -6640,7 +6638,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
backout:
spin_unlock(vmf->ptl);
backout_unlocked:
- if (new_folio && !new_pagecache_folio)
+ /* We only need to restore reservations for private mappings */
+ if (new_anon_folio)
restore_reserve_on_error(h, vma, vmf->address, folio);
folio_unlock(folio);
--
2.50.0
* Re: [PATCH v4 3/5] mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean
2025-06-30 14:42 ` [PATCH v4 3/5] mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
@ 2025-07-04 10:46 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-07-04 10:46 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel
On 30.06.25 16:42, Oscar Salvador wrote:
> anon_rmap is used to determine whether the newly allocated folio is
> anonymous. Rename it to something more meaningful, like new_anon_folio, and
> make it a boolean, as that is how we use it.
>
> While we are at it, drop 'new_pagecache_folio' as 'new_anon_folio' is
> enough to check whether we need to restore the consumed reservation.
>
> Link: https://lkml.kernel.org/r/20250627102904.107202-4-osalvador@suse.de
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gavin Guo <gavinguo@igalia.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
BTW, these SOB are wrong :)
> ---
Nothing jumped at me, but it's all very complicated stuff
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* [PATCH v4 4/5] mm,hugetlb: drop obsolete comment about non-present pte and second faults
2025-06-30 14:42 [PATCH v4 0/5] Misc rework on hugetlb faulting path Oscar Salvador
` (2 preceding siblings ...)
2025-06-30 14:42 ` [PATCH v4 3/5] mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
@ 2025-06-30 14:42 ` Oscar Salvador
2025-07-04 10:47 ` David Hildenbrand
2025-06-30 14:42 ` [PATCH v4 5/5] mm,hugetlb: drop unlikelys from hugetlb_fault Oscar Salvador
4 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2025-06-30 14:42 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
linux-kernel, Oscar Salvador
There is a comment in hugetlb_fault() that does not hold anymore. This
one:
/*
* vmf.orig_pte could be a migration/hwpoison vmf.orig_pte at this
* point, so this check prevents the kernel from going below assuming
* that we have an active hugepage in pagecache. This goto expects
* the 2nd page fault, and is_hugetlb_entry_(migration|hwpoisoned)
* check will properly handle it.
*/
This was written because back in the day we used to do:
hugetlb_fault () {
ptep = huge_pte_offset(...)
if (ptep) {
entry = huge_ptep_get(ptep)
if (unlikely(is_hugetlb_entry_migration(entry))
...
else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
...
}
...
...
/*
* entry could be a migration/hwpoison entry at this point, so this
* check prevents the kernel from going below assuming that we have
* a active hugepage in pagecache. This goto expects the 2nd page fault,
* and is_hugetlb_entry_(migration|hwpoisoned) check will properly
* handle it.
*/
if (!pte_present(entry))
goto out_mutex;
...
}
The code was designed to check for hwpoisoned/migration entries upfront,
and then bail out if further down the pte was not present anymore, relying
on the second fault to properly handle migration/hwpoison entries that
time around.
The way we handle this is different nowadays, so drop the misleading
comment.
Link: https://lkml.kernel.org/r/20250627102904.107202-5-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Guo <gavinguo@igalia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67f3c9c16348..ba078aa1cb96 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6745,13 +6745,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
ret = 0;
- /*
- * vmf.orig_pte could be a migration/hwpoison vmf.orig_pte at this
- * point, so this check prevents the kernel from going below assuming
- * that we have an active hugepage in pagecache. This goto expects
- * the 2nd page fault, and is_hugetlb_entry_(migration|hwpoisoned)
- * check will properly handle it.
- */
+ /* Not present, either a migration or a hwpoisoned entry */
if (!pte_present(vmf.orig_pte)) {
if (unlikely(is_hugetlb_entry_migration(vmf.orig_pte))) {
/*
--
2.50.0
* Re: [PATCH v4 4/5] mm,hugetlb: drop obsolete comment about non-present pte and second faults
2025-06-30 14:42 ` [PATCH v4 4/5] mm,hugetlb: drop obsolete comment about non-present pte and second faults Oscar Salvador
@ 2025-07-04 10:47 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-07-04 10:47 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel
On 30.06.25 16:42, Oscar Salvador wrote:
> There is a comment in hugetlb_fault() that does not hold anymore. This
> one:
>
> /*
> * vmf.orig_pte could be a migration/hwpoison vmf.orig_pte at this
> * point, so this check prevents the kernel from going below assuming
> * that we have an active hugepage in pagecache. This goto expects
> * the 2nd page fault, and is_hugetlb_entry_(migration|hwpoisoned)
> * check will properly handle it.
> */
>
> This was written because back in the day we used to do:
>
> hugetlb_fault () {
> ptep = huge_pte_offset(...)
> if (ptep) {
> entry = huge_ptep_get(ptep)
> if (unlikely(is_hugetlb_entry_migration(entry))
> ...
> else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> ...
> }
>
> ...
> ...
>
> /*
> * entry could be a migration/hwpoison entry at this point, so this
> * check prevents the kernel from going below assuming that we have
> * a active hugepage in pagecache. This goto expects the 2nd page fault,
> * and is_hugetlb_entry_(migration|hwpoisoned) check will properly
> * handle it.
> */
> if (!pte_present(entry))
> goto out_mutex;
> ...
> }
>
> The code was designed to check for hwpoisoned/migration entries upfront,
> and then bail out if further down the pte was not present anymore, relying
> on the second fault to properly handle migration/hwpoison entries that
> time around.
>
> The way we handle this is different nowadays, so drop the misleading
> comment.
>
> Link: https://lkml.kernel.org/r/20250627102904.107202-5-osalvador@suse.de
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gavin Guo <gavinguo@igalia.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* [PATCH v4 5/5] mm,hugetlb: drop unlikelys from hugetlb_fault
2025-06-30 14:42 [PATCH v4 0/5] Misc rework on hugetlb faulting path Oscar Salvador
` (3 preceding siblings ...)
2025-06-30 14:42 ` [PATCH v4 4/5] mm,hugetlb: drop obsolete comment about non-present pte and second faults Oscar Salvador
@ 2025-06-30 14:42 ` Oscar Salvador
2025-07-04 10:47 ` David Hildenbrand
4 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2025-06-30 14:42 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
linux-kernel, Oscar Salvador
The unlikely() dates back to an era when we were checking for
hwpoisoned/migration entries prior to checking whether the pte was
present.
Currently, we check for the pte to be a migration/hwpoison entry after we
have checked that it is not present, so it must be either one or the other.
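A tiny sketch of the control flow this refers to; it is an illustrative model
of the branching, not the kernel code itself.
#include <stdio.h>
enum entry_kind { PRESENT, MIGRATION, HWPOISON };
/* Once the pte is known not to be present, the entry must be either a
 * migration or a hwpoison entry, so neither branch is an unlikely outlier. */
static const char *handle(enum entry_kind e)
{
	if (e != PRESENT) {
		if (e == MIGRATION)
			return "wait for migration and retry the fault";
		else	/* the only remaining possibility */
			return "return VM_FAULT_HWPOISON_LARGE";
	}
	return "continue with the present-pte path";
}
int main(void)
{
	printf("%s\n%s\n%s\n", handle(PRESENT), handle(MIGRATION), handle(HWPOISON));
	return 0;
}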
Link: https://lkml.kernel.org/r/20250627102904.107202-6-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Guo <gavinguo@igalia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba078aa1cb96..fda6b748e985 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6747,7 +6747,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* Not present, either a migration or a hwpoisoned entry */
if (!pte_present(vmf.orig_pte)) {
- if (unlikely(is_hugetlb_entry_migration(vmf.orig_pte))) {
+ if (is_hugetlb_entry_migration(vmf.orig_pte)) {
/*
* Release the hugetlb fault lock now, but retain
* the vma lock, because it is needed to guard the
@@ -6758,7 +6758,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
migration_entry_wait_huge(vma, vmf.address, vmf.pte);
return 0;
- } else if (unlikely(is_hugetlb_entry_hwpoisoned(vmf.orig_pte)))
+ } else if (is_hugetlb_entry_hwpoisoned(vmf.orig_pte))
ret = VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
goto out_mutex;
--
2.50.0
* Re: [PATCH v4 5/5] mm,hugetlb: drop unlikelys from hugetlb_fault
2025-06-30 14:42 ` [PATCH v4 5/5] mm,hugetlb: drop unlikelys from hugetlb_fault Oscar Salvador
@ 2025-07-04 10:47 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-07-04 10:47 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Muchun Song, Peter Xu, Gavin Guo, linux-mm, linux-kernel
On 30.06.25 16:42, Oscar Salvador wrote:
> The unlikely() dates back to an era when we were checking for
> hwpoisoned/migration entries prior to checking whether the pte was
> present.
>
> Currently, we check for the pte to be a migration/hwpoison entry after we
> have checked that it is not present, so it must be either one or the other.
>
> Link: https://lkml.kernel.org/r/20250627102904.107202-6-osalvador@suse.de
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gavin Guo <gavinguo@igalia.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb