* [PATCH] mm/memory: avoid unnecessary #PF on mTHP allocation race
@ 2026-05-12 9:50 Wandun Chen
From: Wandun Chen @ 2026-05-12 9:50 UTC
To: linux-mm, linux-kernel
Cc: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
When an mTHP folio is allocated in do_anonymous_page() and the target
pte range turns out not to be fully empty, the current code releases
the folio and returns.

This creates the illusion that the page fault has been handled, even
though vmf->address itself is still pte_none(), so another page fault
will be triggered immediately.
The race scenario is shown below, using a 64KB mTHP as an example:
two threads of the same process, base page size 4KB,
range R = [X, X + 64KB), X < Y < X + 64KB.
CPU 0 (writer, faults at X)                 CPU 1 (reader, faults at Y)
--------------------------------            -----------------------------
do_anonymous_page()                         do_anonymous_page()
alloc_anon_folio()
  pte_range_none(R) --> true
  vma_alloc_folio() --> 64KB
                                            pte_offset_map_lock(Y)
                                            install zero_pfn PTE at Y
                                            pte_unmap_unlock()
pte_offset_map_lock(X)
pte_range_none(R) --> false, Y is populated
/* but pte at X is still none */
goto release
return 0
To avoid this, check whether the pte at vmf->address has been mapped;
if not, retry alloc_anon_folio() and the subsequent operations. On
retry, alloc_anon_folio() re-checks pte_range_none() and falls back
to a smaller order, so no infinite loop is possible.
Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
---
Reproducer (not included in the patch, available on request):
two threads hammer the same 64K mTHP range, writer at offset 0,
reader at offset 32K, per-round barrier, 1024 rounds.
Minor faults before: writer=1951 reader=973 (927 extra faults)
Minor faults after: writer=1024 reader=1022
I'm not sure if this situation often occurs in real workloads.
---
mm/memory.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 0c9d9c2cbf0e..104f5be1de36 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5339,10 +5339,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
unsigned long addr = vmf->address;
+ unsigned long fault_offset;
struct folio *folio;
vm_fault_t ret = 0;
int nr_pages;
pte_t entry;
+ bool should_retry = false;
/* File mapping without ->vm_ops ? */
if (vma->vm_flags & VM_SHARED)
@@ -5389,6 +5391,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
ret = vmf_anon_prepare(vmf);
if (ret)
return ret;
+retry:
/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
folio = alloc_anon_folio(vmf);
if (IS_ERR(folio))
@@ -5413,14 +5416,26 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
update_mmu_tlb(vma, addr, vmf->pte);
goto release;
} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
- update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
- goto release;
+ fault_offset = (vmf->address - addr) >> PAGE_SHIFT;
+ if (!pte_none(ptep_get(vmf->pte + fault_offset))) {
+ update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
+ goto release;
+ }
+
+ should_retry = true;
}
ret = check_stable_address_space(vma->vm_mm);
if (ret)
goto release;
+ if (should_retry) {
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ folio_put(folio);
+ should_retry = false;
+ goto retry;
+ }
+
/* Deliver the page fault to userland, check inside PT lock */
if (userfaultfd_missing(vma)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
--
2.43.0
* Re: [PATCH] mm/memory: avoid unnecessary #PF on mTHP allocation race
From: David Hildenbrand (Arm) @ 2026-05-12 10:35 UTC
To: Wandun Chen, linux-mm, linux-kernel
Cc: akpm, ljs, liam, vbabka, rppt, surenb, mhocko
On 5/12/26 11:50, Wandun Chen wrote:
> When an mTHP folio is allocated in do_anonymous_page() and the target
> pte range turns out not to be fully empty, the current code releases
> the folio and returns.
>
> This creates the illusion that the page fault has been handled, even
> though vmf->address itself is still pte_none(), so another page fault
> will be triggered immediately.
Yes. Why is that a problem?
--
Cheers,
David