linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] Misc rework on hugetlb faulting path
@ 2025-06-30 14:42 Oscar Salvador
  2025-06-30 14:42 ` [PATCH v4 1/5] mm,hugetlb: change mechanism to detect a COW on private mapping Oscar Salvador
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Oscar Salvador @ 2025-06-30 14:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Peter Xu, Gavin Guo, linux-mm,
	linux-kernel, Oscar Salvador


 v3 -> v4:
  - Fix the deadlock for real in patch#1 (Kudos to Galving for testing out v3)
  - Keep trylock as the original code does
  - Add reproducer in the cover letter

 v2 -> v3:
   - Addressed issue folio_lock when holding spinlock (per David)
   - Simplify new_anon_folio (per David)
   - Slightly rework patch#2 to make it simpler

 v1 -> v2:
   - Addressed feedback from David
   - Settle ideas wrt. locking in faulting path after
     discussion with David
   - Add Acks-by

 RFC -> v1:
   - Stop looking up folio in the pagecache for detecting a COW
     on a private mapping.
   - Document the locks

This patchset aims to give some love to the hugetlb faulting path, doing so
by removing obsolete comments that are no longer true, sorting out the folio
lock, and changing the mechanism we use to determine whether we are COWing a
private mapping already.

The most important patch of the series is #1, as it fixes a deadlock that
was described in [1], where two processes were holding the same lock
for the folio in the pagecache, and then deadlocked in the mutex.
Note that this can also happen for anymous folios. This has been tested using
this reproducer [2].

Looking up and locking the folio in the pagecache was done to check whether
that folio was the same folio we had mapped in our pagetables, meaning that if it
was different we knew that we already mapped that folio privately, so any
further CoW would be made on a private mapping, which lead us to the  question:
 __Was the reservation for that address consumed?__
That is all we care about, because if it was indeed consumed and we are the
owner and we cannot allocate more folios, we need to unmap the folio from the
processes pagetables and make it exclusive for us.

We figured we do not need to look up the folio at all, and it is just enough to
check whether the folio we have mapped is anonymous, which means we mapped it
privately, so the reservation was indeed consumed.

Patch#2 sorts out folio locking in the faulting path, reducing the scope of it
,only taking it when we are dealing with an anonymous folio and document it.
More details in the patch.

Patch#3-5 are cleanups.

[1] https://lore.kernel.org/lkml/20250513093448.592150-1-gavinguo@igalia.com/
[2] Here is the reproducer:

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
 
 #define PROTECTION (PROT_READ | PROT_WRITE)
 #define LENGTH (2UL*1024*1024)
 
 #define ADDR (void *)(0x0UL)
 #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
 
 void __read(char *addr)
 {
 	int i = 0;
 
 	printf("a[%d]: %c\n", i, addr[i]);
 }
 
 void fill(char *addr)
 {
 	addr[0] = 'd';
 
 	printf("addr: %c\n", addr[0]);
 }
 
 int main(void)
 {
 	void *addr;
 	pid_t pid, wpid;
 	int status;
 
 	addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, -1, 0);
 	if (addr == MAP_FAILED) {
 		perror("mmap");
 		return -1;
 	}
 
 	printf("Parent faulting in RO\n");
 	__read(addr);
 
 	sleep (10);
 	printf("Forking\n");
 
 	pid = fork();
 	switch (pid) {
 	case -1:
 		perror("fork");
 		break;
 	case 0:
 		sleep (4);
 		printf("Child: Faulting in\n");
 		fill(addr);
 		exit(0);
 		break;
 	default:
 		printf("Parent: Faulting in\n");
 		fill(addr);
 		while((wpid = wait(&status)) > 0);
 		if (munmap(addr, LENGTH))
 			perror("munmap");
 	}
 
 
 	return 0;
 }

You will also have to add a delay in hugetlb_wp, after releasing the mutex and
before unmapping, so the window is large enough to reproduce it reliably.

 diff --git a/mm/hugetlb.c b/mm/hugetlb.c
 index fda6b748e985..5601a9cf819b 100644
 --- a/mm/hugetlb.c
 +++ b/mm/hugetlb.c
 @@ -38,6 +38,7 @@
  #include <linux/memory.h>
  #include <linux/mm_inline.h>
  #include <linux/padata.h>
 +#include <linux/delay.h>
  
  #include <asm/page.h>
  #include <asm/pgalloc.h>
 @@ -6261,6 +6262,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
  			hugetlb_vma_unlock_read(vma);
  			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
  
 +			mdelay(8000);
 +
  			unmap_ref_private(mm, vma, old_folio, vmf->address);
  
  			mutex_lock(&hugetlb_fault_mutex_table[hash]);


Oscar Salvador (5):
  mm,hugetlb: change mechanism to detect a COW on private mapping
  mm,hugetlb: sort out folio locking in the faulting path
  mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean
  mm,hugetlb: drop obsolete comment about non-present pte and second
    faults
  mm,hugetlb: drop unlikelys from hugetlb_fault

 mm/hugetlb.c | 132 +++++++++++++++++++++++----------------------------
 1 file changed, 60 insertions(+), 72 deletions(-)

-- 
2.50.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-07-04 10:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-30 14:42 [PATCH v4 0/5] Misc rework on hugetlb faulting path Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 1/5] mm,hugetlb: change mechanism to detect a COW on private mapping Oscar Salvador
2025-06-30 14:42 ` [PATCH v4 2/5] mm,hugetlb: sort out folio locking in the faulting path Oscar Salvador
2025-07-04 10:45   ` David Hildenbrand
2025-06-30 14:42 ` [PATCH v4 3/5] mm,hugetlb: rename anon_rmap to new_anon_folio and make it boolean Oscar Salvador
2025-07-04 10:46   ` David Hildenbrand
2025-06-30 14:42 ` [PATCH v4 4/5] mm,hugetlb: drop obsolete comment about non-present pte and second faults Oscar Salvador
2025-07-04 10:47   ` David Hildenbrand
2025-06-30 14:42 ` [PATCH v4 5/5] mm,hugetlb: drop unlikelys from hugetlb_fault Oscar Salvador
2025-07-04 10:47   ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).