linux-mm.kvack.org archive mirror
* [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault
@ 2023-08-21 12:30 Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler Kefeng Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Add a generic VMA lock-based page fault handler to the mm core and convert
the architectures to use it, which eliminates each architecture's duplicated
code.

With it, we avoid having to touch multiple architectures' code whenever a
new feature or bugfix lands in this path. Finally, enable the feature on
ARM32 and LoongArch.

This is based on next-20230817 and is only build-tested.
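
To give an idea of the intended usage, an architecture's page fault handler
is expected to fill a struct vm_fault and call the new helper roughly as
below. This is only an illustrative sketch distilled from the conversions in
this series; the retry/done labels and the flag setup are per-architecture:

    struct vm_fault vmf = {
            .real_address = address,
            .regs = regs,
            .fault_code = error_code,       /* only if the arch needs it */
            .flags = FAULT_FLAG_DEFAULT,
    };
    vm_fault_t fault;

    /* ... set FAULT_FLAG_USER/WRITE/INSTRUCTION and vmf.vm_flags ... */

    fault = try_vma_locked_page_fault(&vmf);
    if (fault == VM_FAULT_NONE)
            goto retry;                     /* fall back to the mmap_lock path */
    if (!(fault & VM_FAULT_RETRY))
            goto done;                      /* handled under the VMA lock */

    /* VM_FAULT_RETRY: check fault_signal_pending(), then retry under mmap_lock */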

v2:
- convert "int arch_vma_check_access()" to "bool arch_vma_access_error()";
  keep using a __weak function for arch_vma_access_error(), which avoids
  having to declare access_error() in the architecture
  (x86/powerpc/riscv/loongarch) header files.
- re-use struct vm_fault instead of adding a new struct vm_locked_fault,
  per Matthew Wilcox; add the necessary pt_regs/fault error code/vm flags
  to struct vm_fault since they may be used in arch_vma_access_error()
- add a special VM_FAULT_NONE value and make try_vma_locked_page_fault()
  return vm_fault_t

Kefeng Wang (10):
  mm: add a generic VMA lock-based page fault handler
  arm64: mm: use try_vma_locked_page_fault()
  x86: mm: use try_vma_locked_page_fault()
  s390: mm: use try_vma_locked_page_fault()
  powerpc: mm: use try_vma_locked_page_fault()
  riscv: mm: use try_vma_locked_page_fault()
  ARM: mm: try VMA lock-based page fault handling first
  loongarch: mm: cleanup __do_page_fault()
  loongarch: mm: add access_error() helper
  loongarch: mm: try VMA lock-based page fault handling first

 arch/arm/Kconfig          |   1 +
 arch/arm/mm/fault.c       |  35 ++++++++----
 arch/arm64/mm/fault.c     |  60 ++++++++-------------
 arch/loongarch/Kconfig    |   1 +
 arch/loongarch/mm/fault.c | 111 ++++++++++++++++++++++----------------
 arch/powerpc/mm/fault.c   |  66 +++++++++++------------
 arch/riscv/mm/fault.c     |  58 +++++++++-----------
 arch/s390/mm/fault.c      |  66 ++++++++++-------------
 arch/x86/mm/fault.c       |  55 ++++++++-----------
 include/linux/mm.h        |  17 ++++++
 include/linux/mm_types.h  |   2 +
 mm/memory.c               |  39 ++++++++++++++
 12 files changed, 278 insertions(+), 233 deletions(-)

-- 
2.27.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 15:13   ` kernel test robot
  2023-08-24  7:12   ` Alexander Gordeev
  2023-08-21 12:30 ` [PATCH rfc v2 02/10] arm64: mm: use try_vma_locked_page_fault() Kefeng Wang
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

ARCH_SUPPORTS_PER_VMA_LOCK is enabled by more and more architectures, e.g.
x86, arm64, powerpc, s390 and riscv. Their implementations are very similar,
which results in some duplicated code, so add a generic VMA lock-based page
fault handler, try_vma_locked_page_fault(), to eliminate the duplication;
this also makes it easy to support the feature on new architectures.

Since different architectures use different ways to check whether a vma is
accessible, the struct pt_regs, the page fault error code and the vma flags
are added to struct vm_fault, so an architecture's page fault code can
re-use struct vm_fault to record them and perform the accessibility check
with its own implementation.
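
As an illustration (not part of this patch), an architecture that needs more
than the default vm_flags check can override the __weak hook and reuse its
existing access_error() helper, roughly as the x86/powerpc/riscv conversions
later in this series do:

    /* arch/<arch>/mm/fault.c -- illustrative sketch only */
    #ifdef CONFIG_PER_VMA_LOCK
    bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
    {
            /* fault_code was stashed in vmf by the arch fault handler */
            return access_error(vmf->fault_code, vma);
    }
    #endif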

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mm.h       | 17 +++++++++++++++++
 include/linux/mm_types.h |  2 ++
 mm/memory.c              | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f764e84e567..22a6f4c56ff3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -512,9 +512,12 @@ struct vm_fault {
 		pgoff_t pgoff;			/* Logical page offset based on vma */
 		unsigned long address;		/* Faulting virtual address - masked */
 		unsigned long real_address;	/* Faulting virtual address - unmasked */
+		unsigned long fault_code;	/* Faulting error code during page fault */
+		struct pt_regs *regs;		/* The registers stored during page fault */
 	};
 	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
+	vm_flags_t vm_flags;		/* VMA flags to be used for access checking */
 	pmd_t *pmd;			/* Pointer to pmd entry matching
 					 * the 'address' */
 	pud_t *pud;			/* Pointer to pud entry matching
@@ -774,6 +777,9 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf);
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline bool vma_start_read(struct vm_area_struct *vma)
@@ -801,6 +807,17 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 	mmap_assert_locked(vmf->vma->vm_mm);
 }
 
+static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+		unsigned long address)
+{
+	return NULL;
+}
+
+static inline vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+	return VM_FAULT_NONE;
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f5ba5b0bc836..702820cea3f9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1119,6 +1119,7 @@ typedef __bitwise unsigned int vm_fault_t;
  * fault. Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
  *
+ * @VM_FAULT_NONE:		Special case, not starting to handle fault
  * @VM_FAULT_OOM:		Out Of Memory
  * @VM_FAULT_SIGBUS:		Bad access
  * @VM_FAULT_MAJOR:		Page read from storage
@@ -1139,6 +1140,7 @@ typedef __bitwise unsigned int vm_fault_t;
  *
  */
 enum vm_fault_reason {
+	VM_FAULT_NONE		= (__force vm_fault_t)0x000000,
 	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
 	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
 	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
diff --git a/mm/memory.c b/mm/memory.c
index 3b4aaa0d2fff..60fe35db5134 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5510,6 +5510,45 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	count_vm_vma_lock_event(VMA_LOCK_ABORT);
 	return NULL;
 }
+
+#ifdef CONFIG_PER_VMA_LOCK
+bool __weak arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return (vma->vm_flags & vmf->vm_flags) == 0;
+}
+#endif
+
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+	vm_fault_t fault = VM_FAULT_NONE;
+	struct vm_area_struct *vma;
+
+	if (!(vmf->flags & FAULT_FLAG_USER))
+		return fault;
+
+	vma = lock_vma_under_rcu(current->mm, vmf->real_address);
+	if (!vma)
+		return fault;
+
+	if (arch_vma_access_error(vma, vmf)) {
+		vma_end_read(vma);
+		return fault;
+	}
+
+	fault = handle_mm_fault(vma, vmf->real_address,
+				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
+
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
+
+	if (fault & VM_FAULT_RETRY)
+		count_vm_vma_lock_event(VMA_LOCK_RETRY);
+	else
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+
+	return fault;
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 #ifndef __PAGETABLE_P4D_FOLDED
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 02/10] arm64: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 03/10] x86: " Kefeng Wang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Use the new try_vma_locked_page_fault() helper to simplify the code, and
pass struct vm_fault to __do_page_fault() directly instead of each
variable separately. No functional change intended.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/mm/fault.c | 60 ++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2e5d1e238af9..2b7a1e610b3e 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -498,9 +498,8 @@ static void do_bad_area(unsigned long far, unsigned long esr,
 #define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)
 
 static vm_fault_t __do_page_fault(struct mm_struct *mm,
-				  struct vm_area_struct *vma, unsigned long addr,
-				  unsigned int mm_flags, unsigned long vm_flags,
-				  struct pt_regs *regs)
+				  struct vm_area_struct *vma,
+				  struct vm_fault *vmf)
 {
 	/*
 	 * Ok, we have a good vm_area for this memory access, so we can handle
@@ -508,9 +507,9 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm,
 	 * Check that the permissions on the VMA allow for the fault which
 	 * occurred.
 	 */
-	if (!(vma->vm_flags & vm_flags))
+	if (!(vma->vm_flags & vmf->vm_flags))
 		return VM_FAULT_BADACCESS;
-	return handle_mm_fault(vma, addr, mm_flags, regs);
+	return handle_mm_fault(vma, vmf->real_address, vmf->flags, vmf->regs);
 }
 
 static bool is_el0_instruction_abort(unsigned long esr)
@@ -533,10 +532,12 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	const struct fault_info *inf;
 	struct mm_struct *mm = current->mm;
 	vm_fault_t fault;
-	unsigned long vm_flags;
-	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
 	struct vm_area_struct *vma;
+	struct vm_fault vmf = {
+		.real_address = addr,
+		.flags = FAULT_FLAG_DEFAULT,
+	};
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -549,7 +550,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto no_context;
 
 	if (user_mode(regs))
-		mm_flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 
 	/*
 	 * vm_flags tells us what bits we must have in vma->vm_flags
@@ -559,20 +560,20 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	 */
 	if (is_el0_instruction_abort(esr)) {
 		/* It was exec fault */
-		vm_flags = VM_EXEC;
-		mm_flags |= FAULT_FLAG_INSTRUCTION;
+		vmf.vm_flags = VM_EXEC;
+		vmf.flags |= FAULT_FLAG_INSTRUCTION;
 	} else if (is_write_abort(esr)) {
 		/* It was write fault */
-		vm_flags = VM_WRITE;
-		mm_flags |= FAULT_FLAG_WRITE;
+		vmf.vm_flags = VM_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
 	} else {
 		/* It was read fault */
-		vm_flags = VM_READ;
+		vmf.vm_flags = VM_READ;
 		/* Write implies read */
-		vm_flags |= VM_WRITE;
+		vmf.vm_flags |= VM_WRITE;
 		/* If EPAN is absent then exec implies read */
 		if (!cpus_have_const_cap(ARM64_HAS_EPAN))
-			vm_flags |= VM_EXEC;
+			vmf.vm_flags |= VM_EXEC;
 	}
 
 	if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
@@ -587,26 +588,11 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
-	if (!(mm_flags & FAULT_FLAG_USER))
-		goto lock_mmap;
-
-	vma = lock_vma_under_rcu(mm, addr);
-	if (!vma)
-		goto lock_mmap;
-
-	if (!(vma->vm_flags & vm_flags)) {
-		vma_end_read(vma);
-		goto lock_mmap;
-	}
-	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
-	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-		vma_end_read(vma);
-
-	if (!(fault & VM_FAULT_RETRY)) {
-		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto retry;
+	if (!(fault & VM_FAULT_RETRY))
 		goto done;
-	}
-	count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
 	/* Quick path to respond to signals */
 	if (fault_signal_pending(fault, regs)) {
@@ -614,8 +600,6 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 			goto no_context;
 		return 0;
 	}
-lock_mmap:
-
 retry:
 	vma = lock_mm_and_find_vma(mm, addr, regs);
 	if (unlikely(!vma)) {
@@ -623,7 +607,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto done;
 	}
 
-	fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
+	fault = __do_page_fault(mm, vma, &vmf);
 
 	/* Quick path to respond to signals */
 	if (fault_signal_pending(fault, regs)) {
@@ -637,7 +621,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		return 0;
 
 	if (fault & VM_FAULT_RETRY) {
-		mm_flags |= FAULT_FLAG_TRIED;
+		vmf.flags |= FAULT_FLAG_TRIED;
 		goto retry;
 	}
 	mmap_read_unlock(mm);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 03/10] x86: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 02/10] arm64: mm: use try_vma_locked_page_fault() Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 04/10] s390: " Kefeng Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Use the new try_vma_locked_page_fault() helper to simplify the code.
No functional change intended.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/x86/mm/fault.c | 55 +++++++++++++++++++--------------------------
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ab778eac1952..3edc9edc0b28 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1227,6 +1227,13 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return access_error(vmf->fault_code, vma);
+}
+#endif
+
 /*
  * Handle faults in the user portion of the address space.  Nothing in here
  * should check X86_PF_USER without a specific justification: for almost
@@ -1241,13 +1248,13 @@ void do_user_addr_fault(struct pt_regs *regs,
 			unsigned long address)
 {
 	struct vm_area_struct *vma;
-	struct task_struct *tsk;
-	struct mm_struct *mm;
+	struct mm_struct *mm = current->mm;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_DEFAULT;
-
-	tsk = current;
-	mm = tsk->mm;
+	struct vm_fault vmf = {
+		.real_address = address,
+		.fault_code = error_code,
+		.flags = FAULT_FLAG_DEFAULT
+	};
 
 	if (unlikely((error_code & (X86_PF_USER | X86_PF_INSTR)) == X86_PF_INSTR)) {
 		/*
@@ -1311,7 +1318,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
 			local_irq_enable();
@@ -1326,11 +1333,11 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * maybe_mkwrite() can create a proper shadow stack PTE.
 	 */
 	if (error_code & X86_PF_SHSTK)
-		flags |= FAULT_FLAG_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
-		flags |= FAULT_FLAG_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)
-		flags |= FAULT_FLAG_INSTRUCTION;
+		vmf.flags |= FAULT_FLAG_INSTRUCTION;
 
 #ifdef CONFIG_X86_64
 	/*
@@ -1350,26 +1357,11 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
-	if (!(flags & FAULT_FLAG_USER))
-		goto lock_mmap;
-
-	vma = lock_vma_under_rcu(mm, address);
-	if (!vma)
-		goto lock_mmap;
-
-	if (unlikely(access_error(error_code, vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
-	}
-	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-		vma_end_read(vma);
-
-	if (!(fault & VM_FAULT_RETRY)) {
-		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto retry;
+	if (!(fault & VM_FAULT_RETRY))
 		goto done;
-	}
-	count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
 	/* Quick path to respond to signals */
 	if (fault_signal_pending(fault, regs)) {
@@ -1379,7 +1371,6 @@ void do_user_addr_fault(struct pt_regs *regs,
 						 ARCH_DEFAULT_PKEY);
 		return;
 	}
-lock_mmap:
 
 retry:
 	vma = lock_mm_and_find_vma(mm, address, regs);
@@ -1410,7 +1401,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * userland). The return to userland is identified whenever
 	 * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
 	 */
-	fault = handle_mm_fault(vma, address, flags, regs);
+	fault = handle_mm_fault(vma, address, vmf.flags, regs);
 
 	if (fault_signal_pending(fault, regs)) {
 		/*
@@ -1434,7 +1425,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * that we made any progress. Handle this case first.
 	 */
 	if (unlikely(fault & VM_FAULT_RETRY)) {
-		flags |= FAULT_FLAG_TRIED;
+		vmf.flags |= FAULT_FLAG_TRIED;
 		goto retry;
 	}
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 04/10] s390: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (2 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 03/10] x86: " Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-24  8:16   ` Alexander Gordeev
  2023-08-21 12:30 ` [PATCH rfc v2 05/10] powerpc: " Kefeng Wang
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Use the new try_vma_locked_page_fault() helper to simplify the code.
No functional change intended.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/s390/mm/fault.c | 66 ++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 39 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 099c4824dd8a..fbbdebde6ea7 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -357,16 +357,18 @@ static noinline void do_fault_error(struct pt_regs *regs, vm_fault_t fault)
 static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 {
 	struct gmap *gmap;
-	struct task_struct *tsk;
-	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	enum fault_type type;
-	unsigned long address;
-	unsigned int flags;
+	struct mm_struct *mm = current->mm;
+	unsigned long address = get_fault_address(regs);
 	vm_fault_t fault;
 	bool is_write;
+	struct vm_fault vmf = {
+		.real_address = address,
+		.flags = FAULT_FLAG_DEFAULT,
+		.vm_flags = access,
+	};
 
-	tsk = current;
 	/*
 	 * The instruction that caused the program check has
 	 * been nullified. Don't signal single step via SIGTRAP.
@@ -376,8 +378,6 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	if (kprobe_page_fault(regs, 14))
 		return 0;
 
-	mm = tsk->mm;
-	address = get_fault_address(regs);
 	is_write = fault_is_write(regs);
 
 	/*
@@ -398,45 +398,33 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	}
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
-	flags = FAULT_FLAG_DEFAULT;
 	if (user_mode(regs))
-		flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 	if (is_write)
-		access = VM_WRITE;
-	if (access == VM_WRITE)
-		flags |= FAULT_FLAG_WRITE;
-	if (!(flags & FAULT_FLAG_USER))
-		goto lock_mmap;
-	vma = lock_vma_under_rcu(mm, address);
-	if (!vma)
-		goto lock_mmap;
-	if (!(vma->vm_flags & access)) {
-		vma_end_read(vma);
-		goto lock_mmap;
-	}
-	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-		vma_end_read(vma);
-	if (!(fault & VM_FAULT_RETRY)) {
-		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
-		if (likely(!(fault & VM_FAULT_ERROR)))
-			fault = 0;
+		vmf.vm_flags = VM_WRITE;
+	if (vmf.vm_flags == VM_WRITE)
+		vmf.flags |= FAULT_FLAG_WRITE;
+
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto lock_mm;
+	if (!(fault & VM_FAULT_RETRY))
 		goto out;
-	}
-	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
 	/* Quick path to respond to signals */
 	if (fault_signal_pending(fault, regs)) {
 		fault = VM_FAULT_SIGNAL;
 		goto out;
 	}
-lock_mmap:
+
+lock_mm:
 	mmap_read_lock(mm);
 
 	gmap = NULL;
 	if (IS_ENABLED(CONFIG_PGSTE) && type == GMAP_FAULT) {
 		gmap = (struct gmap *) S390_lowcore.gmap;
 		current->thread.gmap_addr = address;
-		current->thread.gmap_write_flag = !!(flags & FAULT_FLAG_WRITE);
+		current->thread.gmap_write_flag = !!(vmf.flags & FAULT_FLAG_WRITE);
 		current->thread.gmap_int_code = regs->int_code & 0xffff;
 		address = __gmap_translate(gmap, address);
 		if (address == -EFAULT) {
@@ -444,7 +432,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 			goto out_up;
 		}
 		if (gmap->pfault_enabled)
-			flags |= FAULT_FLAG_RETRY_NOWAIT;
+			vmf.flags |= FAULT_FLAG_RETRY_NOWAIT;
 	}
 
 retry:
@@ -466,7 +454,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	 * we can handle it..
 	 */
 	fault = VM_FAULT_BADACCESS;
-	if (unlikely(!(vma->vm_flags & access)))
+	if (unlikely(!(vma->vm_flags & vmf.vm_flags)))
 		goto out_up;
 
 	/*
@@ -474,10 +462,10 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	 * make sure we exit gracefully rather than endlessly redo
 	 * the fault.
 	 */
-	fault = handle_mm_fault(vma, address, flags, regs);
+	fault = handle_mm_fault(vma, address, vmf.flags, regs);
 	if (fault_signal_pending(fault, regs)) {
 		fault = VM_FAULT_SIGNAL;
-		if (flags & FAULT_FLAG_RETRY_NOWAIT)
+		if (vmf.flags & FAULT_FLAG_RETRY_NOWAIT)
 			goto out_up;
 		goto out;
 	}
@@ -497,7 +485,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 
 	if (fault & VM_FAULT_RETRY) {
 		if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
-			(flags & FAULT_FLAG_RETRY_NOWAIT)) {
+			(vmf.flags & FAULT_FLAG_RETRY_NOWAIT)) {
 			/*
 			 * FAULT_FLAG_RETRY_NOWAIT has been set, mmap_lock has
 			 * not been released
@@ -506,8 +494,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 			fault = VM_FAULT_PFAULT;
 			goto out_up;
 		}
-		flags &= ~FAULT_FLAG_RETRY_NOWAIT;
-		flags |= FAULT_FLAG_TRIED;
+		vmf.flags &= ~FAULT_FLAG_RETRY_NOWAIT;
+		vmf.flags |= FAULT_FLAG_TRIED;
 		mmap_read_lock(mm);
 		goto retry;
 	}
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 05/10] powerpc: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (3 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 04/10] s390: " Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-22  9:38   ` Christophe Leroy
  2023-08-21 12:30 ` [PATCH rfc v2 06/10] riscv: " Kefeng Wang
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Use the new try_vma_locked_page_fault() helper to simplify the code.
No functional change intended.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/powerpc/mm/fault.c | 66 ++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b1723094d464..52f9546e020e 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -391,6 +391,22 @@ static int page_fault_is_bad(unsigned long err)
 #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	int is_exec = TRAP(vmf->regs) == INTERRUPT_INST_STORAGE;
+	int is_write = page_fault_is_write(vmf->fault_code);
+
+	if (unlikely(access_pkey_error(is_write, is_exec,
+				(vmf->fault_code & DSISR_KEYFAULT), vma)))
+		return true;
+
+	if (unlikely(access_error(is_write, is_exec, vma)))
+		return true;
+	return false;
+}
+#endif
+
 /*
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault.
@@ -407,12 +423,18 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 {
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
-	unsigned int flags = FAULT_FLAG_DEFAULT;
 	int is_exec = TRAP(regs) == INTERRUPT_INST_STORAGE;
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
 	vm_fault_t fault, major = 0;
 	bool kprobe_fault = kprobe_page_fault(regs, 11);
+	struct vm_fault vmf = {
+		.real_address = address,
+		.fault_code = error_code,
+		.regs = regs,
+		.flags = FAULT_FLAG_DEFAULT,
+	};
+
 
 	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
 		return 0;
@@ -463,45 +485,21 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * mmap_lock held
 	 */
 	if (is_user)
-		flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 	if (is_write)
-		flags |= FAULT_FLAG_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
 	if (is_exec)
-		flags |= FAULT_FLAG_INSTRUCTION;
+		vmf.flags |= FAULT_FLAG_INSTRUCTION;
 
-	if (!(flags & FAULT_FLAG_USER))
-		goto lock_mmap;
-
-	vma = lock_vma_under_rcu(mm, address);
-	if (!vma)
-		goto lock_mmap;
-
-	if (unlikely(access_pkey_error(is_write, is_exec,
-				       (error_code & DSISR_KEYFAULT), vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
-	}
-
-	if (unlikely(access_error(is_write, is_exec, vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
-	}
-
-	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-		vma_end_read(vma);
-
-	if (!(fault & VM_FAULT_RETRY)) {
-		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto retry;
+	if (!(fault & VM_FAULT_RETRY))
 		goto done;
-	}
-	count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
 	if (fault_signal_pending(fault, regs))
 		return user_mode(regs) ? 0 : SIGBUS;
 
-lock_mmap:
-
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -528,7 +526,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * make sure we exit gracefully rather than endlessly redo
 	 * the fault.
 	 */
-	fault = handle_mm_fault(vma, address, flags, regs);
+	fault = handle_mm_fault(vma, address, vmf.flags, regs);
 
 	major |= fault & VM_FAULT_MAJOR;
 
@@ -544,7 +542,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * case.
 	 */
 	if (unlikely(fault & VM_FAULT_RETRY)) {
-		flags |= FAULT_FLAG_TRIED;
+		vmf.flags |= FAULT_FLAG_TRIED;
 		goto retry;
 	}
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 06/10] riscv: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (4 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 05/10] powerpc: " Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 07/10] ARM: mm: try VMA lock-based page fault handling first Kefeng Wang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Use the new try_vma_locked_page_fault() helper to simplify the code.
No functional change intended.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/riscv/mm/fault.c | 58 ++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 6115d7514972..b46129b636f2 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -215,6 +215,13 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
 	return false;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return access_error(vmf->fault_code, vma);
+}
+#endif
+
 /*
  * This routine handles page faults.  It determines the address and the
  * problem, and then passes it off to one of the appropriate routines.
@@ -223,17 +230,16 @@ void handle_page_fault(struct pt_regs *regs)
 {
 	struct task_struct *tsk;
 	struct vm_area_struct *vma;
-	struct mm_struct *mm;
-	unsigned long addr, cause;
-	unsigned int flags = FAULT_FLAG_DEFAULT;
+	struct mm_struct *mm = current->mm;
+	unsigned long addr = regs->badaddr;
+	unsigned long cause = regs->cause;
 	int code = SEGV_MAPERR;
 	vm_fault_t fault;
-
-	cause = regs->cause;
-	addr = regs->badaddr;
-
-	tsk = current;
-	mm = tsk->mm;
+	struct vm_fault vmf = {
+		.real_address = addr,
+		.fault_code = cause,
+		.flags = FAULT_FLAG_DEFAULT,
+	};
 
 	if (kprobe_page_fault(regs, cause))
 		return;
@@ -268,7 +274,7 @@ void handle_page_fault(struct pt_regs *regs)
 	}
 
 	if (user_mode(regs))
-		flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 
 	if (!user_mode(regs) && addr < TASK_SIZE && unlikely(!(regs->status & SR_SUM))) {
 		if (fixup_exception(regs))
@@ -280,37 +286,21 @@ void handle_page_fault(struct pt_regs *regs)
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
 	if (cause == EXC_STORE_PAGE_FAULT)
-		flags |= FAULT_FLAG_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
 	else if (cause == EXC_INST_PAGE_FAULT)
-		flags |= FAULT_FLAG_INSTRUCTION;
-	if (!(flags & FAULT_FLAG_USER))
-		goto lock_mmap;
-
-	vma = lock_vma_under_rcu(mm, addr);
-	if (!vma)
-		goto lock_mmap;
+		vmf.flags |= FAULT_FLAG_INSTRUCTION;
 
-	if (unlikely(access_error(cause, vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
-	}
-
-	fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
-	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-		vma_end_read(vma);
-
-	if (!(fault & VM_FAULT_RETRY)) {
-		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto retry;
+	if (!(fault & VM_FAULT_RETRY))
 		goto done;
-	}
-	count_vm_vma_lock_event(VMA_LOCK_RETRY);
 
 	if (fault_signal_pending(fault, regs)) {
 		if (!user_mode(regs))
 			no_context(regs, addr);
 		return;
 	}
-lock_mmap:
 
 retry:
 	vma = lock_mm_and_find_vma(mm, addr, regs);
@@ -337,7 +327,7 @@ void handle_page_fault(struct pt_regs *regs)
 	 * make sure we exit gracefully rather than endlessly redo
 	 * the fault.
 	 */
-	fault = handle_mm_fault(vma, addr, flags, regs);
+	fault = handle_mm_fault(vma, addr, vmf.flags, regs);
 
 	/*
 	 * If we need to retry but a fatal signal is pending, handle the
@@ -355,7 +345,7 @@ void handle_page_fault(struct pt_regs *regs)
 		return;
 
 	if (unlikely(fault & VM_FAULT_RETRY)) {
-		flags |= FAULT_FLAG_TRIED;
+		vmf.flags |= FAULT_FLAG_TRIED;
 
 		/*
 		 * No need to mmap_read_unlock(mm) as we would
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 07/10] ARM: mm: try VMA lock-based page fault handling first
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (5 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 06/10] riscv: " Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 08/10] loongarch: mm: cleanup __do_page_fault() Kefeng Wang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Attempt VMA lock-based page fault handling first, and fall back
to the existing mmap_lock-based handling if that fails.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm/Kconfig    |  1 +
 arch/arm/mm/fault.c | 35 +++++++++++++++++++++++++----------
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 1a6a6eb48a15..8b6d4507ccee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -34,6 +34,7 @@ config ARM
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_MEMTEST
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index fef62e4a9edd..d53bb028899a 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -242,8 +242,11 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	struct vm_area_struct *vma;
 	int sig, code;
 	vm_fault_t fault;
-	unsigned int flags = FAULT_FLAG_DEFAULT;
-	unsigned long vm_flags = VM_ACCESS_FLAGS;
+	struct vm_fault vmf = {
+		.real_address = addr,
+		.flags = FAULT_FLAG_DEFAULT,
+		.vm_flags = VM_ACCESS_FLAGS,
+	};
 
 	if (kprobe_page_fault(regs, fsr))
 		return 0;
@@ -261,15 +264,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		goto no_context;
 
 	if (user_mode(regs))
-		flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 
 	if (is_write_fault(fsr)) {
-		flags |= FAULT_FLAG_WRITE;
-		vm_flags = VM_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
+		vmf.vm_flags = VM_WRITE;
 	}
 
 	if (fsr & FSR_LNX_PF) {
-		vm_flags = VM_EXEC;
+		vmf.vm_flags = VM_EXEC;
 
 		if (is_permission_fault(fsr) && !user_mode(regs))
 			die_kernel_fault("execution of memory",
@@ -278,6 +281,18 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto retry;
+	if (!(fault & VM_FAULT_RETRY))
+		goto done;
+
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
+		return 0;
+	}
+
 retry:
 	vma = lock_mm_and_find_vma(mm, addr, regs);
 	if (unlikely(!vma)) {
@@ -289,10 +304,10 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	 * ok, we have a good vm_area for this memory access, check the
 	 * permissions on the VMA allow for the fault which occurred.
 	 */
-	if (!(vma->vm_flags & vm_flags))
+	if (!(vma->vm_flags & vmf.vm_flags))
 		fault = VM_FAULT_BADACCESS;
 	else
-		fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, regs);
+		fault = handle_mm_fault(vma, addr & PAGE_MASK, vmf.flags, regs);
 
 	/* If we need to retry but a fatal signal is pending, handle the
 	 * signal first. We do not need to release the mmap_lock because
@@ -310,13 +325,13 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 
 	if (!(fault & VM_FAULT_ERROR)) {
 		if (fault & VM_FAULT_RETRY) {
-			flags |= FAULT_FLAG_TRIED;
+			vmf.flags |= FAULT_FLAG_TRIED;
 			goto retry;
 		}
 	}
 
 	mmap_read_unlock(mm);
-
+done:
 	/*
 	 * Handle the "normal" case first - VM_FAULT_MAJOR
 	 */
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 08/10] loongarch: mm: cleanup __do_page_fault()
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (6 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 07/10] ARM: mm: try VMA lock-based page fault handling first Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 09/10] loongarch: mm: add access_error() helper Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 10/10] loongarch: mm: try VMA lock-based page fault handling first Kefeng Wang
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Clean up __do_page_fault() by reusing the bad_area_nosemaphore and
bad_area labels.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/loongarch/mm/fault.c | 48 +++++++++++++--------------------------
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index e6376e3dce86..5d4c742c4bc5 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -157,18 +157,15 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 		if (!user_mode(regs))
 			no_context(regs, write, address);
 		else
-			do_sigsegv(regs, write, address, si_code);
-		return;
+			goto bad_area_nosemaphore;
 	}
 
 	/*
 	 * If we're in an interrupt or have no user
 	 * context, we must not take the fault..
 	 */
-	if (faulthandler_disabled() || !mm) {
-		do_sigsegv(regs, write, address, si_code);
-		return;
-	}
+	if (faulthandler_disabled() || !mm)
+		goto bad_area_nosemaphore;
 
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
@@ -178,23 +175,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 	vma = lock_mm_and_find_vma(mm, address, regs);
 	if (unlikely(!vma))
 		goto bad_area_nosemaphore;
-	goto good_area;
-
-/*
- * Something tried to access memory that isn't in our memory map..
- * Fix it, but check if it's kernel or user first..
- */
-bad_area:
-	mmap_read_unlock(mm);
-bad_area_nosemaphore:
-	do_sigsegv(regs, write, address, si_code);
-	return;
 
-/*
- * Ok, we have a good vm_area for this memory access, so
- * we can handle it..
- */
-good_area:
 	si_code = SEGV_ACCERR;
 
 	if (write) {
@@ -235,22 +216,25 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 		 */
 		goto retry;
 	}
+
+	mmap_read_unlock(mm);
+
 	if (unlikely(fault & VM_FAULT_ERROR)) {
-		mmap_read_unlock(mm);
-		if (fault & VM_FAULT_OOM) {
+		if (fault & VM_FAULT_OOM)
 			do_out_of_memory(regs, write, address);
-			return;
-		} else if (fault & VM_FAULT_SIGSEGV) {
-			do_sigsegv(regs, write, address, si_code);
-			return;
-		} else if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
+		else if (fault & VM_FAULT_SIGSEGV)
+			goto bad_area_nosemaphore;
+		else if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
 			do_sigbus(regs, write, address, si_code);
-			return;
-		}
-		BUG();
+		else
+			BUG();
 	}
 
+	return;
+bad_area:
 	mmap_read_unlock(mm);
+bad_area_nosemaphore:
+	do_sigsegv(regs, write, address, si_code);
 }
 
 asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 09/10] loongarch: mm: add access_error() helper
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (7 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 08/10] loongarch: mm: cleanup __do_page_fault() Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  2023-08-21 12:30 ` [PATCH rfc v2 10/10] loongarch: mm: try VMA lock-based page fault handling first Kefeng Wang
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Add an access_error() helper to check whether a vma is accessible or not;
it will be used in __do_page_fault() and later in the VMA lock-based page
fault handling.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/loongarch/mm/fault.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index 5d4c742c4bc5..2a45e9f3a485 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -126,6 +126,22 @@ static void __kprobes do_sigsegv(struct pt_regs *regs,
 	force_sig_fault(SIGSEGV, si_code, (void __user *)address);
 }
 
+static inline bool access_error(unsigned int flags, struct pt_regs *regs,
+				unsigned long addr, struct vm_area_struct *vma)
+{
+	if (flags & FAULT_FLAG_WRITE) {
+		if (!(vma->vm_flags & VM_WRITE))
+			return true;
+	} else {
+		if (!(vma->vm_flags & VM_READ) && addr != exception_era(regs))
+			return true;
+		if (!(vma->vm_flags & VM_EXEC) && addr == exception_era(regs))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -169,6 +185,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
+	if (write)
+		flags |= FAULT_FLAG_WRITE;
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 retry:
@@ -178,16 +196,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
 	si_code = SEGV_ACCERR;
 
-	if (write) {
-		flags |= FAULT_FLAG_WRITE;
-		if (!(vma->vm_flags & VM_WRITE))
-			goto bad_area;
-	} else {
-		if (!(vma->vm_flags & VM_READ) && address != exception_era(regs))
-			goto bad_area;
-		if (!(vma->vm_flags & VM_EXEC) && address == exception_era(regs))
-			goto bad_area;
-	}
+	if (access_error(flags, regs, vma))
+		goto bad_area;
 
 	/*
 	 * If for any reason at all we couldn't handle the fault,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rfc v2 10/10] loongarch: mm: try VMA lock-based page fault handling first
  2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
                   ` (8 preceding siblings ...)
  2023-08-21 12:30 ` [PATCH rfc v2 09/10] loongarch: mm: add access_error() helper Kefeng Wang
@ 2023-08-21 12:30 ` Kefeng Wang
  9 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-21 12:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: surenb, willy, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
	Kefeng Wang

Attempt VMA lock-based page fault handling first, and fall back
to the existing mmap_lock-based handling if that fails.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/loongarch/Kconfig    |  1 +
 arch/loongarch/mm/fault.c | 37 +++++++++++++++++++++++++++++++------
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 2b27b18a63af..6b821f621920 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -56,6 +56,7 @@ config LOONGARCH
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_NUMA_BALANCING
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS
diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index 2a45e9f3a485..f7ac3a14bb06 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -142,6 +142,13 @@ static inline bool access_error(unsigned int flags, struct pt_regs *regs,
 	return false;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return access_error(vmf->flags, vmf->regs, vmf->real_address, vma);
+}
+#endif
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -151,11 +158,15 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 			unsigned long write, unsigned long address)
 {
 	int si_code = SEGV_MAPERR;
-	unsigned int flags = FAULT_FLAG_DEFAULT;
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 	struct vm_area_struct *vma = NULL;
 	vm_fault_t fault;
+	struct vm_fault vmf = {
+		.real_address = address,
+		.regs = regs,
+		.flags = FAULT_FLAG_DEFAULT,
+	};
 
 	if (kprobe_page_fault(regs, current->thread.trap_nr))
 		return;
@@ -184,11 +195,24 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 		goto bad_area_nosemaphore;
 
 	if (user_mode(regs))
-		flags |= FAULT_FLAG_USER;
+		vmf.flags |= FAULT_FLAG_USER;
 	if (write)
-		flags |= FAULT_FLAG_WRITE;
+		vmf.flags |= FAULT_FLAG_WRITE;
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
+	fault = try_vma_locked_page_fault(&vmf);
+	if (fault == VM_FAULT_NONE)
+		goto retry;
+	if (!(fault & VM_FAULT_RETRY))
+		goto done;
+
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			no_context(regs, write, address);
+		return;
+	}
+
 retry:
 	vma = lock_mm_and_find_vma(mm, address, regs);
 	if (unlikely(!vma))
@@ -196,7 +220,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
 	si_code = SEGV_ACCERR;
 
-	if (access_error(flags, regs, vma))
+	if (access_error(vmf.flags, regs, address, vma))
 		goto bad_area;
 
 	/*
@@ -204,7 +228,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 	 * make sure we exit gracefully rather than endlessly redo
 	 * the fault.
 	 */
-	fault = handle_mm_fault(vma, address, flags, regs);
+	fault = handle_mm_fault(vma, address, vmf.flags, regs);
 
 	if (fault_signal_pending(fault, regs)) {
 		if (!user_mode(regs))
@@ -217,7 +241,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 		return;
 
 	if (unlikely(fault & VM_FAULT_RETRY)) {
-		flags |= FAULT_FLAG_TRIED;
+		vmf.flags |= FAULT_FLAG_TRIED;
 
 		/*
 		 * No need to mmap_read_unlock(mm) as we would
@@ -229,6 +253,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs,
 
 	mmap_read_unlock(mm);
 
+done:
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		if (fault & VM_FAULT_OOM)
 			do_out_of_memory(regs, write, address);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
  2023-08-21 12:30 ` [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler Kefeng Wang
@ 2023-08-21 15:13   ` kernel test robot
  2023-08-22  2:33     ` Kefeng Wang
  2023-08-24  7:12   ` Alexander Gordeev
  1 sibling, 1 reply; 19+ messages in thread
From: kernel test robot @ 2023-08-21 15:13 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: oe-kbuild-all, Linux Memory Management List, surenb, willy,
	Russell King, Catalin Marinas, Will Deacon, Huacai Chen,
	WANG Xuerui, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel

Hi Kefeng,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20230821]
[cannot apply to akpm-mm/mm-everything arm64/for-next/core tip/x86/mm s390/features powerpc/next powerpc/fixes arm/for-next arm/fixes linus/master v6.5-rc7 v6.5-rc6 v6.5-rc5 v6.5-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-add-a-generic-VMA-lock-based-page-fault-handler/20230821-203442
base:   next-20230821
patch link:    https://lore.kernel.org/r/20230821123056.2109942-2-wangkefeng.wang%40huawei.com
patch subject: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
config: i386-tinyconfig (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308212249.dZG3d55u-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/memcontrol.h:20,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/x86/kernel/asm-offsets.c:14:
>> include/linux/mm.h:810:38: error: redefinition of 'lock_vma_under_rcu'
     810 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
         |                                      ^~~~~~~~~~~~~~~~~~
   include/linux/mm.h:794:38: note: previous definition of 'lock_vma_under_rcu' with type 'struct vm_area_struct *(struct mm_struct *, long unsigned int)'
     794 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
         |                                      ^~~~~~~~~~~~~~~~~~
   make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] Error 1
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1203: prepare0] Error 2
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:234: __sub-make] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:234: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +/lock_vma_under_rcu +810 include/linux/mm.h

   809	
 > 810	static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
   811			unsigned long address)
   812	{
   813		return NULL;
   814	}
   815	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
  2023-08-21 15:13   ` kernel test robot
@ 2023-08-22  2:33     ` Kefeng Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-22  2:33 UTC (permalink / raw)
  To: kernel test robot, Andrew Morton
  Cc: oe-kbuild-all, Linux Memory Management List, surenb, willy,
	Russell King, Catalin Marinas, Will Deacon, Huacai Chen,
	WANG Xuerui, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel

Hi

On 2023/8/21 23:13, kernel test robot wrote:
> Hi Kefeng,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on next-20230821]
> [cannot apply to akpm-mm/mm-everything arm64/for-next/core tip/x86/mm s390/features powerpc/next powerpc/fixes arm/for-next arm/fixes linus/master v6.5-rc7 v6.5-rc6 v6.5-rc5 v6.5-rc7]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-add-a-generic-VMA-lock-based-page-fault-handler/20230821-203442
> base:   next-20230821
> patch link:    https://lore.kernel.org/r/20230821123056.2109942-2-wangkefeng.wang%40huawei.com
> patch subject: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
> config: i386-tinyconfig (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202308212249.dZG3d55u-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>     In file included from include/linux/memcontrol.h:20,
>                      from include/linux/swap.h:9,
>                      from include/linux/suspend.h:5,
>                      from arch/x86/kernel/asm-offsets.c:14:
>>> include/linux/mm.h:810:38: error: redefinition of 'lock_vma_under_rcu'
>       810 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>           |                                      ^~~~~~~~~~~~~~~~~~
>     include/linux/mm.h:794:38: note: previous definition of 'lock_vma_under_rcu' with type 'struct vm_area_struct *(struct mm_struct *, long unsigned int)'
>       794 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>           |                                      ^~~~~~~~~~~~~~~~~~
>     make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] Error 1
>     make[3]: Target 'prepare' not remade because of errors.
>     make[2]: *** [Makefile:1203: prepare0] Error 2
>     make[2]: Target 'prepare' not remade because of errors.
>     make[1]: *** [Makefile:234: __sub-make] Error 2
>     make[1]: Target 'prepare' not remade because of errors.
>     make: *** [Makefile:234: __sub-make] Error 2
>     make: Target 'prepare' not remade because of errors.
> 
> 

Yes, the following change should be dropped, as it was mis-copied...

> vim +/lock_vma_under_rcu +810 include/linux/mm.h
> 
>     809	
>   > 810	static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>     811			unsigned long address)
>     812	{
>     813		return NULL;
>     814	}
>     815	
> 
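
For reference, the corrected !CONFIG_PER_VMA_LOCK hunk would presumably keep
only the new try_vma_locked_page_fault() stub and leave the existing
lock_vma_under_rcu() stub at line 794 untouched, i.e. (a sketch, not the
actual respin):

static inline vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
{
	return VM_FAULT_NONE;
}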


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 05/10] powerpc: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 ` [PATCH rfc v2 05/10] powerpc: " Kefeng Wang
@ 2023-08-22  9:38   ` Christophe Leroy
  2023-08-22 12:12     ` Kefeng Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2023-08-22  9:38 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, linux-mm@kvack.org
  Cc: surenb@google.com, willy@infradead.org, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
	Michael Ellerman, Nicholas Piggin, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86@kernel.org, H . Peter Anvin,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org



On 21/08/2023 at 14:30, Kefeng Wang wrote:
> Use new try_vma_locked_page_fault() helper to simplify code.
> No functional change intended.

Does it really simplify the code? It's 32 insertions versus 34 deletions, 
so only 2 lines are removed.

I don't like the struct vm_fault you are adding, because when these were 
four independent variables they were handled in local registers. Now that 
it is a struct it has to go via the stack, leading to unnecessary memory 
reads and writes. And going back and forth between architecture code and 
generic code may also hurt performance.

Did you do any performance analysis? Page faults are really a hot 
path when dealing with minor faults.

Thanks
Christophe

> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   arch/powerpc/mm/fault.c | 66 ++++++++++++++++++++---------------------
>   1 file changed, 32 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index b1723094d464..52f9546e020e 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -391,6 +391,22 @@ static int page_fault_is_bad(unsigned long err)
>   #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
>   #endif
>   
> +#ifdef CONFIG_PER_VMA_LOCK
> +bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +	int is_exec = TRAP(vmf->regs) == INTERRUPT_INST_STORAGE;
> +	int is_write = page_fault_is_write(vmf->fault_code);
> +
> +	if (unlikely(access_pkey_error(is_write, is_exec,
> +				(vmf->fault_code & DSISR_KEYFAULT), vma)))
> +		return true;
> +
> +	if (unlikely(access_error(is_write, is_exec, vma)))
> +		return true;
> +	return false;
> +}
> +#endif
> +
>   /*
>    * For 600- and 800-family processors, the error_code parameter is DSISR
>    * for a data fault, SRR1 for an instruction fault.
> @@ -407,12 +423,18 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>   {
>   	struct vm_area_struct * vma;
>   	struct mm_struct *mm = current->mm;
> -	unsigned int flags = FAULT_FLAG_DEFAULT;
>   	int is_exec = TRAP(regs) == INTERRUPT_INST_STORAGE;
>   	int is_user = user_mode(regs);
>   	int is_write = page_fault_is_write(error_code);
>   	vm_fault_t fault, major = 0;
>   	bool kprobe_fault = kprobe_page_fault(regs, 11);
> +	struct vm_fault vmf = {
> +		.real_address = address,
> +		.fault_code = error_code,
> +		.regs = regs,
> +		.flags = FAULT_FLAG_DEFAULT,
> +	};
> +
>   
>   	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
>   		return 0;
> @@ -463,45 +485,21 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>   	 * mmap_lock held
>   	 */
>   	if (is_user)
> -		flags |= FAULT_FLAG_USER;
> +		vmf.flags |= FAULT_FLAG_USER;
>   	if (is_write)
> -		flags |= FAULT_FLAG_WRITE;
> +		vmf.flags |= FAULT_FLAG_WRITE;
>   	if (is_exec)
> -		flags |= FAULT_FLAG_INSTRUCTION;
> +		vmf.flags |= FAULT_FLAG_INSTRUCTION;
>   
> -	if (!(flags & FAULT_FLAG_USER))
> -		goto lock_mmap;
> -
> -	vma = lock_vma_under_rcu(mm, address);
> -	if (!vma)
> -		goto lock_mmap;
> -
> -	if (unlikely(access_pkey_error(is_write, is_exec,
> -				       (error_code & DSISR_KEYFAULT), vma))) {
> -		vma_end_read(vma);
> -		goto lock_mmap;
> -	}
> -
> -	if (unlikely(access_error(is_write, is_exec, vma))) {
> -		vma_end_read(vma);
> -		goto lock_mmap;
> -	}
> -
> -	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> -	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> -		vma_end_read(vma);
> -
> -	if (!(fault & VM_FAULT_RETRY)) {
> -		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +	fault = try_vma_locked_page_fault(&vmf);
> +	if (fault == VM_FAULT_NONE)
> +		goto retry;
> +	if (!(fault & VM_FAULT_RETRY))
>   		goto done;
> -	}
> -	count_vm_vma_lock_event(VMA_LOCK_RETRY);
>   
>   	if (fault_signal_pending(fault, regs))
>   		return user_mode(regs) ? 0 : SIGBUS;
>   
> -lock_mmap:
> -
>   	/* When running in the kernel we expect faults to occur only to
>   	 * addresses in user space.  All other faults represent errors in the
>   	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
> @@ -528,7 +526,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>   	 * make sure we exit gracefully rather than endlessly redo
>   	 * the fault.
>   	 */
> -	fault = handle_mm_fault(vma, address, flags, regs);
> +	fault = handle_mm_fault(vma, address, vmf.flags, regs);
>   
>   	major |= fault & VM_FAULT_MAJOR;
>   
> @@ -544,7 +542,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>   	 * case.
>   	 */
>   	if (unlikely(fault & VM_FAULT_RETRY)) {
> -		flags |= FAULT_FLAG_TRIED;
> +		vmf.flags |= FAULT_FLAG_TRIED;
>   		goto retry;
>   	}
>   

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 05/10] powerpc: mm: use try_vma_locked_page_fault()
  2023-08-22  9:38   ` Christophe Leroy
@ 2023-08-22 12:12     ` Kefeng Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-22 12:12 UTC (permalink / raw)
  To: Christophe Leroy, Andrew Morton, linux-mm@kvack.org
  Cc: surenb@google.com, willy@infradead.org, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
	Michael Ellerman, Nicholas Piggin, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86@kernel.org, H . Peter Anvin,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org



On 2023/8/22 17:38, Christophe Leroy wrote:
> 
> 
> On 21/08/2023 at 14:30, Kefeng Wang wrote:
>> Use new try_vma_locked_page_fault() helper to simplify code.
>> No functional change intended.
> 
> Does it really simplify the code? It's 32 insertions versus 34 deletions,
> so only 2 lines are removed.

Yes, it is unfriendly for powerpc, as the arch's vma access check is more
complex than on other architectures.
> 
> I don't like the struct vm_fault you are adding, because when these were
> four independent variables they were handled in local registers. Now that
> it is a struct it has to go via the stack, leading to unnecessary memory
> reads and writes. And going back and forth between architecture code and
> generic code may also hurt performance.

Because different architectures use different variables to check vma access,
the easiest way is to add them into vmf; I don't find a better way.
> 
> Did you do any performance analysis? Page faults are really a hot
> path when dealing with minor faults.

No, this is only build-tested; it is an RFC to get feedback on the conversion.

Thanks.

> 
> Thanks
> Christophe
> 
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>    arch/powerpc/mm/fault.c | 66 ++++++++++++++++++++---------------------
>>    1 file changed, 32 insertions(+), 34 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index b1723094d464..52f9546e020e 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -391,6 +391,22 @@ static int page_fault_is_bad(unsigned long err)
>>    #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
>>    #endif
>>    
>> +#ifdef CONFIG_PER_VMA_LOCK
>> +bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
>> +{
>> +	int is_exec = TRAP(vmf->regs) == INTERRUPT_INST_STORAGE;
>> +	int is_write = page_fault_is_write(vmf->fault_code);
>> +
>> +	if (unlikely(access_pkey_error(is_write, is_exec,
>> +				(vmf->fault_code & DSISR_KEYFAULT), vma)))
>> +		return true;
>> +
>> +	if (unlikely(access_error(is_write, is_exec, vma)))
>> +		return true;
>> +	return false;
>> +}
>> +#endif
>> +
>>    /*
>>     * For 600- and 800-family processors, the error_code parameter is DSISR
>>     * for a data fault, SRR1 for an instruction fault.
>> @@ -407,12 +423,18 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>>    {
>>    	struct vm_area_struct * vma;
>>    	struct mm_struct *mm = current->mm;
>> -	unsigned int flags = FAULT_FLAG_DEFAULT;
>>    	int is_exec = TRAP(regs) == INTERRUPT_INST_STORAGE;
>>    	int is_user = user_mode(regs);
>>    	int is_write = page_fault_is_write(error_code);
>>    	vm_fault_t fault, major = 0;
>>    	bool kprobe_fault = kprobe_page_fault(regs, 11);
>> +	struct vm_fault vmf = {
>> +		.real_address = address,
>> +		.fault_code = error_code,
>> +		.regs = regs,
>> +		.flags = FAULT_FLAG_DEFAULT,
>> +	};
>> +
>>    
>>    	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
>>    		return 0;
>> @@ -463,45 +485,21 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>>    	 * mmap_lock held
>>    	 */
>>    	if (is_user)
>> -		flags |= FAULT_FLAG_USER;
>> +		vmf.flags |= FAULT_FLAG_USER;
>>    	if (is_write)
>> -		flags |= FAULT_FLAG_WRITE;
>> +		vmf.flags |= FAULT_FLAG_WRITE;
>>    	if (is_exec)
>> -		flags |= FAULT_FLAG_INSTRUCTION;
>> +		vmf.flags |= FAULT_FLAG_INSTRUCTION;
>>    
>> -	if (!(flags & FAULT_FLAG_USER))
>> -		goto lock_mmap;
>> -
>> -	vma = lock_vma_under_rcu(mm, address);
>> -	if (!vma)
>> -		goto lock_mmap;
>> -
>> -	if (unlikely(access_pkey_error(is_write, is_exec,
>> -				       (error_code & DSISR_KEYFAULT), vma))) {
>> -		vma_end_read(vma);
>> -		goto lock_mmap;
>> -	}
>> -
>> -	if (unlikely(access_error(is_write, is_exec, vma))) {
>> -		vma_end_read(vma);
>> -		goto lock_mmap;
>> -	}
>> -
>> -	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
>> -	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
>> -		vma_end_read(vma);
>> -
>> -	if (!(fault & VM_FAULT_RETRY)) {
>> -		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>> +	fault = try_vma_locked_page_fault(&vmf);
>> +	if (fault == VM_FAULT_NONE)
>> +		goto retry;
>> +	if (!(fault & VM_FAULT_RETRY))
>>    		goto done;
>> -	}
>> -	count_vm_vma_lock_event(VMA_LOCK_RETRY);
>>    
>>    	if (fault_signal_pending(fault, regs))
>>    		return user_mode(regs) ? 0 : SIGBUS;
>>    
>> -lock_mmap:
>> -
>>    	/* When running in the kernel we expect faults to occur only to
>>    	 * addresses in user space.  All other faults represent errors in the
>>    	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
>> @@ -528,7 +526,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>>    	 * make sure we exit gracefully rather than endlessly redo
>>    	 * the fault.
>>    	 */
>> -	fault = handle_mm_fault(vma, address, flags, regs);
>> +	fault = handle_mm_fault(vma, address, vmf.flags, regs);
>>    
>>    	major |= fault & VM_FAULT_MAJOR;
>>    
>> @@ -544,7 +542,7 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>>    	 * case.
>>    	 */
>>    	if (unlikely(fault & VM_FAULT_RETRY)) {
>> -		flags |= FAULT_FLAG_TRIED;
>> +		vmf.flags |= FAULT_FLAG_TRIED;
>>    		goto retry;
>>    	}
>>    


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
  2023-08-21 12:30 ` [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler Kefeng Wang
  2023-08-21 15:13   ` kernel test robot
@ 2023-08-24  7:12   ` Alexander Gordeev
  2023-08-26  0:56     ` Kefeng Wang
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander Gordeev @ 2023-08-24  7:12 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, linux-mm, surenb, willy, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-arm-kernel, linux-kernel, loongarch,
	linuxppc-dev, linux-riscv, linux-s390

On Mon, Aug 21, 2023 at 08:30:47PM +0800, Kefeng Wang wrote:

Hi Kefeng,

> ARCH_SUPPORTS_PER_VMA_LOCK is enabled by more and more architectures,
> e.g. x86, arm64, powerpc, s390 and riscv. Those implementations are very
> similar, which results in some duplicated code, so let's add a generic VMA
> lock-based page fault handler, try_vma_locked_page_fault(), to eliminate
> the duplication; this also makes it easy to support the feature on new
> architectures.
> 
> Since different architectures use different ways to check whether a vma is
> accessible or not, the struct pt_regs, the page fault error code and the vma
> flags are added into struct vm_fault, so that the architecture's page fault
> code can re-use struct vm_fault to record them and check vma accessibility
> in its own implementation.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  include/linux/mm.h       | 17 +++++++++++++++++
>  include/linux/mm_types.h |  2 ++
>  mm/memory.c              | 39 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 58 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3f764e84e567..22a6f4c56ff3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -512,9 +512,12 @@ struct vm_fault {
>  		pgoff_t pgoff;			/* Logical page offset based on vma */
>  		unsigned long address;		/* Faulting virtual address - masked */
>  		unsigned long real_address;	/* Faulting virtual address - unmasked */
> +		unsigned long fault_code;	/* Faulting error code during page fault */
> +		struct pt_regs *regs;		/* The registers stored during page fault */
>  	};
>  	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
>  					 * XXX: should really be 'const' */
> +	vm_flags_t vm_flags;		/* VMA flags to be used for access checking */
>  	pmd_t *pmd;			/* Pointer to pmd entry matching
>  					 * the 'address' */
>  	pud_t *pud;			/* Pointer to pud entry matching
> @@ -774,6 +777,9 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
>  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  					  unsigned long address);
>  
> +bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf);
> +vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf);
> +
>  #else /* CONFIG_PER_VMA_LOCK */
>  
>  static inline bool vma_start_read(struct vm_area_struct *vma)
> @@ -801,6 +807,17 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
>  	mmap_assert_locked(vmf->vma->vm_mm);
>  }
>  
> +static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> +		unsigned long address)
> +{
> +	return NULL;
> +}
> +
> +static inline vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
> +{
> +	return VM_FAULT_NONE;
> +}
> +
>  #endif /* CONFIG_PER_VMA_LOCK */
>  
>  extern const struct vm_operations_struct vma_dummy_vm_ops;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index f5ba5b0bc836..702820cea3f9 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1119,6 +1119,7 @@ typedef __bitwise unsigned int vm_fault_t;
>   * fault. Used to decide whether a process gets delivered SIGBUS or
>   * just gets major/minor fault counters bumped up.
>   *
> + * @VM_FAULT_NONE:		Special case, not starting to handle fault
>   * @VM_FAULT_OOM:		Out Of Memory
>   * @VM_FAULT_SIGBUS:		Bad access
>   * @VM_FAULT_MAJOR:		Page read from storage
> @@ -1139,6 +1140,7 @@ typedef __bitwise unsigned int vm_fault_t;
>   *
>   */
>  enum vm_fault_reason {
> +	VM_FAULT_NONE		= (__force vm_fault_t)0x000000,
>  	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
>  	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
>  	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
> diff --git a/mm/memory.c b/mm/memory.c
> index 3b4aaa0d2fff..60fe35db5134 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5510,6 +5510,45 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  	count_vm_vma_lock_event(VMA_LOCK_ABORT);
>  	return NULL;
>  }
> +
> +#ifdef CONFIG_PER_VMA_LOCK
> +bool __weak arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +	return (vma->vm_flags & vmf->vm_flags) == 0;
> +}
> +#endif
> +
> +vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
> +{
> +	vm_fault_t fault = VM_FAULT_NONE;
> +	struct vm_area_struct *vma;
> +
> +	if (!(vmf->flags & FAULT_FLAG_USER))
> +		return fault;
> +
> +	vma = lock_vma_under_rcu(current->mm, vmf->real_address);
> +	if (!vma)
> +		return fault;
> +
> +	if (arch_vma_access_error(vma, vmf)) {
> +		vma_end_read(vma);
> +		return fault;
> +	}
> +
> +	fault = handle_mm_fault(vma, vmf->real_address,
> +				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
> +
> +	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> +		vma_end_read(vma);

Could you please explain how the vma_end_read() call could be conditional?

> +
> +	if (fault & VM_FAULT_RETRY)
> +		count_vm_vma_lock_event(VMA_LOCK_RETRY);
> +	else
> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +
> +	return fault;
> +}
> +
>  #endif /* CONFIG_PER_VMA_LOCK */
>  
>  #ifndef __PAGETABLE_P4D_FOLDED


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 04/10] s390: mm: use try_vma_locked_page_fault()
  2023-08-21 12:30 ` [PATCH rfc v2 04/10] s390: " Kefeng Wang
@ 2023-08-24  8:16   ` Alexander Gordeev
       [not found]     ` <20230824083225.10112-A-hca@linux.ibm.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Gordeev @ 2023-08-24  8:16 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, linux-mm, surenb, willy, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-arm-kernel, linux-kernel, loongarch,
	linuxppc-dev, linux-riscv, linux-s390

On Mon, Aug 21, 2023 at 08:30:50PM +0800, Kefeng Wang wrote:
> Use new try_vma_locked_page_fault() helper to simplify code.
> No functional change intended.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/s390/mm/fault.c | 66 ++++++++++++++++++--------------------------
>  1 file changed, 27 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index 099c4824dd8a..fbbdebde6ea7 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -357,16 +357,18 @@ static noinline void do_fault_error(struct pt_regs *regs, vm_fault_t fault)
>  static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  {
>  	struct gmap *gmap;
> -	struct task_struct *tsk;
> -	struct mm_struct *mm;
>  	struct vm_area_struct *vma;
>  	enum fault_type type;
> -	unsigned long address;
> -	unsigned int flags;
> +	struct mm_struct *mm = current->mm;
> +	unsigned long address = get_fault_address(regs);
>  	vm_fault_t fault;
>  	bool is_write;
> +	struct vm_fault vmf = {
> +		.real_address = address,
> +		.flags = FAULT_FLAG_DEFAULT,
> +		.vm_flags = access,
> +	};
>  
> -	tsk = current;
>  	/*
>  	 * The instruction that caused the program check has
>  	 * been nullified. Don't signal single step via SIGTRAP.
> @@ -376,8 +378,6 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  	if (kprobe_page_fault(regs, 14))
>  		return 0;
>  
> -	mm = tsk->mm;
> -	address = get_fault_address(regs);
>  	is_write = fault_is_write(regs);
>  
>  	/*
> @@ -398,45 +398,33 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  	}
>  
>  	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
> -	flags = FAULT_FLAG_DEFAULT;
>  	if (user_mode(regs))
> -		flags |= FAULT_FLAG_USER;
> +		vmf.flags |= FAULT_FLAG_USER;
>  	if (is_write)
> -		access = VM_WRITE;
> -	if (access == VM_WRITE)
> -		flags |= FAULT_FLAG_WRITE;
> -	if (!(flags & FAULT_FLAG_USER))
> -		goto lock_mmap;
> -	vma = lock_vma_under_rcu(mm, address);
> -	if (!vma)
> -		goto lock_mmap;
> -	if (!(vma->vm_flags & access)) {
> -		vma_end_read(vma);
> -		goto lock_mmap;
> -	}
> -	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> -	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> -		vma_end_read(vma);
> -	if (!(fault & VM_FAULT_RETRY)) {
> -		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> -		if (likely(!(fault & VM_FAULT_ERROR)))
> -			fault = 0;

This fault fixup is removed in the new version.

> +		vmf.vm_flags = VM_WRITE;
> +	if (vmf.vm_flags == VM_WRITE)
> +		vmf.flags |= FAULT_FLAG_WRITE;
> +
> +	fault = try_vma_locked_page_fault(&vmf);
> +	if (fault == VM_FAULT_NONE)
> +		goto lock_mm;

Because VM_FAULT_NONE is defined as 0, it gets confused with
the success code of 0 returned by a fault handler. In the
former case we want to continue, while in the latter we want to
return successfully. I think this applies to all archs.
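
For illustration, a sketch of the ambiguity based on the hunk above (the
comments are mine, not from the patch):

	fault = try_vma_locked_page_fault(&vmf);
	if (fault == VM_FAULT_NONE)	/* 0 means "locked path not attempted"... */
		goto lock_mm;		/*
					 * ...but handle_mm_fault() can also return 0
					 * for a successfully handled fault, and
					 * try_vma_locked_page_fault() passes that
					 * value through, so a fault already handled
					 * under the VMA lock falls back to the
					 * mmap_lock path as well.
					 */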

> +	if (!(fault & VM_FAULT_RETRY))
>  		goto out;
> -	}
> -	count_vm_vma_lock_event(VMA_LOCK_RETRY);
> +
>  	/* Quick path to respond to signals */
>  	if (fault_signal_pending(fault, regs)) {
>  		fault = VM_FAULT_SIGNAL;
>  		goto out;
>  	}
> -lock_mmap:
> +
> +lock_mm:
>  	mmap_read_lock(mm);
>  
>  	gmap = NULL;
>  	if (IS_ENABLED(CONFIG_PGSTE) && type == GMAP_FAULT) {
>  		gmap = (struct gmap *) S390_lowcore.gmap;
>  		current->thread.gmap_addr = address;
> -		current->thread.gmap_write_flag = !!(flags & FAULT_FLAG_WRITE);
> +		current->thread.gmap_write_flag = !!(vmf.flags & FAULT_FLAG_WRITE);
>  		current->thread.gmap_int_code = regs->int_code & 0xffff;
>  		address = __gmap_translate(gmap, address);
>  		if (address == -EFAULT) {
> @@ -444,7 +432,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  			goto out_up;
>  		}
>  		if (gmap->pfault_enabled)
> -			flags |= FAULT_FLAG_RETRY_NOWAIT;
> +			vmf.flags |= FAULT_FLAG_RETRY_NOWAIT;
>  	}
>  
>  retry:
> @@ -466,7 +454,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  	 * we can handle it..
>  	 */
>  	fault = VM_FAULT_BADACCESS;
> -	if (unlikely(!(vma->vm_flags & access)))
> +	if (unlikely(!(vma->vm_flags & vmf.vm_flags)))
>  		goto out_up;
>  
>  	/*
> @@ -474,10 +462,10 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  	 * make sure we exit gracefully rather than endlessly redo
>  	 * the fault.
>  	 */
> -	fault = handle_mm_fault(vma, address, flags, regs);
> +	fault = handle_mm_fault(vma, address, vmf.flags, regs);
>  	if (fault_signal_pending(fault, regs)) {
>  		fault = VM_FAULT_SIGNAL;
> -		if (flags & FAULT_FLAG_RETRY_NOWAIT)
> +		if (vmf.flags & FAULT_FLAG_RETRY_NOWAIT)
>  			goto out_up;
>  		goto out;
>  	}
> @@ -497,7 +485,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  
>  	if (fault & VM_FAULT_RETRY) {
>  		if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
> -			(flags & FAULT_FLAG_RETRY_NOWAIT)) {
> +			(vmf.flags & FAULT_FLAG_RETRY_NOWAIT)) {
>  			/*
>  			 * FAULT_FLAG_RETRY_NOWAIT has been set, mmap_lock has
>  			 * not been released
> @@ -506,8 +494,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  			fault = VM_FAULT_PFAULT;
>  			goto out_up;
>  		}
> -		flags &= ~FAULT_FLAG_RETRY_NOWAIT;
> -		flags |= FAULT_FLAG_TRIED;
> +		vmf.flags &= ~FAULT_FLAG_RETRY_NOWAIT;
> +		vmf.flags |= FAULT_FLAG_TRIED;
>  		mmap_read_lock(mm);
>  		goto retry;
>  	}

FWIW, this series ends up with kernel BUG at arch/s390/mm/fault.c:341!

Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
  2023-08-24  7:12   ` Alexander Gordeev
@ 2023-08-26  0:56     ` Kefeng Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-26  0:56 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Andrew Morton, linux-mm, surenb, willy, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-arm-kernel, linux-kernel, loongarch,
	linuxppc-dev, linux-riscv, linux-s390



On 2023/8/24 15:12, Alexander Gordeev wrote:
> On Mon, Aug 21, 2023 at 08:30:47PM +0800, Kefeng Wang wrote:
> 
> Hi Kefeng,
> 
>> ARCH_SUPPORTS_PER_VMA_LOCK is enabled by more and more architectures,
>> e.g. x86, arm64, powerpc, s390 and riscv. Those implementations are very
>> similar, which results in some duplicated code, so let's add a generic VMA
>> lock-based page fault handler, try_vma_locked_page_fault(), to eliminate
>> the duplication; this also makes it easy to support the feature on new
>> architectures.
>>
>> Since different architectures use different ways to check whether a vma is
>> accessible or not, the struct pt_regs, the page fault error code and the vma
>> flags are added into struct vm_fault, so that the architecture's page fault
>> code can re-use struct vm_fault to record them and check vma accessibility
>> in its own implementation.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
...
>> +
>> +vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
>> +{
>> +	vm_fault_t fault = VM_FAULT_NONE;
>> +	struct vm_area_struct *vma;
>> +
>> +	if (!(vmf->flags & FAULT_FLAG_USER))
>> +		return fault;
>> +
>> +	vma = lock_vma_under_rcu(current->mm, vmf->real_address);
>> +	if (!vma)
>> +		return fault;
>> +
>> +	if (arch_vma_access_error(vma, vmf)) {
>> +		vma_end_read(vma);
>> +		return fault;
>> +	}
>> +
>> +	fault = handle_mm_fault(vma, vmf->real_address,
>> +				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
>> +
>> +	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
>> +		vma_end_read(vma);
> 
> Could you please explain how the vma_end_read() call could be conditional?

The check was added for swap and userfault: when handle_mm_fault() returns
VM_FAULT_RETRY or VM_FAULT_COMPLETED, the per-VMA lock has already been
released inside the fault handler, so it must not be dropped again here. See

https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com
> 
>> +
>> +	if (fault & VM_FAULT_RETRY)
>> +		count_vm_vma_lock_event(VMA_LOCK_RETRY);
>> +	else
>> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>> +
>> +	return fault;
>> +}
>> +
>>   #endif /* CONFIG_PER_VMA_LOCK */
>>   
>>   #ifndef __PAGETABLE_P4D_FOLDED


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rfc v2 04/10] s390: mm: use try_vma_locked_page_fault()
       [not found]     ` <20230824083225.10112-A-hca@linux.ibm.com>
@ 2023-08-26  1:07       ` Kefeng Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Kefeng Wang @ 2023-08-26  1:07 UTC (permalink / raw)
  To: Heiko Carstens, Alexander Gordeev
  Cc: Andrew Morton, linux-mm, surenb, willy, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Gerald Schaefer,
	Vasily Gorbik, Christian Borntraeger, Sven Schnelle, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-arm-kernel,
	linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390



On 2023/8/24 16:32, Heiko Carstens wrote:
> On Thu, Aug 24, 2023 at 10:16:33AM +0200, Alexander Gordeev wrote:
>> On Mon, Aug 21, 2023 at 08:30:50PM +0800, Kefeng Wang wrote:
>>> Use new try_vma_locked_page_fault() helper to simplify code.
>>> No functional change intended.
>>>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> ---
>>>   arch/s390/mm/fault.c | 66 ++++++++++++++++++--------------------------
>>>   1 file changed, 27 insertions(+), 39 deletions(-)
> ...
>>> -	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
>>> -	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
>>> -		vma_end_read(vma);
>>> -	if (!(fault & VM_FAULT_RETRY)) {
>>> -		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>>> -		if (likely(!(fault & VM_FAULT_ERROR)))
>>> -			fault = 0;
>>
>> This fault fixup is removed in the new version.
> ...
> 
>>> +		vmf.vm_flags = VM_WRITE;
>>> +	if (vmf.vm_flags == VM_WRITE)
>>> +		vmf.flags |= FAULT_FLAG_WRITE;
>>> +
>>> +	fault = try_vma_locked_page_fault(&vmf);
>>> +	if (fault == VM_FAULT_NONE)
>>> +		goto lock_mm;
>>
>> Because VM_FAULT_NONE is defined as 0, it gets confused with
>> the success code of 0 returned by a fault handler. In the
>> former case we want to continue, while in the latter we want to
>> return successfully. I think this applies to all archs.
> ...
>> FWIW, this series ends up with kernel BUG at arch/s390/mm/fault.c:341!
> 

I didn't test it, only built it; this is an RFC to find out whether
adding three more members into vmf and using vmf in the arch's page
fault code is feasible or not.

> Without having looked at this patch in detail: all of this is likely
> because s390's fault handling is quite odd. Not only because fault is set
> to 0, but also because of the private VM_FAULT values like
> VM_FAULT_BADCONTEXT. I'm just cleaning up all of this, but it won't make it
> for the next merge window.

Sure, if I re-post I will drop the s390 change, but as mentioned above,
the abstraction of the generic VMA-locked fault handling may not be perfect;
let's wait for more responses.

Thanks all.

> 
> Therefore I'd like to ask to drop the s390 conversion from this series; if
> this series is supposed to be merged, the s390 conversion needs to be
> done later. Let's not waste more time on the current implementation,
> please.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-08-26  1:07 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-21 12:30 [PATCH rfc -next v2 00/10] mm: convert to generic VMA lock-based page fault Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler Kefeng Wang
2023-08-21 15:13   ` kernel test robot
2023-08-22  2:33     ` Kefeng Wang
2023-08-24  7:12   ` Alexander Gordeev
2023-08-26  0:56     ` Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 02/10] arm64: mm: use try_vma_locked_page_fault() Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 03/10] x86: " Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 04/10] s390: " Kefeng Wang
2023-08-24  8:16   ` Alexander Gordeev
     [not found]     ` <20230824083225.10112-A-hca@linux.ibm.com>
2023-08-26  1:07       ` Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 05/10] powerpc: " Kefeng Wang
2023-08-22  9:38   ` Christophe Leroy
2023-08-22 12:12     ` Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 06/10] riscv: " Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 07/10] ARM: mm: try VMA lock-based page fault handling first Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 08/10] loongarch: mm: cleanup __do_page_fault() Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 09/10] loongarch: mm: add access_error() helper Kefeng Wang
2023-08-21 12:30 ` [PATCH rfc v2 10/10] loongarch: mm: try VMA lock-based page fault handling first Kefeng Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).