From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 6.4 10/28] mm: introduce new lock_mm_and_find_vma() page fault helper
Date: Thu, 29 Jun 2023 20:43:58 +0200
Message-ID: <20230629184152.286783570@linuxfoundation.org>
In-Reply-To: <20230629184151.888604958@linuxfoundation.org>
From: Linus Torvalds <torvalds@linux-foundation.org>
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream.
.. and make x86 use it.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
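For an architecture doing the conversion, the calling pattern ends up
looking roughly like the sketch below (modeled on the x86 hunk further
down; bad_area_nosemaphore() is the x86 error path, and 'flags'/'fault'
are the usual fault-handler locals, so treat this as illustration
rather than a drop-in). The one detail worth internalizing is that a
NULL return means the helper has already dropped the mmap lock, so the
caller must not unlock it again:

	vma = lock_mm_and_find_vma(mm, address, regs);
	if (unlikely(!vma)) {
		/* The helper already released the mmap lock */
		bad_area_nosemaphore(regs, error_code, address);
		return;
	}

	/* mmap lock now held for reading, and vma covers 'address' */
	fault = handle_mm_fault(vma, address, flags, regs);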
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
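To make that concrete: the fast path of such an upgrade would be a
single compare-and-exchange from "exactly one reader, nobody queued" to
"writer locked". Here is a minimal C11 userspace model of that idea;
the constants only mirror the rwsem encoding in spirit, and lockdep,
waiter queues and handoff are all deliberately ignored, so this is a
sketch of the concept rather than the rwsem implementation:

	#include <stdatomic.h>
	#include <stdbool.h>

	#define READER_BIAS	0x100L	/* exactly one reader, no waiters */
	#define WRITER_LOCKED	0x001L	/* writer holds the lock */

	static bool upgrade_trylock(atomic_long *count)
	{
		long expected = READER_BIAS;

		/*
		 * Succeed only if we are the sole reader with nobody
		 * queued; any other state means the caller must fall
		 * back to unlock + relock-for-write.
		 */
		return atomic_compare_exchange_strong_explicit(count,
				&expected, WRITER_LOCKED,
				memory_order_acquire, memory_order_relaxed);
	}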
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/Kconfig    |    1
 arch/x86/mm/fault.c |   52 ----------------
 include/linux/mm.h  |    2
 mm/Kconfig          |    4 +
 mm/memory.c         |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+), 50 deletions(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -276,6 +276,7 @@ config X86
select HAVE_GENERIC_VDSO
select HOTPLUG_SMT if SMP
select IRQ_FORCED_THREADING
+ select LOCK_MM_AND_FIND_VMA
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
select NEED_SG_DMA_LENGTH
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -880,12 +880,6 @@ __bad_area(struct pt_regs *regs, unsigne
__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
}
-static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
-{
- __bad_area(regs, error_code, address, 0, SEGV_MAPERR);
-}
-
static inline bool bad_area_access_from_pkeys(unsigned long error_code,
struct vm_area_struct *vma)
{
@@ -1366,51 +1360,10 @@ void do_user_addr_fault(struct pt_regs *
lock_mmap:
#endif /* CONFIG_PER_VMA_LOCK */
- /*
- * Kernel-mode access to the user address space should only occur
- * on well-defined single instructions listed in the exception
- * tables. But, an erroneous kernel fault occurring outside one of
- * those areas which also holds mmap_lock might deadlock attempting
- * to validate the fault against the address space.
- *
- * Only do the expensive exception table search when we might be at
- * risk of a deadlock. This happens if we
- * 1. Failed to acquire mmap_lock, and
- * 2. The access did not originate in userspace.
- */
- if (unlikely(!mmap_read_trylock(mm))) {
- if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
- /*
- * Fault from code in kernel from
- * which we do not expect faults.
- */
- bad_area_nosemaphore(regs, error_code, address);
- return;
- }
retry:
- mmap_read_lock(mm);
- } else {
- /*
- * The above down_read_trylock() might have succeeded in
- * which case we'll have missed the might_sleep() from
- * down_read():
- */
- might_sleep();
- }
-
- vma = find_vma(mm, address);
+ vma = lock_mm_and_find_vma(mm, address, regs);
if (unlikely(!vma)) {
- bad_area(regs, error_code, address);
- return;
- }
- if (likely(vma->vm_start <= address))
- goto good_area;
- if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
- bad_area(regs, error_code, address);
- return;
- }
- if (unlikely(expand_stack(vma, address))) {
- bad_area(regs, error_code, address);
+ bad_area_nosemaphore(regs, error_code, address);
return;
}
@@ -1418,7 +1371,6 @@ retry:
* Ok, we have a good vm_area for this memory access, so
* we can handle it..
*/
-good_area:
if (unlikely(access_error(error_code, vma))) {
bad_area_access_error(regs, error_code, address, vma);
return;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2325,6 +2325,8 @@ void unmap_mapping_pages(struct address_
pgoff_t start, pgoff_t nr, bool even_cows);
void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows);
+struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
+ unsigned long address, struct pt_regs *regs);
#else
static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1206,6 +1206,10 @@ config PER_VMA_LOCK
This feature allows locking each virtual memory area separately when
handling page faults instead of taking mmap_lock.
+config LOCK_MM_AND_FIND_VMA
+ bool
+ depends on !STACK_GROWSUP
+
source "mm/damon/Kconfig"
endmenu
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5262,6 +5262,127 @@ out:
}
EXPORT_SYMBOL_GPL(handle_mm_fault);
+#ifdef CONFIG_LOCK_MM_AND_FIND_VMA
+#include <linux/extable.h>
+
+static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs *regs)
+{
+ /* Even if this succeeds, make it clear we *might* have slept */
+ if (likely(mmap_read_trylock(mm))) {
+ might_sleep();
+ return true;
+ }
+
+ if (regs && !user_mode(regs)) {
+ unsigned long ip = instruction_pointer(regs);
+ if (!search_exception_tables(ip))
+ return false;
+ }
+
+ mmap_read_lock(mm);
+ return true;
+}
+
+static inline bool mmap_upgrade_trylock(struct mm_struct *mm)
+{
+ /*
+ * We don't have this operation yet.
+ *
+ * It should be easy enough to do: it's basically an
+ * atomic_long_try_cmpxchg_acquire()
+ * from RWSEM_READER_BIAS -> RWSEM_WRITER_LOCKED, but
+ * it also needs the proper lockdep magic etc.
+ */
+ return false;
+}
+
+static inline bool upgrade_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs *regs)
+{
+ mmap_read_unlock(mm);
+ if (regs && !user_mode(regs)) {
+ unsigned long ip = instruction_pointer(regs);
+ if (!search_exception_tables(ip))
+ return false;
+ }
+ mmap_write_lock(mm);
+ return true;
+}
+
+/*
+ * Helper for page fault handling.
+ *
+ * This is kind of equivalent to "mmap_read_lock()" followed
+ * by "find_extend_vma()", except it's a lot more careful about
+ * the locking (and will drop the lock on failure).
+ *
+ * For example, if we have a kernel bug that causes a page
+ * fault, we don't want to just use mmap_read_lock() to get
+ * the mm lock, because that would deadlock if the bug were
+ * to happen while we're holding the mm lock for writing.
+ *
+ * So this checks the exception tables on kernel faults in
+ * order to do all this only for instructions that are actually
+ * expected to fault.
+ *
+ * We can also actually take the mm lock for writing if we
+ * need to extend the vma, which helps the VM layer a lot.
+ */
+struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
+ unsigned long addr, struct pt_regs *regs)
+{
+ struct vm_area_struct *vma;
+
+ if (!get_mmap_lock_carefully(mm, regs))
+ return NULL;
+
+ vma = find_vma(mm, addr);
+ if (likely(vma && (vma->vm_start <= addr)))
+ return vma;
+
+ /*
+ * Well, dang. We might still be successful, but only
+ * if we can extend a vma to do so.
+ */
+ if (!vma || !(vma->vm_flags & VM_GROWSDOWN)) {
+ mmap_read_unlock(mm);
+ return NULL;
+ }
+
+ /*
+ * We can try to upgrade the mmap lock atomically,
+ * in which case we can continue to use the vma
+ * we already looked up.
+ *
+ * Otherwise we'll have to drop the mmap lock and
+ * re-take it, and also look up the vma again,
+ * re-checking it.
+ */
+ if (!mmap_upgrade_trylock(mm)) {
+ if (!upgrade_mmap_lock_carefully(mm, regs))
+ return NULL;
+
+ vma = find_vma(mm, addr);
+ if (!vma)
+ goto fail;
+ if (vma->vm_start <= addr)
+ goto success;
+ if (!(vma->vm_flags & VM_GROWSDOWN))
+ goto fail;
+ }
+
+ if (expand_stack(vma, addr))
+ goto fail;
+
+success:
+ mmap_write_downgrade(mm);
+ return vma;
+
+fail:
+ mmap_write_unlock(mm);
+ return NULL;
+}
+#endif
+
#ifdef CONFIG_PER_VMA_LOCK
/*
* Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be