From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org, yuzhao@google.com,
ying.huang@intel.com, willy@infradead.org,
viro@zeniv.linux.org.uk, vbabka@suse.cz,
punit.agrawal@bytedance.com, peterx@redhat.com,
pasha.tatashin@soleen.com, minchan@google.com,
michel@lespinasse.org, mhocko@suse.com, lstoakes@gmail.com,
Liam.Howlett@oracle.com, ldufour@linux.ibm.com,
josef@toxicpanda.com, jack@suse.cz, hughd@google.com,
hdanton@sina.com, hch@lst.de, hannes@cmpxchg.org,
dhowells@redhat.com, david@redhat.com, dave@stgolabs.net,
brauner@kernel.org, apopple@nvidia.com, surenb@google.com,
akpm@linux-foundation.org
Subject: + mm-handle-userfaults-under-vma-lock.patch added to mm-unstable branch
Date: Sun, 02 Jul 2023 10:55:07 -0700 [thread overview]
Message-ID: <20230702175507.B120CC433C8@smtp.kernel.org> (raw)
The patch titled
Subject: mm: handle userfaults under VMA lock
has been added to the -mm mm-unstable branch. Its filename is
mm-handle-userfaults-under-vma-lock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-handle-userfaults-under-vma-lock.patch
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: handle userfaults under VMA lock
Date: Fri, 30 Jun 2023 14:19:57 -0700
Enable handle_userfault to operate under VMA lock by releasing VMA lock
instead of mmap_lock and retrying. Note that FAULT_FLAG_RETRY_NOWAIT
should never be used when handling faults under per-VMA lock protection
because that would break the assumption that lock is dropped on retry.
Link: https://lkml.kernel.org/r/20230630211957.1341547-7-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/userfaultfd.c | 34 ++++++++++++++--------------------
include/linux/mm.h | 24 ++++++++++++++++++++++++
mm/memory.c | 9 ---------
3 files changed, 38 insertions(+), 29 deletions(-)
--- a/fs/userfaultfd.c~mm-handle-userfaults-under-vma-lock
+++ a/fs/userfaultfd.c
@@ -277,17 +277,16 @@ static inline struct uffd_msg userfault_
* hugepmd ranges.
*/
static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
- struct vm_area_struct *vma,
- unsigned long address,
- unsigned long flags,
- unsigned long reason)
+ struct vm_fault *vmf,
+ unsigned long reason)
{
+ struct vm_area_struct *vma = vmf->vma;
pte_t *ptep, pte;
bool ret = true;
- mmap_assert_locked(ctx->mm);
+ assert_fault_locked(vmf);
- ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
+ ptep = hugetlb_walk(vma, vmf->address, vma_mmu_pagesize(vma));
if (!ptep)
goto out;
@@ -308,10 +307,8 @@ out:
}
#else
static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
- struct vm_area_struct *vma,
- unsigned long address,
- unsigned long flags,
- unsigned long reason)
+ struct vm_fault *vmf,
+ unsigned long reason)
{
return false; /* should never get here */
}
@@ -325,11 +322,11 @@ static inline bool userfaultfd_huge_must
* threads.
*/
static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
- unsigned long address,
- unsigned long flags,
+ struct vm_fault *vmf,
unsigned long reason)
{
struct mm_struct *mm = ctx->mm;
+ unsigned long address = vmf->address;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
@@ -338,7 +335,7 @@ static inline bool userfaultfd_must_wait
pte_t ptent;
bool ret = true;
- mmap_assert_locked(mm);
+ assert_fault_locked(vmf);
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
@@ -440,7 +437,7 @@ vm_fault_t handle_userfault(struct vm_fa
* Coredumping runs without mmap_lock so we can only check that
* the mmap_lock is held, if PF_DUMPCORE was not set.
*/
- mmap_assert_locked(mm);
+ assert_fault_locked(vmf);
ctx = vma->vm_userfaultfd_ctx.ctx;
if (!ctx)
@@ -556,15 +553,12 @@ vm_fault_t handle_userfault(struct vm_fa
spin_unlock_irq(&ctx->fault_pending_wqh.lock);
if (!is_vm_hugetlb_page(vma))
- must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
- reason);
+ must_wait = userfaultfd_must_wait(ctx, vmf, reason);
else
- must_wait = userfaultfd_huge_must_wait(ctx, vma,
- vmf->address,
- vmf->flags, reason);
+ must_wait = userfaultfd_huge_must_wait(ctx, vmf, reason);
if (is_vm_hugetlb_page(vma))
hugetlb_vma_unlock_read(vma);
- mmap_read_unlock(mm);
+ release_fault_lock(vmf);
if (likely(must_wait && !READ_ONCE(ctx->released))) {
wake_up_poll(&ctx->fd_wqh, EPOLLIN);
--- a/include/linux/mm.h~mm-handle-userfaults-under-vma-lock
+++ a/include/linux/mm.h
@@ -705,6 +705,17 @@ static inline bool vma_try_start_write(s
return true;
}
+static inline void vma_assert_locked(struct vm_area_struct *vma)
+{
+ int mm_lock_seq;
+
+ if (__is_vma_write_locked(vma, &mm_lock_seq))
+ return;
+
+ lockdep_assert_held(&vma->vm_lock->lock);
+ VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_lock->lock), vma);
+}
+
static inline void vma_assert_write_locked(struct vm_area_struct *vma)
{
int mm_lock_seq;
@@ -728,6 +739,14 @@ static inline void release_fault_lock(st
mmap_read_unlock(vmf->vma->vm_mm);
}
+static inline void assert_fault_locked(struct vm_fault *vmf)
+{
+ if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+ vma_assert_locked(vmf->vma);
+ else
+ mmap_assert_locked(vmf->vma->vm_mm);
+}
+
struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
unsigned long address);
@@ -748,6 +767,11 @@ static inline void release_fault_lock(st
mmap_read_unlock(vmf->vma->vm_mm);
}
+static inline void assert_fault_locked(struct vm_fault *vmf)
+{
+ mmap_assert_locked(vmf->vma->vm_mm);
+}
+
#endif /* CONFIG_PER_VMA_LOCK */
/*
--- a/mm/memory.c~mm-handle-userfaults-under-vma-lock
+++ a/mm/memory.c
@@ -5410,15 +5410,6 @@ retry:
if (!vma_start_read(vma))
goto inval;
- /*
- * Due to the possibility of userfault handler dropping mmap_lock, avoid
- * it for now and fall back to page fault handling under mmap_lock.
- */
- if (userfaultfd_armed(vma)) {
- vma_end_read(vma);
- goto inval;
- }
-
/* Check since vm_start/vm_end might change before we lock the VMA */
if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
vma_end_read(vma);
_
Patches currently in -mm which might be from surenb@google.com are
swap-remove-remnants-of-polling-from-read_swap_cache_async.patch
mm-add-missing-vm_fault_result_trace-name-for-vm_fault_completed.patch
mm-drop-per-vma-lock-when-returning-vm_fault_retry-or-vm_fault_completed.patch
mm-change-folio_lock_or_retry-to-use-vm_fault-directly.patch
mm-handle-swap-page-faults-under-per-vma-lock.patch
mm-handle-userfaults-under-vma-lock.patch
reply other threads:[~2023-07-02 17:55 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230702175507.B120CC433C8@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=Liam.Howlett@oracle.com \
--cc=apopple@nvidia.com \
--cc=brauner@kernel.org \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hch@lst.de \
--cc=hdanton@sina.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=ldufour@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lstoakes@gmail.com \
--cc=mhocko@suse.com \
--cc=michel@lespinasse.org \
--cc=minchan@google.com \
--cc=mm-commits@vger.kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=peterx@redhat.com \
--cc=punit.agrawal@bytedance.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.