From: Andrii Nakryiko <andrii@kernel.org>
To: linux-fsdevel@vger.kernel.org, brauner@kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	gregkh@linuxfoundation.org, linux-mm@kvack.org,
	liam.howlett@oracle.com, surenb@google.com, rppt@kernel.org,
	Andrii Nakryiko <andrii@kernel.org>
Subject: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API
Date: Tue, 4 Jun 2024 17:24:49 -0700
Message-ID: <20240605002459.4091285-5-andrii@kernel.org>
In-Reply-To: <20240605002459.4091285-1-andrii@kernel.org>

Attempt to use RCU-protected per-VMA lock when looking up requested VMA
as much as possible, falling back to mmap_lock only if taking the
per-VMA lock fails. This is done so that querying of VMAs doesn't
interfere with other critical tasks, like page fault handling.

This has been suggested by mm folks, and we make use of a newly added
internal API that works like find_vma(), but tries to use per-VMA lock.

We have two sets of setup/query/teardown helper functions with different
implementations depending on availability of per-VMA lock (conditioned
on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties.

When per-VMA lock is available, lookup is done under RCU, attempting to
take a per-VMA lock. If that fails, we fall back to mmap_lock, but then
proceed to unconditionally grab per-VMA lock again, dropping mmap_lock
immediately. In this configuration mmap_lock is never held for long,
minimizing disruptions while querying.

When per-VMA lock is compiled out, we take mmap_lock once, query VMAs
using find_vma() API, and then unlock mmap_lock once at the very end.
In this setup we avoid locking/unlocking mmap_lock on every looked up
VMA (depending on query parameters we might need to iterate over a few
of them).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 614fbe5d0667..140032ffc551 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, struct file *file)
 		PROCMAP_QUERY_VMA_FLAGS				\
 )
 
+#ifdef CONFIG_PER_VMA_LOCK
+static int query_vma_setup(struct mm_struct *mm)
+{
+	/* in the presence of per-VMA lock we don't need any setup/teardown */
+	return 0;
+}
+
+static void query_vma_teardown(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	/* in the presence of per-VMA lock we need to unlock vma, if present */
+	if (vma)
+		vma_end_read(vma);
+}
+
+static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *mm, unsigned long addr)
+{
+	struct vm_area_struct *vma;
+
+	/* try to use less disruptive per-VMA lock */
+	vma = find_and_lock_vma_rcu(mm, addr);
+	if (IS_ERR(vma)) {
+		/* failed to take per-VMA lock, fallback to mmap_lock */
+		if (mmap_read_lock_killable(mm))
+			return ERR_PTR(-EINTR);
+
+		vma = find_vma(mm, addr);
+		if (vma) {
+			/*
+			 * We cannot use vma_start_read() as it may fail due to
+			 * false locked (see comment in vma_start_read()). We
+			 * can avoid that by directly locking vm_lock under
+			 * mmap_lock, which guarantees that nobody can lock the
+			 * vma for write (vma_start_write()) under us.
+			 */
+			down_read(&vma->vm_lock->lock);
+		}
+
+		mmap_read_unlock(mm);
+	}
+
+	return vma;
+}
+#else
 static int query_vma_setup(struct mm_struct *mm)
 {
 	return mmap_read_lock_killable(mm);
@@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *mm, unsig
 {
 	return find_vma(mm, addr);
 }
+#endif
 
 static struct vm_area_struct *query_matching_vma(struct mm_struct *mm,
 						 unsigned long addr, u32 flags)
@@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_vma(struct mm_struct *mm,
 skip_vma:
 	/*
 	 * If the user needs closest matching VMA, keep iterating.
+	 * But before we proceed we might need to unlock current VMA.
 	 */
 	addr = vma->vm_end;
+	vma_end_read(vma); /* no-op under !CONFIG_PER_VMA_LOCK */
 	if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA)
 		goto next_vma;
 no_vma:
--
2.43.0
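
Not part of the patch: the last hunk unlocks the current VMA before
moving on to the next candidate, so that at most one VMA stays locked
and only the VMA that is finally returned remains locked. A minimal
sketch of that lock-balance rule follows. It assumes a toy table and a
plain counter in place of the real per-VMA lock; struct item,
Q_COVERING_OR_NEXT, Q_FILE_BACKED and all function names are
hypothetical, chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical query flags, loosely modelled on PROCMAP_QUERY_* flags. */
enum {
	Q_COVERING_OR_NEXT = 1,	/* accept the next item if addr is in a hole */
	Q_FILE_BACKED = 2,	/* only accept file-backed items */
};

/* Hypothetical stand-in for a VMA. */
struct item {
	unsigned long start, end;	/* [start, end) */
	int file_backed;
	int held;			/* read-lock depth, stands in for the lock */
};

struct table {
	struct item *items;		/* sorted, non-overlapping */
	size_t nr;
};

/* Returns the first item ending above addr with its lock taken, or NULL. */
static struct item *find_and_lock_item(struct table *t, unsigned long addr)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (t->items[i].end > addr) {
			t->items[i].held++;
			return &t->items[i];
		}
	}
	return NULL;
}

static void unlock_item(struct item *it)
{
	it->held--;
}

/* Returns a matching item, still locked, or NULL with nothing locked. */
static struct item *query_matching(struct table *t, unsigned long addr,
				   unsigned int flags)
{
	for (;;) {
		struct item *it = find_and_lock_item(t, addr);
		int wanted, placed;

		if (!it)
			return NULL;

		wanted = !(flags & Q_FILE_BACKED) || it->file_backed;
		placed = (flags & Q_COVERING_OR_NEXT) || it->start <= addr;
		if (wanted && placed)
			return it;

		/* Skip this item: drop its lock before looking at the next. */
		addr = it->end;
		unlock_item(it);
		if (!(flags & Q_COVERING_OR_NEXT))
			return NULL;
	}
}
```

The unlock sits on the shared skip path, so both outcomes of a skip
(continue to the next item, or give up) leave the skipped item unlocked.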