From: Andrei Vagin <avagin@gmail.com>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org,
viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
gregkh@linuxfoundation.org, linux-mm@kvack.org,
liam.howlett@oracle.com, surenb@google.com, rppt@kernel.org
Subject: Re: [PATCH v3 3/9] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
Date: Fri, 7 Jun 2024 15:31:14 -0700
Message-ID: <ZmOKMgZn_ki17UYM@gmail.com>
In-Reply-To: <20240605002459.4091285-4-andrii@kernel.org>
On Tue, Jun 04, 2024 at 05:24:48PM -0700, Andrii Nakryiko wrote:
> The /proc/<pid>/maps file is extremely useful in practice for various tasks
> involving figuring out process memory layout, what files are backing any
> given memory range, etc. One important class of applications that
> absolutely rely on this are profilers/stack symbolizers (the perf tool
> being one of them). Patterns of use differ, but they generally fall into
> two categories.
>
> In the on-demand pattern, a profiler/symbolizer would normally capture
> a stack trace containing absolute memory addresses of some functions, and
> would then use the /proc/<pid>/maps file to find the corresponding backing
> ELF files (normally, only executable VMAs are of interest) and the file
> offsets within them, and then continue from there to get yet more
> information (ELF symbols, DWARF information) to produce human-readable
> symbolic information. This pattern is used by Meta's fleet-wide profiler,
> as one example.
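A minimal sketch of the address-to-file-offset step described above, assuming the VMA's start, end, and file offset have already been read from /proc/<pid>/maps. The struct vma_info type and the addr_to_file_offset() helper are hypothetical names for illustration only; mapping the resulting file offset onto ELF symbols (via the program headers) is a further step that is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record holding the /proc/<pid>/maps fields a symbolizer
 * needs for one VMA. */
struct vma_info {
    uint64_t start;   /* first address of the mapping */
    uint64_t end;     /* one past the last address of the mapping */
    uint64_t pgoff;   /* offset in the backing file where the mapping starts */
};

/* Translate an absolute address captured in a stack trace into an offset
 * within the VMA's backing file. Returns false if addr is outside the VMA. */
bool addr_to_file_offset(const struct vma_info *vma, uint64_t addr,
                         uint64_t *file_off)
{
    if (addr < vma->start || addr >= vma->end)
        return false;
    *file_off = addr - vma->start + vma->pgoff;
    return true;
}
```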
>
> In the preprocessing pattern, the application doesn't know the set of
> addresses of interest, so it has to fetch all relevant VMAs (again,
> probably only executable ones), store or cache them, and then proceed with
> profiling and stack trace capture. Once done, it would do symbolization
> based on the stored VMA information. This can happen at a much later
> point in time. This pattern is used by the perf tool, as an example.
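For the preprocessing pattern, one plausible way to keep the later symbolization cheap is to sort the captured VMA snapshot once by start address and then binary-search it for each address. This is a sketch under the assumption that the snapshot holds non-overlapping VMAs; struct cached_vma, vma_cache_sort() and vma_cache_find() are hypothetical names, not part of perf or any other existing tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cached VMA record (e.g. only executable VMAs are kept). */
struct cached_vma {
    uint64_t start;
    uint64_t end;     /* exclusive */
    uint64_t pgoff;
};

/* qsort() comparator ordering VMAs by start address. */
static int cmp_start(const void *a, const void *b)
{
    const struct cached_vma *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort the snapshot once, right after capturing it. */
void vma_cache_sort(struct cached_vma *vmas, size_t n)
{
    qsort(vmas, n, sizeof(*vmas), cmp_start);
}

/* Binary-search the sorted, non-overlapping snapshot for the VMA that
 * contains addr. Returns NULL if addr falls into a gap. */
const struct cached_vma *vma_cache_find(const struct cached_vma *vmas,
                                        size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < vmas[mid].start)
            hi = mid;
        else if (addr >= vmas[mid].end)
            lo = mid + 1;
        else
            return &vmas[mid];
    }
    return NULL;
}
```

Sorting costs O(n log n) once, after which each lookup is O(log n), which pays off when many more addresses are symbolized than VMAs were captured.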
>
> In either case, there are both performance and correctness requirements
> involved. This translation from address to VMA information has to be done
> as efficiently as possible, but it also must not miss any VMA (especially
> in the case of loading/unloading shared libraries). In practice,
> correctness can't be guaranteed (due to the process dying before VMA data
> can be captured, or a shared library being unloaded, etc.), but any effort
> to maximize the chance of finding the VMA is appreciated.
>
> Unfortunately, for all of the /proc/<pid>/maps file's universality and
> usefulness, it doesn't fit the above use cases 100%.
>
> First, its main purpose is to emit all VMAs sequentially, but in
> practice captured addresses would fall only into a smaller subset of all
> the process's VMAs, mainly those containing executable text. Yet, a
> library would need to parse most or all of the contents to find the
> needed VMAs, as there is no way to skip VMAs that are of no use. An
> efficient library can do the linear pass relatively cheaply, but it's
> definitely an overhead that could be avoided if there were a way to do
> more targeted querying of the relevant VMA information.
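To illustrate the linear pass, here is a minimal sketch that scans a /proc/<pid>/maps-formatted stream until it finds the executable VMA containing a given address. The find_exec_vma() name is hypothetical, and lines longer than the 4096-byte buffer (very long paths) are not handled. Note that every line before the match still has to be read and parsed, even though most of them are irrelevant.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Scan a /proc/<pid>/maps-formatted stream line by line until an
 * executable VMA containing addr is found. Only the address range and
 * permission fields are parsed. Assumes lines fit into the buffer. */
bool find_exec_vma(FILE *maps, uint64_t addr, uint64_t *start, uint64_t *end)
{
    char line[4096];

    while (fgets(line, sizeof(line), maps)) {
        uint64_t s, e;
        char perms[5];

        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s", &s, &e, perms) != 3)
            continue;               /* not a well-formed VMA line */
        if (perms[2] != 'x')        /* only executable VMAs are of interest */
            continue;
        if (addr >= s && addr < e) {
            *start = s;
            *end = e;
            return true;
        }
    }
    return false;
}
```

Against a live process, the same function can be run on a stream opened with fopen("/proc/self/maps", "r").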
>
> Second, it's a text-based interface, which makes its programmatic use
> from applications and libraries more cumbersome and inefficient due to
> the need to handle text parsing to get the necessary pieces of
> information. The overhead is actually paid both by the kernel, formatting
> originally binary VMA data into text, and then by the user space
> application, parsing it back into binary data for further use.
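The parsing cost is easy to see in code. A minimal sketch that turns one maps line back into binary fields, assuming the usual "start-end perms offset major:minor inode [pathname]" layout; struct maps_entry and parse_maps_line() are hypothetical names, and names longer than 255 bytes are truncated.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical binary form of one /proc/<pid>/maps line. */
struct maps_entry {
    uint64_t start, end, pgoff, inode;
    unsigned int dev_major, dev_minor;
    char perms[5];
    char name[256];   /* longer names are truncated */
};

/* Parse the numbers the kernel formatted as text back into binary fields.
 * %n records where the numeric fields end; the pathname (if any) follows
 * padding spaces and runs to the end of the line. */
bool parse_maps_line(const char *line, struct maps_entry *e)
{
    int n = -1;
    const char *p;
    size_t len;

    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64 "%n",
               &e->start, &e->end, e->perms, &e->pgoff,
               &e->dev_major, &e->dev_minor, &e->inode, &n) != 7 || n < 0)
        return false;

    p = line + n;
    while (*p == ' ')
        p++;
    len = strcspn(p, "\n");         /* drop the trailing newline */
    snprintf(e->name, sizeof(e->name), "%.*s", (int)len, p);
    return true;
}
```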
I was trying to solve all these issues in a more generic way:
https://lwn.net/Articles/683371/
We are definitely interested in using this new interface in CRIU.
<snip>
> +
> + if (karg.vma_name_size) {
> + size_t name_buf_sz = min_t(size_t, PATH_MAX, karg.vma_name_size);
> + const struct path *path;
> + const char *name_fmt;
> + size_t name_sz = 0;
> +
> + get_vma_name(vma, &path, &name, &name_fmt);
> +
> + if (path || name_fmt || name) {
> + name_buf = kmalloc(name_buf_sz, GFP_KERNEL);
> + if (!name_buf) {
> + err = -ENOMEM;
> + goto out;
> + }
> + }
> + if (path) {
> + name = d_path(path, name_buf, name_buf_sz);
> + if (IS_ERR(name)) {
> + err = PTR_ERR(name);
> + goto out;
It always fails if a file path name is longer than PATH_MAX.
Can we add a flag to indicate whether file names need to be
resolved? In CRIU, we use special names like "vvar" and "vdso", but we
dump files via /proc/pid/map_files.
> + }
> + name_sz = name_buf + name_buf_sz - name;
> + } else if (name || name_fmt) {
> + name_sz = 1 + snprintf(name_buf, name_buf_sz, name_fmt ?: "%s", name);
> + name = name_buf;
> + }
> + if (name_sz > name_buf_sz) {
> + err = -ENAMETOOLONG;
> + goto out;
> + }
> + karg.vma_name_size = name_sz;
> + }
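A user-space model (not the kernel code) of the snprintf()-based size check in the quoted hunk: snprintf() returns the length the full string would have had, so adding 1 for the terminating NUL and comparing the result against the buffer size detects truncation. format_vma_name() is a hypothetical helper; the d_path() branch, which fails on its own for paths that do not fit, is not modeled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Format a VMA name into buf and report the size needed (including NUL)
 * in *name_sz. Returns -ENAMETOOLONG if buf was too small, mirroring the
 * name_sz > name_buf_sz check in the quoted patch. */
int format_vma_name(char *buf, size_t buf_sz, const char *fmt,
                    const char *name, size_t *name_sz)
{
    int len = snprintf(buf, buf_sz, fmt ? fmt : "%s", name);

    if (len < 0)
        return -EINVAL;             /* encoding error */
    *name_sz = (size_t)len + 1;
    if (*name_sz > buf_sz)
        return -ENAMETOOLONG;       /* output was truncated */
    return 0;
}
```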
Thanks,
Andrei