linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
	david@redhat.com,  vbabka@suse.cz, peterx@redhat.com,
	jannh@google.com, hannes@cmpxchg.org,  mhocko@kernel.org,
	paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com,
	 brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com,
	 linux@weissschuh.net, willy@infradead.org, osalvador@suse.de,
	 andrii@kernel.org, ryan.roberts@arm.com,
	christophe.leroy@csgroup.eu,  tjmercier@google.com,
	kaleshsingh@google.com, linux-kernel@vger.kernel.org,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-kselftest@vger.kernel.org, surenb@google.com
Subject: [PATCH v4 0/7] use per-vma locks for /proc/pid/maps reads and PROCMAP_QUERY
Date: Wed,  4 Jun 2025 16:11:44 -0700	[thread overview]
Message-ID: <20250604231151.799834-1-surenb@google.com> (raw)

Reading /proc/pid/maps requires read-locking mmap_lock which prevents any
other task from concurrently modifying the address space. This guarantees
coherent reporting of virtual address ranges, however it can block
important updates from happening. Oftentimes /proc/pid/maps readers are
low priority monitoring tasks and them blocking high priority tasks
results in priority inversion.

Locking the entire address space is required to present fully coherent
picture of the address space, however even current implementation does not
strictly guarantee that by outputting vmas in page-size chunks and
dropping mmap_lock in between each chunk. Address space modifications are
possible while mmap_lock is dropped and userspace reading the content is
expected to deal with possible concurrent address space modifications.
Considering these relaxed rules, holding mmap_lock is not strictly needed
as long as we can guarantee that a concurrently modified vma is reported
either in its original form or after it was modified.

This patchset switches from holding mmap_lock while reading /proc/pid/maps
to taking per-vma locks as we walk the vma tree. This reduces the
contention with tasks modifying the address space because they would have
to contend for the same vma as opposed to the entire address space. Same
is done for PROCMAP_QUERY ioctl which locks only the vma that fell into
the requested range instead of the entire address space. Previous version
of this patchset [1] tried to perform /proc/pid/maps reading under RCU,
however its implementation is quite complex and the results are worse than
the new version because it still relied on mmap_lock speculation which
retries if any part of the address space gets modified. New implementaion
is both simpler and results in less contention. Note that similar approach
would not work for /proc/pid/smaps reading as it also walks the page table
and that's not RCU-safe.

Paul McKenney's designed a test [2] to measure mmap/munmap latencies while
concurrently reading /proc/pid/maps. The test has a pair of processes
scanning /proc/PID/maps, and another process unmapping and remapping 4K
pages from a 128MB range of anonymous memory.  At the end of each 10
second run, the latency of each mmap() or munmap() operation is measured,
and for each run the maximum and mean latency is printed. The map/unmap
process is started first, its PID is passed to the scanners, and then the
map/unmap process waits until both scanners are running before starting
its timed test.  The scanners keep scanning until the specified
/proc/PID/maps file disappears. This test registered close to 10x
improvement in update latencies:

Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
    0.011     0.008     0.455
    0.011     0.008     0.472
    0.011     0.008     0.535
    0.011     0.009     0.545
    ...
    0.011     0.014     2.875
    0.011     0.014     2.913
    0.011     0.014     3.007
    0.011     0.015     3.018

After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
    0.006     0.005     0.036
    0.006     0.005     0.039
    0.006     0.005     0.039
    0.006     0.005     0.039
    ...
    0.006     0.006     0.403
    0.006     0.006     0.474
    0.006     0.006     0.479
    0.006     0.006     0.498

The patchset also adds a number of tests to check for /proc/pid/maps data
coherency. They are designed to detect any unexpected data tearing while
performing some common address space modifications (vma split, resize and
remap). Even before these changes, reading /proc/pid/maps might have
inconsistent data because the file is read page-by-page with mmap_lock
being dropped between the pages. An example of user-visible inconsistency
can be that the same vma is printed twice: once before it was modified and
then after the modifications. For example if vma was extended, it might be
found and reported twice. What is not expected is to see a gap where there
should have been a vma both before and after modification. This patchset
increases the chances of such tearing, therefore it's event more important
now to test for unexpected inconsistencies.

[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test

Suren Baghdasaryan (7):
  selftests/proc: add /proc/pid/maps tearing from vma split test
  selftests/proc: extend /proc/pid/maps tearing test to include vma
    resizing
  selftests/proc: extend /proc/pid/maps tearing test to include vma
    remapping
  selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
    modified
  selftests/proc: add verbose more for tests to facilitate debugging
  mm/maps: read proc/pid/maps under per-vma lock
  mm/maps: execute PROCMAP_QUERY ioctl under per-vma locks

 fs/proc/internal.h                         |   6 +
 fs/proc/task_mmu.c                         | 233 +++++-
 tools/testing/selftests/proc/proc-pid-vm.c | 793 ++++++++++++++++++++-
 3 files changed, 1011 insertions(+), 21 deletions(-)


base-commit: 2d0c297637e7d59771c1533847c666cdddc19884
-- 
2.49.0.1266.g31b7d2e469-goog



             reply	other threads:[~2025-06-04 23:11 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-04 23:11 Suren Baghdasaryan [this message]
2025-06-04 23:11 ` [PATCH v4 1/7] selftests/proc: add /proc/pid/maps tearing from vma split test Suren Baghdasaryan
2025-06-04 23:11 ` [PATCH v4 2/7] selftests/proc: extend /proc/pid/maps tearing test to include vma resizing Suren Baghdasaryan
2025-06-04 23:11 ` [PATCH v4 3/7] selftests/proc: extend /proc/pid/maps tearing test to include vma remapping Suren Baghdasaryan
2025-06-04 23:11 ` [PATCH v4 4/7] selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified Suren Baghdasaryan
2025-06-04 23:11 ` [PATCH v4 5/7] selftests/proc: add verbose more for tests to facilitate debugging Suren Baghdasaryan
2025-06-04 23:11 ` [PATCH v4 6/7] mm/maps: read proc/pid/maps under per-vma lock Suren Baghdasaryan
2025-06-07 17:43   ` Lorenzo Stoakes
2025-06-08  1:41     ` Suren Baghdasaryan
2025-06-10 17:43       ` Lorenzo Stoakes
2025-06-11  0:16         ` Suren Baghdasaryan
2025-06-11 10:24           ` Lorenzo Stoakes
2025-06-11 15:12             ` Suren Baghdasaryan
2025-06-10  7:50   ` kernel test robot
2025-06-10 14:02     ` Suren Baghdasaryan
2025-06-04 23:11 ` [PATCH v4 7/7] mm/maps: execute PROCMAP_QUERY ioctl under per-vma locks Suren Baghdasaryan
2025-06-13 20:36   ` Andrii Nakryiko
2025-06-13 15:01 ` [PATCH v4 0/7] use per-vma locks for /proc/pid/maps reads and PROCMAP_QUERY Lorenzo Stoakes
2025-06-13 19:11   ` Suren Baghdasaryan
2025-06-13 19:26     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250604231151.799834-1-surenb@google.com \
    --to=surenb@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=brauner@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=josef@toxicpanda.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@kernel.org \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=shuah@kernel.org \
    --cc=tjmercier@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yebin10@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).