From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org,
liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz,
hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
mgorman@techsingularity.net, david@redhat.com,
peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net,
paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com,
hdanton@sina.com, hughd@google.com, lokeshgidra@google.com,
minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
souravpanda@google.com, pasha.tatashin@soleen.com,
klarasmodin@gmail.com, richard.weiyang@gmail.com,
corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@android.com,
surenb@google.com, "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Subject: [PATCH v9 17/17] docs/mm: document latest changes to vm_lock
Date: Fri, 10 Jan 2025 20:26:04 -0800 [thread overview]
Message-ID: <20250111042604.3230628-18-surenb@google.com> (raw)
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Change the documentation to reflect that vm_lock is integrated into vma
and replaced with vm_refcnt.
Document newly introduced vma_start_read_locked{_nested} functions.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
Documentation/mm/process_addrs.rst | 44 ++++++++++++++++++------------
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 81417fa2ed20..f573de936b5d 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -716,9 +716,14 @@ calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU
critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
before releasing the RCU lock via :c:func:`!rcu_read_unlock`.
-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
-their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
-via :c:func:`!vma_end_read`.
+In cases when the user already holds mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not
+fail due to lock contention but the caller should still check their return values
+in case they fail for other reasons.
+
+VMA read locks increment :c:member:`!vma.vm_refcnt` reference counter for their
+duration and the caller of :c:func:`!lock_vma_under_rcu` must drop it via
+:c:func:`!vma_end_read`.
VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a
VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always
@@ -726,9 +731,9 @@ acquired. An mmap write lock **must** be held for the duration of the VMA write
lock, releasing or downgrading the mmap write lock also releases the VMA write
lock so there is no :c:func:`!vma_end_write` function.
-Note that a semaphore write lock is not held across a VMA lock. Rather, a
-sequence number is used for serialisation, and the write semaphore is only
-acquired at the point of write lock to update this.
+Note that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily
+modified so that readers can detect the presense of a writer. The reference counter is
+restored once the vma sequence number used for serialisation is updated.
This ensures the semantics we require - VMA write locks provide exclusive write
access to the VMA.
@@ -738,7 +743,7 @@ Implementation details
The VMA lock mechanism is designed to be a lightweight means of avoiding the use
of the heavily contended mmap lock. It is implemented using a combination of a
-read/write semaphore and sequence numbers belonging to the containing
+reference counter and sequence numbers belonging to the containing
:c:struct:`!struct mm_struct` and the VMA.
Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic
@@ -779,28 +784,31 @@ release of any VMA locks on its release makes sense, as you would never want to
keep VMAs locked across entirely separate write operations. It also maintains
correct lock ordering.
-Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
-the sequence count of the VMA does not match that of the mm.
+Each time a VMA read lock is acquired, we increment :c:member:`!vma.vm_refcnt`
+reference counter and check that the sequence count of the VMA does not match
+that of the mm.
-If it does, the read lock fails. If it does not, we hold the lock, excluding
-writers, but permitting other readers, who will also obtain this lock under RCU.
+If it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped.
+If it does not, we keep the reference counter raised, excluding writers, but
+permitting other readers, who can also obtain this lock under RCU.
Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
are also RCU safe, so the whole read lock operation is guaranteed to function
correctly.
-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
-read/write semaphore, before setting the VMA's sequence number under this lock,
-also simultaneously holding the mmap write lock.
+On the write side, we set a bit in :c:member:`!vma.vm_refcnt` which can't be
+modified by readers and wait for all readers to drop their reference count.
+Once there are no readers, VMA's sequence number is set to match that of the
+mm. During this entire operation mmap write lock is held.
This way, if any read locks are in effect, :c:func:`!vma_start_write` will sleep
until these are finished and mutual exclusion is achieved.
-After setting the VMA's sequence number, the lock is released, avoiding
-complexity with a long-term held write lock.
+After setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt`
+indicating a writer is cleared. From this point on, VMA's sequence number will
+indicate VMA's write-locked state until mmap write lock is dropped or downgraded.
-This clever combination of a read/write semaphore and sequence count allows for
+This clever combination of a reference counter and sequence count allows for
fast RCU-based per-VMA lock acquisition (especially on page fault, though
utilised elsewhere) with minimal complexity around lock ordering.
--
2.47.1.613.gc27f4b7a9f-goog
next prev parent reply other threads:[~2025-01-11 4:26 UTC|newest]
Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-11 4:25 [PATCH v9 00/17] reimplement per-vma lock as a refcount Suren Baghdasaryan
2025-01-11 4:25 ` [PATCH v9 01/17] mm: introduce vma_start_read_locked{_nested} helpers Suren Baghdasaryan
2025-01-11 4:25 ` [PATCH v9 02/17] mm: move per-vma lock into vm_area_struct Suren Baghdasaryan
2025-01-11 4:25 ` [PATCH v9 03/17] mm: mark vma as detached until it's added into vma tree Suren Baghdasaryan
2025-01-11 4:25 ` [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas Suren Baghdasaryan
2025-01-13 11:58 ` Lorenzo Stoakes
2025-01-13 16:31 ` Suren Baghdasaryan
2025-01-13 16:44 ` Lorenzo Stoakes
2025-01-13 16:47 ` Lorenzo Stoakes
2025-01-13 19:09 ` Suren Baghdasaryan
2025-01-14 11:38 ` Lorenzo Stoakes
2025-01-11 4:25 ` [PATCH v9 05/17] mm: mark vmas detached upon exit Suren Baghdasaryan
2025-01-13 12:05 ` Lorenzo Stoakes
2025-01-13 17:02 ` Suren Baghdasaryan
2025-01-13 17:13 ` Lorenzo Stoakes
2025-01-13 19:11 ` Suren Baghdasaryan
2025-01-13 20:32 ` Vlastimil Babka
2025-01-13 20:42 ` Suren Baghdasaryan
2025-01-14 11:36 ` Lorenzo Stoakes
2025-01-11 4:25 ` [PATCH v9 06/17] types: move struct rcuwait into types.h Suren Baghdasaryan
2025-01-13 14:46 ` Lorenzo Stoakes
2025-01-11 4:25 ` [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail Suren Baghdasaryan
2025-01-13 15:25 ` Lorenzo Stoakes
2025-01-13 17:53 ` Suren Baghdasaryan
2025-01-14 11:48 ` Lorenzo Stoakes
2025-01-11 4:25 ` [PATCH v9 08/17] mm: move mmap_init_lock() out of the header file Suren Baghdasaryan
2025-01-13 15:27 ` Lorenzo Stoakes
2025-01-13 17:53 ` Suren Baghdasaryan
2025-01-11 4:25 ` [PATCH v9 09/17] mm: uninline the main body of vma_start_write() Suren Baghdasaryan
2025-01-13 15:52 ` Lorenzo Stoakes
2025-01-11 4:25 ` [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited Suren Baghdasaryan
2025-01-11 6:31 ` Hillf Danton
2025-01-11 9:59 ` Suren Baghdasaryan
2025-01-11 10:00 ` Suren Baghdasaryan
2025-01-11 12:13 ` Hillf Danton
2025-01-11 17:11 ` Suren Baghdasaryan
2025-01-11 23:44 ` Hillf Danton
2025-01-12 0:31 ` Suren Baghdasaryan
2025-01-15 9:39 ` Peter Zijlstra
2025-01-16 10:52 ` Hillf Danton
2025-01-11 12:39 ` David Laight
2025-01-11 17:07 ` Matthew Wilcox
2025-01-11 18:30 ` Paul E. McKenney
2025-01-11 22:19 ` David Laight
2025-01-11 22:50 ` [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited - clang 17.0.1 bug David Laight
2025-01-12 11:37 ` David Laight
2025-01-12 17:56 ` Paul E. McKenney
2025-01-11 4:25 ` [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count Suren Baghdasaryan
2025-01-11 11:24 ` Mateusz Guzik
2025-01-11 20:14 ` Suren Baghdasaryan
2025-01-11 20:16 ` Suren Baghdasaryan
2025-01-11 20:31 ` Mateusz Guzik
2025-01-11 20:58 ` Suren Baghdasaryan
2025-01-11 20:38 ` Vlastimil Babka
2025-01-13 1:47 ` Wei Yang
2025-01-13 2:25 ` Wei Yang
2025-01-13 21:14 ` Suren Baghdasaryan
2025-01-13 21:08 ` Suren Baghdasaryan
2025-01-15 10:48 ` Peter Zijlstra
2025-01-15 11:13 ` Peter Zijlstra
2025-01-15 15:00 ` Suren Baghdasaryan
2025-01-15 15:35 ` Peter Zijlstra
2025-01-15 15:38 ` Peter Zijlstra
2025-01-15 16:22 ` Suren Baghdasaryan
2025-01-15 16:00 ` [PATCH] refcount: Strengthen inc_not_zero() Peter Zijlstra
2025-01-16 15:12 ` Suren Baghdasaryan
2025-01-17 15:41 ` Will Deacon
2025-01-27 14:09 ` Will Deacon
2025-01-27 19:21 ` Suren Baghdasaryan
2025-01-28 23:51 ` Suren Baghdasaryan
2025-02-06 2:52 ` [PATCH 1/1] refcount: provide ops for cases when object's memory can be reused Suren Baghdasaryan
2025-02-06 10:41 ` Vlastimil Babka
2025-02-06 3:03 ` [PATCH] refcount: Strengthen inc_not_zero() Suren Baghdasaryan
2025-02-13 23:04 ` Suren Baghdasaryan
2025-01-17 16:13 ` Matthew Wilcox
2025-01-12 2:59 ` [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count Wei Yang
2025-01-12 17:35 ` Suren Baghdasaryan
2025-01-13 0:59 ` Wei Yang
2025-01-13 2:37 ` Wei Yang
2025-01-13 21:16 ` Suren Baghdasaryan
2025-01-13 9:36 ` Wei Yang
2025-01-13 21:18 ` Suren Baghdasaryan
2025-01-15 2:58 ` Wei Yang
2025-01-15 3:12 ` Suren Baghdasaryan
2025-01-15 12:05 ` Wei Yang
2025-01-15 15:01 ` Suren Baghdasaryan
2025-01-16 1:37 ` Wei Yang
2025-01-16 1:41 ` Suren Baghdasaryan
2025-01-16 9:10 ` Wei Yang
2025-01-11 4:25 ` [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline Suren Baghdasaryan
2025-01-13 16:15 ` Lorenzo Stoakes
2025-01-15 10:50 ` Peter Zijlstra
2025-01-15 16:39 ` Suren Baghdasaryan
2025-02-13 22:59 ` Suren Baghdasaryan
2025-01-11 4:26 ` [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma Suren Baghdasaryan
2025-01-13 16:21 ` Lorenzo Stoakes
2025-01-13 16:35 ` Liam R. Howlett
2025-01-13 17:57 ` Suren Baghdasaryan
2025-01-14 11:41 ` Lorenzo Stoakes
2025-01-11 4:26 ` [PATCH v9 14/17] mm: remove extra vma_numab_state_init() call Suren Baghdasaryan
2025-01-13 16:28 ` Lorenzo Stoakes
2025-01-13 17:56 ` Suren Baghdasaryan
2025-01-14 11:45 ` Lorenzo Stoakes
2025-01-11 4:26 ` [PATCH v9 15/17] mm: prepare lock_vma_under_rcu() for vma reuse possibility Suren Baghdasaryan
2025-01-11 4:26 ` [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU Suren Baghdasaryan
2025-01-15 2:27 ` Wei Yang
2025-01-15 3:15 ` Suren Baghdasaryan
2025-01-15 3:58 ` Liam R. Howlett
2025-01-15 5:41 ` Suren Baghdasaryan
2025-01-15 3:59 ` Mateusz Guzik
2025-01-15 5:47 ` Suren Baghdasaryan
2025-01-15 5:51 ` Mateusz Guzik
2025-01-15 6:41 ` Suren Baghdasaryan
2025-01-15 7:58 ` Vlastimil Babka
2025-01-15 15:10 ` Suren Baghdasaryan
2025-02-13 22:56 ` Suren Baghdasaryan
2025-01-15 12:17 ` Wei Yang
2025-01-15 21:46 ` Suren Baghdasaryan
2025-01-11 4:26 ` Suren Baghdasaryan [this message]
2025-01-13 16:33 ` [PATCH v9 17/17] docs/mm: document latest changes to vm_lock Lorenzo Stoakes
2025-01-13 17:56 ` Suren Baghdasaryan
2025-01-11 4:52 ` [PATCH v9 00/17] reimplement per-vma lock as a refcount Matthew Wilcox
2025-01-11 9:45 ` Suren Baghdasaryan
2025-01-13 12:14 ` Lorenzo Stoakes
2025-01-13 16:58 ` Suren Baghdasaryan
2025-01-13 17:11 ` Lorenzo Stoakes
2025-01-13 19:00 ` Suren Baghdasaryan
2025-01-14 11:35 ` Lorenzo Stoakes
2025-01-14 1:49 ` Andrew Morton
2025-01-14 2:53 ` Suren Baghdasaryan
2025-01-14 4:09 ` Andrew Morton
2025-01-14 9:09 ` Vlastimil Babka
2025-01-14 10:27 ` Hillf Danton
2025-01-14 9:47 ` Lorenzo Stoakes
2025-01-14 14:59 ` Liam R. Howlett
2025-01-14 15:54 ` Suren Baghdasaryan
2025-01-15 11:34 ` Lorenzo Stoakes
2025-01-15 15:14 ` Suren Baghdasaryan
2025-01-28 5:26 ` Shivank Garg
2025-01-28 5:50 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250111042604.3230628-18-surenb@google.com \
--to=surenb@google.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=dave@stgolabs.net \
--cc=david.laight.linux@gmail.com \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hdanton@sina.com \
--cc=hughd@google.com \
--cc=jannh@google.com \
--cc=kernel-team@android.com \
--cc=klarasmodin@gmail.com \
--cc=liam.howlett@oracle.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=minchan@google.com \
--cc=mjguzik@gmail.com \
--cc=oleg@redhat.com \
--cc=oliver.sang@intel.com \
--cc=pasha.tatashin@soleen.com \
--cc=paulmck@kernel.org \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=richard.weiyang@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=souravpanda@google.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.