* [merged mm-stable] docs-mm-expand-vma-doc-to-highlight-pte-freeing-non-vma-traversal.patch removed from -mm tree
@ 2025-07-10 5:43 Andrew Morton
0 siblings, 0 replies; only message in thread
From: Andrew Morton @ 2025-07-10 5:43 UTC (permalink / raw)
To: mm-commits, zhengqi.arch, vbabka, surenb, shakeel.butt,
liam.howlett, jannh, corbet, bagasdotme, lorenzo.stoakes, akpm
The quilt patch titled
Subject: docs/mm: expand vma doc to highlight pte freeing, non-vma traversal
has been removed from the -mm tree. Its filename was
docs-mm-expand-vma-doc-to-highlight-pte-freeing-non-vma-traversal.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: docs/mm: expand vma doc to highlight pte freeing, non-vma traversal
Date: Wed, 4 Jun 2025 19:03:08 +0100
The process addresses documentation already contains a great deal of
information about mmap/VMA locking and page table traversal and
manipulation.
However it waves it hands about non-VMA traversal. Add a section for this
and explain the caveats around this kind of traversal.
Additionally, commit 6375e95f381e ("mm: pgtable: reclaim empty PTE page in
madvise(MADV_DONTNEED)") caused zapping to also free empty PTE page
tables. Highlight this.
Link: https://lkml.kernel.org/r/20250604180308.137116-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/mm/process_addrs.rst | 54 ++++++++++++++++++++++++---
1 file changed, 48 insertions(+), 6 deletions(-)
--- a/Documentation/mm/process_addrs.rst~docs-mm-expand-vma-doc-to-highlight-pte-freeing-non-vma-traversal
+++ a/Documentation/mm/process_addrs.rst
@@ -303,7 +303,9 @@ There are four key operations typically
1. **Traversing** page tables - Simply reading page tables in order to traverse
them. This only requires that the VMA is kept stable, so a lock which
establishes this suffices for traversal (there are also lockless variants
- which eliminate even this requirement, such as :c:func:`!gup_fast`).
+ which eliminate even this requirement, such as :c:func:`!gup_fast`). There is
+ also a special case of page table traversal for non-VMA regions which we
+ consider separately below.
2. **Installing** page table mappings - Whether creating a new mapping or
modifying an existing one in such a way as to change its identity. This
requires that the VMA is kept stable via an mmap or VMA lock (explicitly not
@@ -335,15 +337,13 @@ ahead and perform these operations on pa
operations that perform writes also acquire internal page table locks to
serialise - see the page table implementation detail section for more details).
+.. note:: We free empty PTE tables on zap under the RCU lock - this does not
+ change the aforementioned locking requirements around zapping.
+
When **installing** page table entries, the mmap or VMA lock must be held to
keep the VMA stable. We explore why this is in the page table locking details
section below.
-.. warning:: Page tables are normally only traversed in regions covered by VMAs.
- If you want to traverse page tables in areas that might not be
- covered by VMAs, heavier locking is required.
- See :c:func:`!walk_page_range_novma` for details.
-
**Freeing** page tables is an entirely internal memory management operation and
has special requirements (see the page freeing section below for more details).
@@ -355,6 +355,44 @@ has special requirements (see the page f
from the reverse mappings, but no other VMAs can be permitted to be
accessible and span the specified range.
+Traversing non-VMA page tables
+------------------------------
+
+We've focused above on traversal of page tables belonging to VMAs. It is also
+possible to traverse page tables which are not represented by VMAs.
+
+Kernel page table mappings themselves are generally managed but whatever part of
+the kernel established them and the aforementioned locking rules do not apply -
+for instance vmalloc has its own set of locks which are utilised for
+establishing and tearing down page its page tables.
+
+However, for convenience we provide the :c:func:`!walk_kernel_page_table_range`
+function which is synchronised via the mmap lock on the :c:macro:`!init_mm`
+kernel instantiation of the :c:struct:`!struct mm_struct` metadata object.
+
+If an operation requires exclusive access, a write lock is used, but if not, a
+read lock suffices - we assert only that at least a read lock has been acquired.
+
+Since, aside from vmalloc and memory hot plug, kernel page tables are not torn
+down all that often - this usually suffices, however any caller of this
+functionality must ensure that any additionally required locks are acquired in
+advance.
+
+We also permit a truly unusual case is the traversal of non-VMA ranges in
+**userland** ranges, as provided for by :c:func:`!walk_page_range_debug`.
+
+This has only one user - the general page table dumping logic (implemented in
+:c:macro:`!mm/ptdump.c`) - which seeks to expose all mappings for debug purposes
+even if they are highly unusual (possibly architecture-specific) and are not
+backed by a VMA.
+
+We must take great care in this case, as the :c:func:`!munmap` implementation
+detaches VMAs under an mmap write lock before tearing down page tables under a
+downgraded mmap read lock.
+
+This means such an operation could race with this, and thus an mmap **write**
+lock is required.
+
Lock ordering
-------------
@@ -461,6 +499,10 @@ Locking Implementation Details
Page table locking details
--------------------------
+.. note:: This section explores page table locking requirements for page tables
+ encompassed by a VMA. See the above section on non-VMA page table
+ traversal for details on how we handle that case.
+
In addition to the locks described in the terminology section above, we have
additional locks dedicated to page tables:
_
Patches currently in -mm which might be from lorenzo.stoakes@oracle.com are
mm-madvise-remove-the-visitor-pattern-and-thread-anon_vma-state.patch
mm-madvise-thread-mm_struct-through-madvise_behavior.patch
mm-madvise-thread-vma-range-state-through-madvise_behavior.patch
mm-madvise-thread-all-madvise-state-through-madv_behavior.patch
mm-madvise-eliminate-very-confusing-manipulation-of-prev-vma.patch
mm-madvise-eliminate-very-confusing-manipulation-of-prev-vma-fix.patch
tools-testing-selftests-add-mremap-unfaulted-faulted-test-cases.patch
mm-mremap-perform-some-simple-cleanups.patch
mm-mremap-refactor-initial-parameter-sanity-checks.patch
mm-mremap-put-vma-check-and-prep-logic-into-helper-function.patch
mm-mremap-cleanup-post-processing-stage-of-mremap.patch
mm-mremap-use-an-explicit-uffd-failure-path-for-mremap.patch
mm-mremap-use-an-explicit-uffd-failure-path-for-mremap-fix.patch
mm-mremap-check-remap-conditions-earlier.patch
mm-mremap-move-remap_is_valid-into-check_prep_vma.patch
mm-mremap-clean-up-mlock-populate-behaviour.patch
mm-mremap-permit-mremap-move-of-multiple-vmas.patch
tools-testing-selftests-extend-mremap_test-to-test-multi-vma-mremap.patch
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2025-07-10 5:43 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-10 5:43 [merged mm-stable] docs-mm-expand-vma-doc-to-highlight-pte-freeing-non-vma-traversal.patch removed from -mm tree Andrew Morton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.