From: Lorenzo Stoakes <ljs@kernel.org>
To: Rik van Riel <riel@surriel.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org,
Thomas Gleixner <tglx@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Dmitry Ilvokhin <d@ilvokhin.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock
Date: Fri, 19 Jun 2026 13:34:53 +0100 [thread overview]
Message-ID: <ajU1AdbQVkk4AOKs@lucifer> (raw)
In-Reply-To: <20260616190300.1509639-3-riel@surriel.com>
On Tue, Jun 16, 2026 at 03:02:59PM -0400, Rik van Riel wrote:
> folio_walk_start() asserts that the mmap lock is held. For callers that
> only need to read a single, already-present page, the mmap lock is a
> heavy and often badly contended hammer: the VMA can instead be
> stabilized with the per-VMA lock, and the page table pages that are
> walked are kept alive by RCU page-table freeing
> (CONFIG_MMU_GATHER_RCU_TABLE_FREE).
See below, I don't think this is correct?
>
> Add an FW_VMA_LOCKED flag. When passed, folio_walk_start() asserts the
> per-VMA lock instead of the mmap lock, requires RCU-freed page tables,
> and refuses hugetlb VMAs (PMD sharing cannot be walked safely this way).
This is mostly superfluous. You can just say you added the flag to use a VMA
flag. You put in parens the key thing about hugetlb, I think you should break
that out.
> Everything else folio_walk_start() relies on -- the page table locks,
> pmdp_get_lockless() and pte_offset_map_lock() -- is already safe without
is -> are.
> the mmap lock, mirroring the per-VMA lock page fault path.
I'm not sure I understand why you have to have RCU freed page tables but then
say that you didn't need it here? Strange to reference arbitrary functions from
folio_walk_start() too.
>
> No existing caller passes FW_VMA_LOCKED, so behaviour is unchanged.
>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
> include/linux/pagewalk.h | 5 +++++
> mm/pagewalk.c | 18 ++++++++++++++++--
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index b41d7265c01b..84dd0d68f747 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -150,6 +150,11 @@ typedef int __bitwise folio_walk_flags_t;
>
> /* Walk shared zeropages (small + huge) as well. */
> #define FW_ZEROPAGE ((__force folio_walk_flags_t)BIT(0))
> +/*
> + * The caller holds the per-VMA lock instead of the mmap lock. Only valid with
> + * RCU-freed page tables (CONFIG_MMU_GATHER_RCU_TABLE_FREE) and not for hugetlb.
> + */
Hang on how could we be freeing higher level page tables of a VMA that's still
locked?
A VMA lock stabilises page tables for traversal, so why do you require
CONFIG_MMU_GATHER_RCU_TABLE_FREE here?
What will free the higher-level page tables?
Ref: https://origin.kernel.org/doc/html/latest/mm/process_addrs.html#page-table
> +#define FW_VMA_LOCKED ((__force folio_walk_flags_t)BIT(1))
>
> enum folio_walk_level {
> FW_LEVEL_PTE,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 3ae2586ff45b..c85364b73e12 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -890,7 +890,9 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
> * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
> * not correspond to the first physical entry of a logical hugetlb entry.
> *
> - * The mmap lock must be held in read mode.
> + * The mmap lock must be held in read mode. Alternatively, if @FW_VMA_LOCKED is
> + * passed, the VMA's per-VMA lock must be held (only supported with RCU-freed
> + * page tables, i.e. CONFIG_MMU_GATHER_RCU_TABLE_FREE, and not for hugetlb).
> *
> * Return: folio pointer on success, otherwise NULL.
> */
> @@ -908,7 +910,19 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> pgd_t *pgdp;
> p4d_t *p4dp;
>
> - mmap_assert_locked(vma->vm_mm);
> + if (flags & FW_VMA_LOCKED) {
> + /*
> + * Lockless walk: the per-VMA lock keeps the VMA stable, and
> + * RCU-freed page tables keep the walked page table pages alive
> + * across the lockless upper-level walk and pte_offset_map_lock().
Err, but we take locks as normal on the walk?
> + * Hugetlb (PMD sharing) is not supported on this path.
I don't get the explanation above and then you just write a line that says what
your assert is doing with zero explanation here.
You should explain why hugetlb isn't supported.
> + */
> + VM_WARN_ON_ONCE(!IS_ENABLED(CONFIG_MMU_GATHER_RCU_TABLE_FREE));
> + VM_WARN_ON_ONCE(is_vm_hugetlb_page(vma));
> + vma_assert_locked(vma);
> + } else {
> + mmap_assert_locked(vma->vm_mm);
> + }
> vma_pgtable_walk_begin(vma);
>
> if (WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end))
> --
> 2.53.0-Meta
>
Thanks, Lorenzo
next prev parent reply other threads:[~2026-06-19 12:35 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 19:02 [PATCH 0/3] mm: __access_remote_vm with per-VMA lock Rik van Riel
2026-06-16 19:02 ` [PATCH 1/3] x86/mm: use READ_ONCE/WRITE_ONCE for mm->context.untag_mask Rik van Riel
2026-06-18 16:40 ` Usama Arif
2026-06-16 19:02 ` [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock Rik van Riel
2026-06-19 12:34 ` Lorenzo Stoakes [this message]
2026-06-16 19:03 ` [PATCH 3/3] mm: read remote memory without the mmap lock where possible Rik van Riel
2026-06-17 6:19 ` Suren Baghdasaryan
2026-06-19 12:24 ` Lorenzo Stoakes
2026-06-19 13:46 ` Rik van Riel
2026-06-19 14:03 ` Suren Baghdasaryan
2026-06-19 14:33 ` David Hildenbrand (Arm)
2026-06-18 17:01 ` Usama Arif
2026-06-18 17:07 ` David Hildenbrand (Arm)
2026-06-18 17:22 ` Usama Arif
2026-06-19 12:20 ` Lorenzo Stoakes
2026-06-17 1:10 ` [PATCH 0/3] mm: __access_remote_vm with per-VMA lock Suren Baghdasaryan
2026-06-17 9:42 ` David Hildenbrand (Arm)
2026-06-17 13:33 ` Suren Baghdasaryan
2026-06-18 20:37 ` David Hildenbrand (Arm)
2026-06-19 12:43 ` Lorenzo Stoakes
2026-06-19 13:04 ` David Hildenbrand (Arm)
2026-06-19 14:19 ` Suren Baghdasaryan
2026-06-19 14:27 ` David Hildenbrand (Arm)
2026-06-19 15:07 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajU1AdbQVkk4AOKs@lucifer \
--to=ljs@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=d@ilvokhin.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=riel@surriel.com \
--cc=surenb@google.com \
--cc=tglx@kernel.org \
--cc=vbabka@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.