Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <ljs@kernel.org>
To: Rik van Riel <riel@surriel.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org,
	 Thomas Gleixner <tglx@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Dmitry Ilvokhin <d@ilvokhin.com>,
	 Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 "Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock
Date: Fri, 19 Jun 2026 13:34:53 +0100	[thread overview]
Message-ID: <ajU1AdbQVkk4AOKs@lucifer> (raw)
In-Reply-To: <20260616190300.1509639-3-riel@surriel.com>

On Tue, Jun 16, 2026 at 03:02:59PM -0400, Rik van Riel wrote:
> folio_walk_start() asserts that the mmap lock is held.  For callers that
> only need to read a single, already-present page, the mmap lock is a
> heavy and often badly contended hammer: the VMA can instead be
> stabilized with the per-VMA lock, and the page table pages that are
> walked are kept alive by RCU page-table freeing
> (CONFIG_MMU_GATHER_RCU_TABLE_FREE).

See below, I don't think this is correct?

>
> Add an FW_VMA_LOCKED flag.  When passed, folio_walk_start() asserts the
> per-VMA lock instead of the mmap lock, requires RCU-freed page tables,
> and refuses hugetlb VMAs (PMD sharing cannot be walked safely this way).

This is mostly superfluous. You can just say you added the flag to use a VMA
flag. You put in parens the key thing about hugetlb, I think you should break
that out.

> Everything else folio_walk_start() relies on -- the page table locks,
> pmdp_get_lockless() and pte_offset_map_lock() -- is already safe without

is -> are.

> the mmap lock, mirroring the per-VMA lock page fault path.

I'm not sure I understand why you have to have RCU freed page tables but then
say that you didn't need it here? Strange to reference arbitrary functions from
folio_walk_start() too.

>
> No existing caller passes FW_VMA_LOCKED, so behaviour is unchanged.
>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  include/linux/pagewalk.h |  5 +++++
>  mm/pagewalk.c            | 18 ++++++++++++++++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index b41d7265c01b..84dd0d68f747 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -150,6 +150,11 @@ typedef int __bitwise folio_walk_flags_t;
>
>  /* Walk shared zeropages (small + huge) as well. */
>  #define FW_ZEROPAGE			((__force folio_walk_flags_t)BIT(0))
> +/*
> + * The caller holds the per-VMA lock instead of the mmap lock. Only valid with
> + * RCU-freed page tables (CONFIG_MMU_GATHER_RCU_TABLE_FREE) and not for hugetlb.
> + */

Hang on how could we be freeing higher level page tables of a VMA that's still
locked?

A VMA lock stabilises page tables for traversal, so why do you require
CONFIG_MMU_GATHER_RCU_TABLE_FREE here?

What will free the higher-level page tables?

Ref: https://origin.kernel.org/doc/html/latest/mm/process_addrs.html#page-table


> +#define FW_VMA_LOCKED			((__force folio_walk_flags_t)BIT(1))
>
>  enum folio_walk_level {
>  	FW_LEVEL_PTE,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 3ae2586ff45b..c85364b73e12 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -890,7 +890,9 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
>   * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
>   * not correspond to the first physical entry of a logical hugetlb entry.
>   *
> - * The mmap lock must be held in read mode.
> + * The mmap lock must be held in read mode. Alternatively, if @FW_VMA_LOCKED is
> + * passed, the VMA's per-VMA lock must be held (only supported with RCU-freed
> + * page tables, i.e. CONFIG_MMU_GATHER_RCU_TABLE_FREE, and not for hugetlb).
>   *
>   * Return: folio pointer on success, otherwise NULL.
>   */
> @@ -908,7 +910,19 @@ struct folio *folio_walk_start(struct folio_walk *fw,
>  	pgd_t *pgdp;
>  	p4d_t *p4dp;
>
> -	mmap_assert_locked(vma->vm_mm);
> +	if (flags & FW_VMA_LOCKED) {
> +		/*
> +		 * Lockless walk: the per-VMA lock keeps the VMA stable, and
> +		 * RCU-freed page tables keep the walked page table pages alive
> +		 * across the lockless upper-level walk and pte_offset_map_lock().

Err, but we take locks as normal on the walk?

> +		 * Hugetlb (PMD sharing) is not supported on this path.

I don't get the explanation above and then you just write a line that says what
your assert is doing with zero explanation here.

You should explain why hugetlb isn't supported.

> +		 */
> +		VM_WARN_ON_ONCE(!IS_ENABLED(CONFIG_MMU_GATHER_RCU_TABLE_FREE));
> +		VM_WARN_ON_ONCE(is_vm_hugetlb_page(vma));
> +		vma_assert_locked(vma);
> +	} else {
> +		mmap_assert_locked(vma->vm_mm);
> +	}
>  	vma_pgtable_walk_begin(vma);
>
>  	if (WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end))
> --
> 2.53.0-Meta
>

Thanks, Lorenzo


  reply	other threads:[~2026-06-19 12:35 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 19:02 [PATCH 0/3] mm: __access_remote_vm with per-VMA lock Rik van Riel
2026-06-16 19:02 ` [PATCH 1/3] x86/mm: use READ_ONCE/WRITE_ONCE for mm->context.untag_mask Rik van Riel
2026-06-18 16:40   ` Usama Arif
2026-06-16 19:02 ` [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock Rik van Riel
2026-06-19 12:34   ` Lorenzo Stoakes [this message]
2026-06-16 19:03 ` [PATCH 3/3] mm: read remote memory without the mmap lock where possible Rik van Riel
2026-06-17  6:19   ` Suren Baghdasaryan
2026-06-19 12:24     ` Lorenzo Stoakes
2026-06-19 13:46     ` Rik van Riel
2026-06-19 14:03       ` Suren Baghdasaryan
2026-06-19 14:33         ` David Hildenbrand (Arm)
2026-06-18 17:01   ` Usama Arif
2026-06-18 17:07     ` David Hildenbrand (Arm)
2026-06-18 17:22       ` Usama Arif
2026-06-19 12:20   ` Lorenzo Stoakes
2026-06-17  1:10 ` [PATCH 0/3] mm: __access_remote_vm with per-VMA lock Suren Baghdasaryan
2026-06-17  9:42   ` David Hildenbrand (Arm)
2026-06-17 13:33     ` Suren Baghdasaryan
2026-06-18 20:37       ` David Hildenbrand (Arm)
2026-06-19 12:43         ` Lorenzo Stoakes
2026-06-19 13:04           ` David Hildenbrand (Arm)
2026-06-19 14:19             ` Suren Baghdasaryan
2026-06-19 14:27               ` David Hildenbrand (Arm)
2026-06-19 15:07                 ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajU1AdbQVkk4AOKs@lucifer \
    --to=ljs@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=d@ilvokhin.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@kernel.org \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=riel@surriel.com \
    --cc=surenb@google.com \
    --cc=tglx@kernel.org \
    --cc=vbabka@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox