linux-mm.kvack.org archive mirror
* [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
@ 2025-05-20 21:48 Roman Gushchin
  2025-05-21  0:38 ` Jann Horn
  0 siblings, 1 reply; 3+ messages in thread
From: Roman Gushchin @ 2025-05-20 21:48 UTC (permalink / raw)
  To: Andrew Morton, Peter Zijlstra
  Cc: linux-kernel, linux-mm, Roman Gushchin, Jann Horn, Will Deacon,
	Aneesh Kumar K.V, Nick Piggin, Hugh Dickins, linux-arch

Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
added a forced TLB flush to tlb_end_vma(), which is required to avoid
a race between munmap() and unmap_mapping_range(). However, it also
added overhead to other paths where tlb_end_vma() is used but vmas
are not removed, e.g. madvise(MADV_DONTNEED).

Fix this by moving the TLB flush out of tlb_end_vma() into the new
tlb_free_vmas() called from free_pgtables(), somewhat similar to the
stable version of the original commit:
commit 895428ee124a ("mm: Force TLB flush for PFNMAP mappings before
unlink_file_vma()").

Note that if tlb->fullmm is set, no flush is required, as the whole
mm is about to be destroyed.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org

---
v5:
  - tlb_free_vma() -> tlb_free_vmas() to avoid extra checks

v4:
  - naming/comments update (by Peter Z.)
  - check vma->vm_flags in tlb_free_vma() (by Peter Z.)

v3:
  - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)

v2:
  - moved vma_pfn flag handling into tlb.h (by Peter Z.)
  - added comments (by Peter Z.)
  - fixed the vma_pfn flag setting (by Hugh D.)
---
 include/asm-generic/tlb.h | 49 +++++++++++++++++++++++++++++++--------
 mm/memory.c               |  2 ++
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 88a42973fa47..8a8b9535a930 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -58,6 +58,11 @@
  *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
  *    there's large holes between the VMAs.
  *
+ *  - tlb_free_vmas()
+ *
+ *    tlb_free_vmas() marks the start of unlinking of one or more vmas
+ *    and freeing page-tables.
+ *
  *  - tlb_remove_table()
  *
  *    tlb_remove_table() is the basic primitive to free page-table directories
@@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 	 * Do not reset mmu_gather::vma_* fields here, we do not
 	 * call into tlb_start_vma() again to set them if there is an
 	 * intermediate flush.
+	 *
+	 * Except for vma_pfn, that only cares if there's pending TLBI.
 	 */
+	tlb->vma_pfn = 0;
 }
 
 #ifdef CONFIG_MMU_GATHER_NO_RANGE
@@ -464,7 +472,12 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
 	 */
 	tlb->vma_huge = is_vm_hugetlb_page(vma);
 	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
-	tlb->vma_pfn  = !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
+
+	/*
+	 * Track if there's at least one VM_PFNMAP/VM_MIXEDMAP vma
+	 * in the tracked range, see tlb_free_vmas().
+	 */
+	tlb->vma_pfn |= !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -547,23 +560,39 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *
 }
 
 static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
+		return;
+
+	/*
+	 * Do a TLB flush and reset the range at VMA boundaries; this avoids
+	 * the ranges growing with the unused space between consecutive VMAs,
+	 * but also the mmu_gather::vma_* flags from tlb_start_vma() rely on
+	 * this.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+}
+
+static inline void tlb_free_vmas(struct mmu_gather *tlb)
 {
 	if (tlb->fullmm)
 		return;
 
 	/*
 	 * VM_PFNMAP is more fragile because the core mm will not track the
-	 * page mapcount -- there might not be page-frames for these PFNs after
-	 * all. Force flush TLBs for such ranges to avoid munmap() vs
-	 * unmap_mapping_range() races.
+	 * page mapcount -- there might not be page-frames for these PFNs
+	 * after all.
+	 *
+	 * Specifically, there is a race between munmap() and
+	 * unmap_mapping_range(), where munmap() will unlink the VMA, such
+	 * that unmap_mapping_range() will no longer observe the VMA and
+	 * no-op, without observing the TLBI, returning prematurely.
+	 *
+	 * So if we're about to unlink such a VMA, and we have pending
+	 * TLBI for such a vma, flush things now.
 	 */
-	if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) {
-		/*
-		 * Do a TLB flush and reset the range at VMA boundaries; this avoids
-		 * the ranges growing with the unused space between consecutive VMAs.
-		 */
+	if (tlb->vma_pfn)
 		tlb_flush_mmu_tlbonly(tlb);
-	}
 }
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index 5cb48f262ab0..6b71a66cc4fe 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -358,6 +358,8 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 {
 	struct unlink_vma_file_batch vb;
 
+	tlb_free_vmas(tlb);
+
 	do {
 		unsigned long addr = vma->vm_start;
 		struct vm_area_struct *next;
-- 
2.49.0.1112.g889b7c5bd8-goog




* Re: [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  2025-05-20 21:48 [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables() Roman Gushchin
@ 2025-05-21  0:38 ` Jann Horn
  2025-05-22  0:23   ` Roman Gushchin
  0 siblings, 1 reply; 3+ messages in thread
From: Jann Horn @ 2025-05-21  0:38 UTC (permalink / raw)
  To: Roman Gushchin, Hugh Dickins
  Cc: Andrew Morton, Peter Zijlstra, linux-kernel, linux-mm,
	Will Deacon, Aneesh Kumar K.V, Nick Piggin, linux-arch

On Tue, May 20, 2025 at 11:50 PM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> added a forced TLB flush to tlb_end_vma(), which is required to avoid
> a race between munmap() and unmap_mapping_range(). However, it also
> added overhead to other paths where tlb_end_vma() is used but vmas
> are not removed, e.g. madvise(MADV_DONTNEED).
>
> Fix this by moving the TLB flush out of tlb_end_vma() into the new
> tlb_free_vmas() called from free_pgtables(), somewhat similar to the
> stable version of the original commit:
> commit 895428ee124a ("mm: Force TLB flush for PFNMAP mappings before
> unlink_file_vma()").
>
> Note that if tlb->fullmm is set, no flush is required, as the whole
> mm is about to be destroyed.
>
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Jann Horn <jannh@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
>
> ---
> v5:
>   - tlb_free_vma() -> tlb_free_vmas() to avoid extra checks
>
> v4:
>   - naming/comments update (by Peter Z.)
>   - check vma->vm_flags in tlb_free_vma() (by Peter Z.)
>
> v3:
>   - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>
> v2:
>   - moved vma_pfn flag handling into tlb.h (by Peter Z.)
>   - added comments (by Peter Z.)
>   - fixed the vma_pfn flag setting (by Hugh D.)
> ---
>  include/asm-generic/tlb.h | 49 +++++++++++++++++++++++++++++++--------
>  mm/memory.c               |  2 ++
>  2 files changed, 41 insertions(+), 10 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 88a42973fa47..8a8b9535a930 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -58,6 +58,11 @@
>   *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
>   *    there's large holes between the VMAs.
>   *
> + *  - tlb_free_vmas()
> + *
> + *    tlb_free_vmas() marks the start of unlinking of one or more vmas
> + *    and freeing page-tables.
> + *
>   *  - tlb_remove_table()
>   *
>   *    tlb_remove_table() is the basic primitive to free page-table directories
> @@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>          * Do not reset mmu_gather::vma_* fields here, we do not
>          * call into tlb_start_vma() again to set them if there is an
>          * intermediate flush.
> +        *
> +        * Except for vma_pfn, that only cares if there's pending TLBI.
>          */
> +       tlb->vma_pfn = 0;

This looks dodgy to me. Can you explain here in more detail why this
is okay? Looking at current mainline, tlb->vma_pfn is only set to 1
when tlb_start_vma() calls into tlb_update_vma_flags(); it is never
set again after tlb_start_vma(), so I don't think it's legal to just
clear it in the middle of a VMA.

If we had something like this callgraph on a VM_MIXEDMAP mapping, with
an intermediate TLB flush in the middle of the VM_MIXEDMAP mapping:

tlb_start_vma()
  [sets tlb->vma_pfn]
zap_pte_range
  tlb_flush_mmu_tlbonly
    __tlb_reset_range
      [clears tlb->vma_pfn]
zap_pte_range
  [zaps more PTEs and queues a pending TLB flush]
tlb_end_vma()
free_pgtables
  tlb_free_vmas
    [checks for tlb->vma_pfn]

then tlb_free_vmas() will erroneously not do a flush when it should've
done one, right?

Why does it even matter to you whether tlb->vma_pfn ever gets reset? I
think more or less at worst you do one extra TLB flush in some case
involving a munmap() across multiple VMAs including a mix of
VM_PFNMAP/VM_MIXEDMAP and non-VM_PFNMAP/VM_MIXEDMAP VMAs (which is
already a fairly weird scenario on its own), with the region being
operated on large enough or complicated enough that you already did at
least one TLB flush, and the unmap sufficiently large or with
sufficient address space around it that page tables are getting freed,
or something like that? That seems like a scenario in which one more
flush isn't going to be a big deal.

If you wanted to do this properly, I think you could do something like:

 - add another flag tlb->current_vma_pfn that tracks whether the
current vma is pfnmap/mixedmap
 - reset tlb->vma_pfn on TLB flush
 - set tlb->vma_pfn again if a TLB flush is enqueued while
tlb->current_vma_pfn is true

But that seems way too complicated, so I would just delete these three
lines from the patch.



* Re: [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  2025-05-21  0:38 ` Jann Horn
@ 2025-05-22  0:23   ` Roman Gushchin
  0 siblings, 0 replies; 3+ messages in thread
From: Roman Gushchin @ 2025-05-22  0:23 UTC (permalink / raw)
  To: Jann Horn
  Cc: Hugh Dickins, Andrew Morton, Peter Zijlstra, linux-kernel,
	linux-mm, Will Deacon, Aneesh Kumar K.V, Nick Piggin, linux-arch

Jann Horn <jannh@google.com> writes:

> On Tue, May 20, 2025 at 11:50 PM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
>> [...]
>> @@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>>          * Do not reset mmu_gather::vma_* fields here, we do not
>>          * call into tlb_start_vma() again to set them if there is an
>>          * intermediate flush.
>> +        *
>> +        * Except for vma_pfn, that only cares if there's pending TLBI.
>>          */
>> +       tlb->vma_pfn = 0;
>
> This looks dodgy to me. Can you explain here in more detail why this
> is okay? Looking at current mainline, tlb->vma_pfn is only set to 1
> when tlb_start_vma() calls into tlb_update_vma_flags(); it is never
> set again after tlb_start_vma(), so I don't think it's legal to just
> clear it in the middle of a VMA.
>
> If we had something like this callgraph on a VM_MIXEDMAP mapping, with
> an intermediate TLB flush in the middle of the VM_MIXEDMAP mapping:
>
> tlb_start_vma()
>   [sets tlb->vma_pfn]
> zap_pte_range
>   tlb_flush_mmu_tlbonly
>     __tlb_reset_range
>       [clears tlb->vma_pfn]
> zap_pte_range
>   [zaps more PTEs and queues a pending TLB flush]
> tlb_end_vma()
> free_pgtables
>   tlb_free_vmas
>     [checks for tlb->vma_pfn]
>
> then tlb_free_vmas() will erroneously not do a flush when it should've
> done one, right?

Good catch!

>
> Why does it even matter to you whether tlb->vma_pfn ever gets reset? I
> think more or less at worst you do one extra TLB flush in some case
> involving a munmap() across multiple VMAs including a mix of
> VM_PFNMAP/VM_MIXEDMAP and non-VM_PFNMAP/VM_MIXEDMAP VMAs (which is
> already a fairly weird scenario on its own), with the region being
> operated on large enough or complicated enough that you already did at
> least one TLB flush, and the unmap sufficiently large or with
> sufficient address space around it that page tables are getting freed,
> or something like that? That seems like a scenario in which one more
> flush isn't going to be a big deal.

Agree.

>
> If you wanted to do this properly, I think you could do something like:
>
>  - add another flag tlb->current_vma_pfn that tracks whether the
> current vma is pfnmap/mixedmap
>  - reset tlb->vma_pfn on TLB flush
>  - set tlb->vma_pfn again if a TLB flush is enqueued while
> tlb->current_vma_pfn is true
>
> But that seems way too complicated, so I would just delete these three
> lines from the patch.

Agree, except that tlb->vma_pfn needs to be initialized somewhere.
That is primarily why these lines were added, based on feedback on
the original version. However, we missed the race you pointed out.

I guess instead tlb->vma_pfn can be initialized in __tlb_gather_mmu().

I'll send out the fixed version shortly.

Thank you!

