linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	 Will Deacon <will@kernel.org>,
	 "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	 Nick Piggin <npiggin@gmail.com>,
	 linux-arch@vger.kernel.org
Subject: Re: [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
Date: Wed, 21 May 2025 17:23:57 -0700	[thread overview]
Message-ID: <878qmpwl0i.fsf@linux.dev> (raw)
In-Reply-To: <CAG48ez39SaJe=cwq3JZ6UM0BLMHEj76Kdmd9=Ho1nr17fAco6Q@mail.gmail.com> (Jann Horn's message of "Wed, 21 May 2025 02:38:20 +0200")

Jann Horn <jannh@google.com> writes:

> On Tue, May 20, 2025 at 11:50 PM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
>> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
>> added a forced tlbflush to tlb_vma_end(), which is required to avoid a
>> race between munmap() and unmap_mapping_range(). However it added some
>> overhead to other paths where tlb_vma_end() is used, but vmas are not
>> removed, e.g. madvise(MADV_DONTNEED).
>>
>> Fix this by moving the tlb flush out of tlb_end_vma() into new
>> tlb_flush_vmas() called from free_pgtables(), somewhat similar to the
>> stable version of the original commit:
>> commit 895428ee124a ("mm: Force TLB flush for PFNMAP mappings before
>> unlink_file_vma()").
>>
>> Note, that if tlb->fullmm is set, no flush is required, as the whole
>> mm is about to be destroyed.
>>
>> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
>> Cc: Jann Horn <jannh@google.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Nick Piggin <npiggin@gmail.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-arch@vger.kernel.org
>> Cc: linux-mm@kvack.org
>>
>> ---
>> v5:
>>   - tlb_free_vma() -> tlb_free_vmas() to avoid extra checks
>>
>> v4:
>>   - naming/comments update (by Peter Z.)
>>   - check vma->vma->vm_flags in tlb_free_vma() (by Peter Z.)
>>
>> v3:
>>   - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>>
>> v2:
>>   - moved vma_pfn flag handling into tlb.h (by Peter Z.)
>>   - added comments (by Peter Z.)
>>   - fixed the vma_pfn flag setting (by Hugh D.)
>> ---
>>  include/asm-generic/tlb.h | 49 +++++++++++++++++++++++++++++++--------
>>  mm/memory.c               |  2 ++
>>  2 files changed, 41 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
>> index 88a42973fa47..8a8b9535a930 100644
>> --- a/include/asm-generic/tlb.h
>> +++ b/include/asm-generic/tlb.h
>> @@ -58,6 +58,11 @@
>>   *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
>>   *    there's large holes between the VMAs.
>>   *
>> + *  - tlb_free_vmas()
>> + *
>> + *    tlb_free_vmas() marks the start of unlinking of one or more vmas
>> + *    and freeing page-tables.
>> + *
>>   *  - tlb_remove_table()
>>   *
>>   *    tlb_remove_table() is the basic primitive to free page-table directories
>> @@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>>          * Do not reset mmu_gather::vma_* fields here, we do not
>>          * call into tlb_start_vma() again to set them if there is an
>>          * intermediate flush.
>> +        *
>> +        * Except for vma_pfn, that only cares if there's pending TLBI.
>>          */
>> +       tlb->vma_pfn = 0;
>
> This looks dodgy to me. Can you explain here in more detail why this
> is okay? Looking at current mainline, tlb->vma_pfn is only set to 1
> when tlb_start_vma() calls into tlb_update_vma_flags(); it is never
> set again after tlb_start_vma(), so I don't think it's legal to just
> clear it in the middle of a VMA.
>
> If we had something like this callgraph on a VM_MIXEDMAP mapping, with
> an intermediate TLB flush in the middle of the VM_MIXEDMAP mapping:
>
> tlb_start_vma()
>   [sets tlb->vma_pfn]
> zap_pte_range
>   tlb_flush_mmu_tlbonly
>     __tlb_reset_range
>       [clears tlb->vma_pfn]
> zap_pte_range
>   [zaps more PTEs and queues a pending TLB flush]
> tlb_end_vma()
> free_pgtables
>   tlb_free_vmas
>     [checks for tlb->vma_pfn]
>
> then tlb_free_vmas() will erroneously not do a flush when it should've
> done one, right?

Good catch!

>
> Why does it even matter to you whether tlb->vma_pfn ever gets reset? I
> think more or less at worst you do one extra TLB flush in some case
> involving a munmap() across multiple VMAs including a mix of
> VM_PFNMAP/VM_MIXEDMAP and non-VM_PFNMAP/VM_MIXEDMAP VMAs (which is
> already a fairly weird scenario on its own), with the region being
> operated on large enough or complicated enough that you already did at
> least one TLB flush, and the unmap sufficiently large or with
> sufficient address space around it that page tables are getting freed,
> or something like that? That seems like a scenario in which one more
> flush isn't going to be a big deal.

Agree.

>
> If you wanted to do this properly, I think you could do something like:
>
>  - add another flag tlb->current_vma_pfn that tracks whether the
> current vma is pfnmap/mixedmap
>  - reset tlb->vma_pfn on TLB flush
>  - set tlb->vma_pfn again if a TLB flush is enqueued while
> tlb->current_vma_pfn is true
>
> But that seems way too complicated, so I would just delete these three
> lines from the patch.

Agree, except tlb->vma_pfn needs to be initialized somewhere. This
is primarily why these lines were added based on the feedback to the
original version. However we missed the race you pointed at.

I guess instead tlb->vma_pfn can be initialized in __tlb_gather_mmu().

I'll send out the fixed version shortly.

Thank you!


      reply	other threads:[~2025-05-22  0:24 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-20 21:48 [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables() Roman Gushchin
2025-05-21  0:38 ` Jann Horn
2025-05-22  0:23   ` Roman Gushchin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878qmpwl0i.fsf@linux.dev \
    --to=roman.gushchin@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).