From: SeongJae Park <sj@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: SeongJae Park <sj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <howlett@gmail.com>,
David Hildenbrand <david@redhat.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Vlastimil Babka <vbabka@suse.cz>,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 7/9] mm/madvise: let madvise_{dontneed,free}_single_vma() caller batches tlb flushes
Date: Tue, 11 Mar 2025 14:00:23 -0700 [thread overview]
Message-ID: <20250311210023.85435-1-sj@kernel.org> (raw)
In-Reply-To: <e55c43a1-55de-4720-9177-8af08c797d17@lucifer.local>
On Tue, 11 Mar 2025 13:07:25 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Super super UBER nitty but... pretty sure the subject here should be <= 75
> chars right? :P
I believe that's not a hard limit, but I will try to make it shorter in the
next spin.
>
> On Mon, Mar 10, 2025 at 10:23:16AM -0700, SeongJae Park wrote:
> > Update madvise_dontneed_single_vma() and madvise_free_single_vma()
> > functions so that the caller can pass an mmu_gather object that should
> > be initialized and will be finished outside, for batched tlb flushes.
> > Also modify their internal code to support such usage by skipping the
> > initialization and finishing of self-allocated mmu_gather object if it
> > received a valid mmu_gather object.
> >
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> > ---
> > mm/internal.h | 3 +++
> > mm/madvise.c | 37 +++++++++++++++++++++++++------------
> > mm/memory.c | 16 +++++++++++++---
> > 3 files changed, 41 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 0caa64dc2cb7..ce7fb2383f65 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -438,6 +438,9 @@ void unmap_page_range(struct mmu_gather *tlb,
> > struct vm_area_struct *vma,
> > unsigned long addr, unsigned long end,
> > struct zap_details *details);
> > +void unmap_vma_single(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > + unsigned long addr, unsigned long size,
> > + struct zap_details *details);
> > int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio,
> > gfp_t gfp);
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index ba2a78795207..d7ea71c6422c 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -794,12 +794,19 @@ static const struct mm_walk_ops madvise_free_walk_ops = {
> > .walk_lock = PGWALK_RDLOCK,
> > };
> >
> > -static int madvise_free_single_vma(struct vm_area_struct *vma,
> > - unsigned long start_addr, unsigned long end_addr)
> > +static int madvise_free_single_vma(
> > + struct mmu_gather *caller_tlb, struct vm_area_struct *vma,
>
> I find this interface horrible, and super confusing. It's not clear at all
> what's going on here.
>
> Why not use your new helper struct to add a field you can thread through
> here?
I will do so in the next spin.
>
> > + unsigned long start_addr, unsigned long end_addr)
> > {
> > struct mm_struct *mm = vma->vm_mm;
> > struct mmu_notifier_range range;
> > - struct mmu_gather tlb;
> > + struct mmu_gather self_tlb;
> > + struct mmu_gather *tlb;
> > +
> > + if (caller_tlb)
> > + tlb = caller_tlb;
> > + else
> > + tlb = &self_tlb;
> >
> > /* MADV_FREE works for only anon vma at the moment */
> > if (!vma_is_anonymous(vma))
> > @@ -815,16 +822,18 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> > range.start, range.end);
> >
> > lru_add_drain();
> > - tlb_gather_mmu(&tlb, mm);
> > + if (!caller_tlb)
> > + tlb_gather_mmu(tlb, mm);
>
> Yeah really don't like this.
>
> Ideally we'd abstract the mmu_gather struct to the helper struct (which I
> see you do in a subsequent patch anyway) would be ideal if you could find a
> way to make that work.
>
> But if not, then:
>
> if (behavior->batched_tlb)
> tlb_gather_mmu(&tlb, mm);
>
> etc. etc.
>
> Would work better.
Agreed.
>
> > update_hiwater_rss(mm);
> >
> > mmu_notifier_invalidate_range_start(&range);
> > - tlb_start_vma(&tlb, vma);
> > + tlb_start_vma(tlb, vma);
>
> Also not a fan of making tlb refer to a pointer now when before it
> didn't... I mean that's more of a nit and maybe unavoidable, but still!
>
> I mean yeah ok this is probably unavoidable, ignore.
Yeah... I also find no good way to make this very cleaner without the followup
cleanup for now.
>
> > walk_page_range(vma->vm_mm, range.start, range.end,
> > - &madvise_free_walk_ops, &tlb);
> > - tlb_end_vma(&tlb, vma);
> > + &madvise_free_walk_ops, tlb);
> > + tlb_end_vma(tlb, vma);
> > mmu_notifier_invalidate_range_end(&range);
> > - tlb_finish_mmu(&tlb);
> > + if (!caller_tlb)
> > + tlb_finish_mmu(tlb);
> >
> > return 0;
> > }
> > @@ -848,7 +857,8 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> > * An interface that causes the system to free clean pages and flush
> > * dirty pages is already available as msync(MS_INVALIDATE).
> > */
> > -static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
> > +static long madvise_dontneed_single_vma(struct mmu_gather *tlb,
> > + struct vm_area_struct *vma,
> > unsigned long start, unsigned long end)
> > {
> > struct zap_details details = {
> > @@ -856,7 +866,10 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
> > .even_cows = true,
> > };
> >
> > - zap_page_range_single(vma, start, end - start, &details);
> > + if (!tlb)
> > + zap_page_range_single(vma, start, end - start, &details);
>
> Please don't put the negation case first, it's confusing. Swap them!
Ok, I will do so.
>
>
> > + else
> > + unmap_vma_single(tlb, vma, start, end - start, &details);
> > return 0;
> > }
> >
> > @@ -951,9 +964,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> > }
> >
> > if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
> > - return madvise_dontneed_single_vma(vma, start, end);
> > + return madvise_dontneed_single_vma(NULL, vma, start, end);
> > else if (behavior == MADV_FREE)
> > - return madvise_free_single_vma(vma, start, end);
> > + return madvise_free_single_vma(NULL, vma, start, end);
>
> Not to labour the point, but this is also horrid, passing a mystery NULL
> parameter first...
Agreed again. I will just pass the madvise_behavior struct in the next spin.
>
> > else
> > return -EINVAL;
> > }
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 88c478e2ed1a..3256b9713cbd 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1995,9 +1995,19 @@ void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas,
> > mmu_notifier_invalidate_range_end(&range);
> > }
> >
> > -static void unmap_vma_single(struct mmu_gather *tlb,
> > - struct vm_area_struct *vma, unsigned long address,
> > - unsigned long size, struct zap_details *details)
> > +/**
> > + * unmap_vma_single - remove user pages in a given range
> > + * @tlb: pointer to the caller's struct mmu_gather
> > + * @vma: vm_area_struct holding the applicable pages
> > + * @address: starting address of the pages
> > + * @size: number of bytes to remove
> > + * @details: details of shared cache invalidation
> > + *
> > + * @tlb shouldn't be NULL. The range must fit into one VMA.
>
> Can we add some VM_WARN_ON[_ONCE]()'s for these conditions please?
Nice suggestion, I will do so.
>
> Thanks for documenting!
Kudos to Shakeel, who suggested this kerneldoc comment :)
Thanks,
SJ
[...]
next prev parent reply other threads:[~2025-03-11 21:00 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-10 17:23 [PATCH 0/9] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE SeongJae Park
2025-03-10 17:23 ` [PATCH 1/9] mm/madvise: use is_memory_failure() from madvise_do_behavior() SeongJae Park
2025-03-11 11:27 ` Lorenzo Stoakes
2025-03-10 17:23 ` [PATCH 2/9] mm/madvise: split out populate behavior check logic SeongJae Park
2025-03-11 11:29 ` Lorenzo Stoakes
2025-03-10 17:23 ` [PATCH 3/9] mm/madvise: deduplicate madvise_do_behavior() skip case handlings SeongJae Park
2025-03-11 12:02 ` Lorenzo Stoakes
2025-03-11 20:54 ` SeongJae Park
2025-03-10 17:23 ` [PATCH 4/9] mm/madvise: remove len parameter of madvise_do_behavior() SeongJae Park
2025-03-11 12:05 ` Lorenzo Stoakes
2025-03-10 17:23 ` [PATCH 5/9] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior() SeongJae Park
2025-03-11 12:17 ` Lorenzo Stoakes
2025-03-11 20:56 ` SeongJae Park
2025-03-12 5:47 ` Lorenzo Stoakes
2025-03-12 17:23 ` SeongJae Park
2025-03-10 17:23 ` [PATCH 6/9] mm/memory: split non-tlb flushing part from zap_page_range_single() SeongJae Park
2025-03-11 12:45 ` Lorenzo Stoakes
2025-03-11 20:58 ` SeongJae Park
2025-03-31 20:24 ` SeongJae Park
2025-04-01 1:45 ` Liam R. Howlett
2025-04-01 2:48 ` SeongJae Park
2025-04-01 14:03 ` Liam R. Howlett
2025-04-01 21:25 ` SeongJae Park
2025-03-10 17:23 ` [PATCH 7/9] mm/madvise: let madvise_{dontneed,free}_single_vma() caller batches tlb flushes SeongJae Park
2025-03-11 13:07 ` Lorenzo Stoakes
2025-03-11 21:00 ` SeongJae Park [this message]
2025-03-10 17:23 ` [PATCH 8/9] mm/madvise: batch tlb flushes for [process_]madvise(MADV_{DONTNEED[_LOCKED],FREE}) SeongJae Park
2025-03-11 13:59 ` Lorenzo Stoakes
2025-03-11 21:01 ` SeongJae Park
2025-04-01 21:17 ` SeongJae Park
2025-03-10 17:23 ` [PATCH 9/9] mm/madvise: remove !tlb support from madvise_{dontneed,free}_single_vma() SeongJae Park
2025-03-11 14:01 ` Lorenzo Stoakes
2025-03-11 21:02 ` SeongJae Park
2025-03-12 13:46 ` Lorenzo Stoakes
2025-04-01 21:22 ` SeongJae Park
2025-03-10 22:39 ` [PATCH 0/9] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE Andrew Morton
2025-03-10 23:15 ` Shakeel Butt
2025-03-10 23:36 ` Roman Gushchin
2025-03-11 11:17 ` Lorenzo Stoakes
2025-03-10 23:27 ` SeongJae Park
2025-03-11 12:49 ` Lorenzo Stoakes
2025-03-11 21:03 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250311210023.85435-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=howlett@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.