From: SeongJae Park <sj@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: SeongJae Park <sj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R.Howlett" <howlett@gmail.com>,
David Hildenbrand <david@redhat.com>,
Rik van Riel <riel@surriel.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Vlastimil Babka <vbabka@suse.cz>,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v2 2/4] mm/madvise: batch tlb flushes for MADV_FREE
Date: Tue, 8 Apr 2025 11:49:47 -0700 [thread overview]
Message-ID: <20250408184947.62625-1-sj@kernel.org> (raw)
In-Reply-To: <12a8989c-c4f3-45a5-a66e-06ef7c2ef876@lucifer.local>
On Tue, 8 Apr 2025 13:58:18 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> On Fri, Apr 04, 2025 at 02:06:58PM -0700, SeongJae Park wrote:
> > MADV_FREE handling for [process_]madvise() flushes tlb for each vma of
> > each address range. Update the logic to do tlb flushes in a batched
> > way. Initialize an mmu_gather object from do_madvise() and
> > vector_madvise(), which are the entry level functions for
> > [process_]madvise(), respectively. And pass those objects to the
> > function for per-vma work, via madvise_behavior struct. Make the
> > per-vma logic not flushes tlb on their own but just saves the tlb
> > entries to the received mmu_gather object. Finally, the entry level
> > functions flush the tlb entries that gathered for the entire user
> > request, at once.
> >
> > Signed-off-by: SeongJae Park <sj@kernel.org>
>
> Other than some nitty stuff, and a desire for some careful testing of the
> horrid edge case that err... I introduced :P this looks fine, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Thank you for your kind review! I will make the next revision following your
suggestions as I answered below.
>
> > ---
> > mm/madvise.c | 59 +++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 47 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 8bcfdd995d18..564095e381b2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -799,12 +799,13 @@ static const struct mm_walk_ops madvise_free_walk_ops = {
> > .walk_lock = PGWALK_RDLOCK,
> > };
> >
> > -static int madvise_free_single_vma(struct vm_area_struct *vma,
> > - unsigned long start_addr, unsigned long end_addr)
> > +static int madvise_free_single_vma(
> > + struct madvise_behavior *behavior, struct vm_area_struct *vma,
>
> This is pedantic, but elsewhere you differentiate between int behavior and
> struct madvise_behavior by referringt to the later as madv_behavior.
>
> The naming kind of sucks in general though.
>
> But for consistency, let's maybe rename this to madv_behavior, and we can
> maybe do a commit later to do a rename across the board?
I completely agree. I will rename so in the next spin.
>
> > + unsigned long start_addr, unsigned long end_addr)
> > {
> > struct mm_struct *mm = vma->vm_mm;
> > struct mmu_notifier_range range;
> > - struct mmu_gather tlb;
> > + struct mmu_gather *tlb = behavior->tlb;
> >
> > /* MADV_FREE works for only anon vma at the moment */
> > if (!vma_is_anonymous(vma))
[...]
> > @@ -953,7 +951,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> > if (action == MADV_DONTNEED || action == MADV_DONTNEED_LOCKED)
> > return madvise_dontneed_single_vma(vma, start, end);
> > else if (action == MADV_FREE)
> > - return madvise_free_single_vma(vma, start, end);
> > + return madvise_free_single_vma(behavior, vma, start, end);
> > else
> > return -EINVAL;
>
> On error paths, do we correctly finish the batched (botched? :P) TLB
> operation?
Yes, the change calls tlb_finish_mmu() and tlb_gather_mmu() as needed in the
error paths. Of course I might forgot calling those in some edge cases.
Please let me know if you find such mistakes.
>
> > }
[...]
> > @@ -1841,14 +1873,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
> > }
> >
> > /* Drop and reacquire lock to unwind race. */
> > + madvise_finish_tlb(&madv_behavior);
> > madvise_unlock(mm, behavior);
> > madvise_lock(mm, behavior);
> > + madvise_init_tlb(&madv_behavior, mm);
> > continue;
>
> Have you found a way in which to test this? Perhaps force this case and
> find a means of asserting the TLB flushing behaves as expected? I think
> we're ok from the logic, but it's such a tricky one it'd be good to find a
> means of doing so, albeit in a manual way.
No, unfortunately I haven't found a good way to test this case.
>
> > }
> > if (ret < 0)
> > break;
> > iov_iter_advance(iter, iter_iov_len(iter));
> > }
> > + madvise_finish_tlb(&madv_behavior);
> > madvise_unlock(mm, behavior);
> >
> > ret = (total_len - iov_iter_count(iter)) ? : ret;
> > --
> > 2.39.5
Thanks,
SJ
next prev parent reply other threads:[~2025-04-08 18:49 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-04 21:06 [PATCH v2 0/4] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE SeongJae Park
2025-04-04 21:06 ` [PATCH v2 1/4] mm/madvise: define and use madvise_behavior struct for madvise_do_behavior() SeongJae Park
2025-04-04 21:06 ` [PATCH v2 2/4] mm/madvise: batch tlb flushes for MADV_FREE SeongJae Park
2025-04-08 12:58 ` Lorenzo Stoakes
2025-04-08 18:49 ` SeongJae Park [this message]
2025-04-04 21:06 ` [PATCH v2 3/4] mm/memory: split non-tlb flushing part from zap_page_range_single() SeongJae Park
2025-04-08 13:29 ` Lorenzo Stoakes
2025-04-08 20:12 ` SeongJae Park
2025-04-09 10:37 ` Lorenzo Stoakes
2025-04-04 21:07 ` [PATCH v2 4/4] mm/madvise: batch tlb flushes for MADV_DONTNEED[_LOCKED] SeongJae Park
2025-04-08 13:36 ` Lorenzo Stoakes
2025-04-08 20:16 ` SeongJae Park
2025-04-08 13:44 ` [PATCH v2 0/4] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE Lorenzo Stoakes
2025-04-08 20:23 ` SeongJae Park
[not found] <20250405013136.3863-1-hdanton@sina.com>
2025-04-05 16:30 ` [PATCH v2 2/4] mm/madvise: batch tlb flushes for MADV_FREE SeongJae Park
2025-04-05 23:46 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250408184947.62625-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=howlett@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.