From: SeongJae Park <sj@kernel.org>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: SeongJae Park <sj@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>,
Pavel Begunkov <asml.silence@gmail.com>,
damon@lists.linux.dev, io-uring@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/madvise: remove redundant mmap_lock operations from process_madvise()
Date: Tue, 14 Jan 2025 20:17:50 -0800 [thread overview]
Message-ID: <20250115041750.58164-1-sj@kernel.org> (raw)
In-Reply-To: <vwyuhra3bjcnwxsmenngsingp5gi2xqrazej5eescnuyz332q5@jc5u34qg4umc>
On Tue, 14 Jan 2025 22:44:53 -0500 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> * Shakeel Butt <shakeel.butt@linux.dev> [250114 13:14]:
> > Ccing relevant folks.
>
> Thanks Shakeel.
>
> >
> > On Fri, Jan 10, 2025 at 04:46:18PM -0800, SeongJae Park wrote:
> > > process_madvise() calls do_madvise() for each address range. Then, each
> > > do_madvise() invocation holds and releases same mmap_lock. Optimize the
> > > redundant lock operations by doing the locking in process_madvise(), and
> > > inform do_madvise() that the lock is already held and therefore can be
> > > skipped.
> > >
> > > Evaluation
> > > ==========
> > >
> > > I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
> > > using multiple madvise() calls, 4 KiB per each call. I also do the same
> > > with process_madvise(), but with varying iovec size from 1 to 1024.
> > > The source code for the measurement is available at GitHub[1].
> > >
> > > The measurement results are as below. 'sz_batches' column shows the
> > > iovec size of process_madvise() calls. '0' is for madvise() calls case.
> > > 'before' and 'after' columns are the measured time to apply
> > > MADV_DONTNEED to the 256 MiB memory buffer in nanoseconds, on kernels
> > > that built without and with this patch, respectively. So lower value
> > > means better efficiency. 'after/before' column is the ratio of 'after'
> > > to 'before'.
> > >
> > > sz_batches before after after/before
> > > 0 124062365 96670188 0.779206393494111
> > > 1 136341258 113915688 0.835518827323714
> > > 2 105314942 78898211 0.749164453796119
> > > 4 82012858 59778998 0.728897875989153
> > > 8 82562651 51003069 0.617749895167489
> > > 16 71474930 47575960 0.665631431888076
> > > 32 71391211 42902076 0.600943385033768
> > > 64 68225932 41337835 0.605896230190011
> > > 128 71053578 42467240 0.597679120395598
> > > 256 85094126 41630463 0.489228398679364
> > > 512 68531628 44049763 0.6427654542221
> > > 1024 79338892 43370866 0.546653285755491
> > >
> > > The measurement shows this patch reduces the process_madvise() latency,
> > > proportional to the batching size, from about 25% with the batch size 2
> > > to about 55% with the batch size 1,024. The trend is somewhat we can
> > > expect.
> > >
> > > Interestingly, this patch has also optimize madvise() and single batch
> > > size process_madvise(), though. I ran this test multiple times, but the
> > > results are consistent. I'm still investigating if there are something
> > > I'm missing. But I believe the investigation may not necessarily be a
> > > blocker of this RFC, so just posting this. I will add updates of the
> > > madvise() and single batch size process_madvise() investigation later.
> > >
> > > [1] https://github.com/sjp38/eval_proc_madvise
> > >
> > > Signed-off-by: SeongJae Park <sj@kernel.org>
> > > ---
> > > include/linux/mm.h | 3 ++-
> > > io_uring/advise.c | 2 +-
> > > mm/damon/vaddr.c | 2 +-
> > > mm/madvise.c | 54 +++++++++++++++++++++++++++++++++++-----------
> > > 4 files changed, 45 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 612b513ebfbd..e3ca5967ebd4 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -3459,7 +3459,8 @@ int do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > unsigned long end, struct list_head *uf, bool unlock);
> > > extern int do_munmap(struct mm_struct *, unsigned long, size_t,
> > > struct list_head *uf);
> > > -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
> > > +extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in,
> > > + int behavior, bool lock_held);
>
> We are dropping externs when it is not needed as things are changed.
Good point. I will drop it if I result in changing something here in next
versions.
>
> Also, please don't use a flags for this. It will have a single user of
> true, probably ever.
Ok, that sounds fair.
>
> It might be better to break do_madvise up into more parts:
> 1. is_madvise_valid(), which does the checking.
> 2. madivse_do_behavior()
>
> The locking type is already extracted to madivse_need_mmap_write().
>
> What do you think?
Sounds good to me :)
I will make the v2 of this patch following your suggestion after waiting for
any possible more comments a bit.
Thanks,
SJ
[...]
next prev parent reply other threads:[~2025-01-15 4:17 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-11 0:46 [RFC PATCH] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
2025-01-14 18:13 ` Shakeel Butt
2025-01-14 18:47 ` Lorenzo Stoakes
2025-01-14 19:54 ` SeongJae Park
2025-01-14 21:55 ` Shakeel Butt
2025-01-15 3:44 ` Liam R. Howlett
2025-01-15 4:17 ` SeongJae Park [this message]
2025-01-15 4:35 ` Honggyu Kim
2025-01-15 6:19 ` SeongJae Park
2025-01-15 6:41 ` Honggyu Kim
2025-01-15 7:15 ` Honggyu Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250115041750.58164-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=damon@lists.linux.dev \
--cc=david@redhat.com \
--cc=io-uring@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.