All of lore.kernel.org
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: SeongJae Park <sj@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	Pavel Begunkov <asml.silence@gmail.com>,
	damon@lists.linux.dev, io-uring@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/madvise: remove redundant mmap_lock operations from process_madvise()
Date: Tue, 14 Jan 2025 20:17:50 -0800	[thread overview]
Message-ID: <20250115041750.58164-1-sj@kernel.org> (raw)
In-Reply-To: <vwyuhra3bjcnwxsmenngsingp5gi2xqrazej5eescnuyz332q5@jc5u34qg4umc>

On Tue, 14 Jan 2025 22:44:53 -0500 "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:

> * Shakeel Butt <shakeel.butt@linux.dev> [250114 13:14]:
> > Ccing relevant folks.
> 
> Thanks Shakeel.
> 
> > 
> > On Fri, Jan 10, 2025 at 04:46:18PM -0800, SeongJae Park wrote:
> > > process_madvise() calls do_madvise() for each address range.  Then, each
> > > do_madvise() invocation holds and releases same mmap_lock.  Optimize the
> > > redundant lock operations by doing the locking in process_madvise(), and
> > > inform do_madvise() that the lock is already held and therefore can be
> > > skipped.
> > > 
> > > Evaluation
> > > ==========
> > > 
> > > I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
> > > using multiple madvise() calls, 4 KiB per each call.  I also do the same
> > > with process_madvise(), but with varying iovec size from 1 to 1024.
> > > The source code for the measurement is available at GitHub[1].
> > > 
> > > The measurement results are as below.  'sz_batches' column shows the
> > > iovec size of process_madvise() calls.  '0' is for madvise() calls case.
> > > 'before' and 'after' columns are the measured time to apply
> > > MADV_DONTNEED to the 256 MiB memory buffer in nanoseconds, on kernels
> > > that built without and with this patch, respectively.  So lower value
> > > means better efficiency.  'after/before' column is the ratio of 'after'
> > > to 'before'.
> > > 
> > >     sz_batches  before     after      after/before
> > >     0           124062365  96670188   0.779206393494111
> > >     1           136341258  113915688  0.835518827323714
> > >     2           105314942  78898211   0.749164453796119
> > >     4           82012858   59778998   0.728897875989153
> > >     8           82562651   51003069   0.617749895167489
> > >     16          71474930   47575960   0.665631431888076
> > >     32          71391211   42902076   0.600943385033768
> > >     64          68225932   41337835   0.605896230190011
> > >     128         71053578   42467240   0.597679120395598
> > >     256         85094126   41630463   0.489228398679364
> > >     512         68531628   44049763   0.6427654542221
> > >     1024        79338892   43370866   0.546653285755491
> > > 
> > > The measurement shows this patch reduces the process_madvise() latency,
> > > proportional to the batching size, from about 25% with the batch size 2
> > > to about 55% with the batch size 1,024.  The trend is somewhat we can
> > > expect.
> > > 
> > > Interestingly, this patch has also optimize madvise() and single batch
> > > size process_madvise(), though.  I ran this test multiple times, but the
> > > results are consistent.  I'm still investigating if there are something
> > > I'm missing.  But I believe the investigation may not necessarily be a
> > > blocker of this RFC, so just posting this.  I will add updates of the
> > > madvise() and single batch size process_madvise() investigation later.
> > > 
> > > [1] https://github.com/sjp38/eval_proc_madvise
> > > 
> > > Signed-off-by: SeongJae Park <sj@kernel.org>
> > > ---
> > >  include/linux/mm.h |  3 ++-
> > >  io_uring/advise.c  |  2 +-
> > >  mm/damon/vaddr.c   |  2 +-
> > >  mm/madvise.c       | 54 +++++++++++++++++++++++++++++++++++-----------
> > >  4 files changed, 45 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 612b513ebfbd..e3ca5967ebd4 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -3459,7 +3459,8 @@ int do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > >  		    unsigned long end, struct list_head *uf, bool unlock);
> > >  extern int do_munmap(struct mm_struct *, unsigned long, size_t,
> > >  		     struct list_head *uf);
> > > -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
> > > +extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in,
> > > +		int behavior, bool lock_held);
> 
> We are dropping externs when it is not needed as things are changed.

Good point.  I will drop it if I result in changing something here in next
versions.

> 
> Also, please don't use a flags for this.  It will have a single user of
> true, probably ever.

Ok, that sounds fair.

> 
> It might be better to break do_madvise up into more parts:
> 1. is_madvise_valid(), which does the checking.
> 2. madivse_do_behavior()
> 
> The locking type is already extracted to madivse_need_mmap_write().
> 
> What do you think?

Sounds good to me :)

I will make the v2 of this patch following your suggestion after waiting for
any possible more comments a bit.


Thanks,
SJ

[...]

  reply	other threads:[~2025-01-15  4:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-11  0:46 [RFC PATCH] mm/madvise: remove redundant mmap_lock operations from process_madvise() SeongJae Park
2025-01-14 18:13 ` Shakeel Butt
2025-01-14 18:47   ` Lorenzo Stoakes
2025-01-14 19:54     ` SeongJae Park
2025-01-14 21:55     ` Shakeel Butt
2025-01-15  3:44   ` Liam R. Howlett
2025-01-15  4:17     ` SeongJae Park [this message]
2025-01-15  4:35 ` Honggyu Kim
2025-01-15  6:19   ` SeongJae Park
2025-01-15  6:41     ` Honggyu Kim
2025-01-15  7:15     ` Honggyu Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250115041750.58164-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=damon@lists.linux.dev \
    --cc=david@redhat.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=shakeel.butt@linux.dev \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.