Re: [RFC PATCH v2 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()

From: SeongJae Park <sj@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: SeongJae Park <sj@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v2 4/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
Date: Mon, 19 May 2025 11:25:44 -0700
Message-ID: <20250519182544.45603-1-sj@kernel.org>
In-Reply-To: <371ec2c6-01d9-4deb-a234-aacad94680c5@lucifer.local>

On Sat, 17 May 2025 20:28:49 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> On Fri, Jan 31, 2025 at 05:51:45PM +0000, Lorenzo Stoakes wrote:
> > On Fri, Jan 31, 2025 at 12:47:24PM -0500, Liam R. Howlett wrote:
> > > * Davidlohr Bueso <dave@stgolabs.net> [250131 12:31]:
> > > > On Fri, 31 Jan 2025, Lorenzo Stoakes wrote:
> > > >
> > > > > On Thu, Jan 16, 2025 at 05:30:58PM -0800, SeongJae Park wrote:
> > > > > > Optimize redundant mmap lock operations from process_madvise() by
> > > > > > directly doing the mmap locking first, and then the remaining works for
> > > > > > all ranges in the loop.
> > > > > >
> > > > > > Signed-off-by: SeongJae Park <sj@kernel.org>
> > > > >
> > > > > I wonder if this might increase lock contention because now all of the
> > > > > vector operations will hold the relevant mm lock without releasing after
> > > > > each operation?
> > > >
> > > > That was exactly my concern. While afaict the numbers presented in v1
> > > > are quite nice, this is ultimately a micro-benchmark, where no other
> > > > unrelated threads are impacted by these new hold times.
> > >
> > > Indeed, I was also concerned about this scenario.
> > >
> > > But this method does have the added advantage of keeping the vma space
> > > in the same state as it was expected during the initial call - although
> > > the race does still exist on looking vs acting on the data.  This would
> > > just remove the intermediate changes.
> > >
> > > >
> > > > > Probably it's ok given limited size of iov, but maybe in future we'd want
> > > > > to set a limit on the ranges before we drop/reacquire lock?
> > > >
> > > > imo, this should best be done in the same patch/series. Maybe extend
> > > > the benchmark to use IOV_MAX and find a sweet spot?
> > >
> > > Are you worried this is over-engineering for a problem that may never be
> > > an issue, or is there a particular usecase you have in mind?
> > >
> > > It is probably worth investigating, and maybe a potential usecase would
> > > help with the targeted sweet spot?
> > >
> >
> > Keep in mind process_madvise() is not limited by IOV_MAX, which can be rather
> > high, but rather UIO_FASTIOV, which is limited to 8 entries.
> >
> > (Some have been surprised by this limitation...!)
> 
> Surprised, perhaps because I was wrong about this :) Apologies for that.
> 
> SJ raised this in [0] and the non-RFC version of this series is over at [1].
> 
> [0]: https://lore.kernel.org/all/20250517162048.36347-1-sj@kernel.org/
> [1]: https://lore.kernel.org/all/20250206061517.2958-1-sj@kernel.org/

I actually mentioned[1] that I think the real limit is UIO_MAXIOV, but that
still wouldn't be a real problem since users can tune the batching size.  In
fact, jemalloc has made a change[2] to use process_madvise() with a batching
size of up to 128.
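
To make the "users can tune the batching size" point concrete, below is a
minimal userspace sketch, not taken from jemalloc or from this series.  It
splits an array of ranges into process_madvise() calls of at most `batch`
entries each.  The names advise_in_batches(), submit_fn, syscall_submit() and
dry_run_submit() are hypothetical and exist only for this illustration.  The
sketch assumes each submit call either fails or handles its whole batch; a
real caller would also need to handle partial results.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical callback type: submit one batch, return bytes advised or -1. */
typedef ssize_t (*submit_fn)(int pidfd, const struct iovec *iov,
			     size_t cnt, int advice);

/* Number of calls made to dry_run_submit(); for illustration only. */
size_t dry_run_calls;

/* Stand-in submitter that touches no memory: counts calls, sums lengths. */
ssize_t dry_run_submit(int pidfd, const struct iovec *iov, size_t cnt,
		       int advice)
{
	ssize_t bytes = 0;

	(void)pidfd;
	(void)advice;
	dry_run_calls++;
	for (size_t i = 0; i < cnt; i++)
		bytes += (ssize_t)iov[i].iov_len;
	return bytes;
}

/* Real submitter: raw syscall; fails with ENOSYS if headers lack the number. */
ssize_t syscall_submit(int pidfd, const struct iovec *iov, size_t cnt,
		       int advice)
{
#ifdef SYS_process_madvise
	return syscall(SYS_process_madvise, pidfd, iov, cnt, advice, 0U);
#else
	(void)pidfd;
	(void)iov;
	(void)cnt;
	(void)advice;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Advise n ranges in chunks of at most `batch` entries.  A smaller batch
 * means more syscalls (and more lock operations), while a larger batch means
 * more work per call.  Returns total bytes reported by submit(), or -1 on
 * the first failure.
 */
ssize_t advise_in_batches(int pidfd, const struct iovec *iov, size_t n,
			  size_t batch, int advice, submit_fn submit)
{
	ssize_t total = 0;

	assert(batch > 0);
	for (size_t off = 0; off < n; off += batch) {
		size_t cnt = n - off < batch ? n - off : batch;
		ssize_t ret = submit(pidfd, iov + off, cnt, advice);

		if (ret < 0)
			return -1;
		total += ret;
	}
	return total;
}
```

With 300 ranges and a batch size of 128, this issues three calls covering 128,
128 and 44 ranges.  Passing syscall_submit instead of dry_run_submit would
issue the real system call, provided the running kernel supports the chosen
advice value for the target process.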

I impatiently sent[3] the next revision without giving you enough time to
reply, though.

> 
> We should revisit this and determine whether the drop/reacquire lock is
> required, perhaps doing some experiments around heavy operations using
> UIO_MAXIOV entries?
> 
> SJ - could you take a look at this please?

We had a chance to test this against a production workload, and found no
visible regression.  The workload does not call process_madvise() intensively,
though.  Our internal testing of kernels that have this change also hasn't
found any problem so far, though to the best of my knowledge the
process_madvise() calls from the internal testing are not intensive either.

So my thoughts about UIO_MAXIOV are the same.  I anticipate no issue (until
someone yells ;) ) and haven't found any evidence of a problem.  But, as in the
previous discussion[1], I agree more testing would be good, though I have no
good list of benchmarks for this.  It would be nice if someone could give me
the names of suitable benchmarks.

> 
> >
> > So I think at this point scaling isn't a huge issue, I raise it because in
> > future we may want to increase this limit, at which point we should think about
> > it, which is why I sort of hand-waved it away a bit.
> 
> Again as I said here, I suspect _probably_ this won't be too much of an
> issue - but it is absolutely one we need to address.

Yes, I agree :)

[1] https://lore.kernel.org/20250204195343.16500-1-sj@kernel.org
[2] https://github.com/jemalloc/jemalloc/pull/2794/commits/c3604456d4c1f570348a
[3] https://lore.kernel.org/20250206062801.3060-1-sj@kernel.org


Thanks,
SJ
