Re: [LSF/MM TOPIC] mmap locking topics

From: Michel Lespinasse <michel@lespinasse.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Michel Lespinasse <michel@lespinasse.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LSF/MM TOPIC] mmap locking topics
Date: Tue, 1 Jun 2021 20:34:08 -0700
Message-ID: <20210602033408.GA3229@lespinasse.org>
In-Reply-To: <YLY9nJMdaK+M6VqE@casper.infradead.org>

On Tue, Jun 01, 2021 at 03:01:00PM +0100, Matthew Wilcox wrote:
> On Mon, May 31, 2021 at 09:48:45PM -0700, Michel Lespinasse wrote:
> > I - Speculative page faults
> > 
> > The idea there is to avoid taking the mmap lock during page faults,
> > at least for the easier cases. This requires the fault handler to be
> > careful to avoid races with mmap writers (and most particularly
> > munmap), and when the new page is ready to be inserted into the user
> > process, to verify, at the last moment (after taking the page table
> > lock), that there has been no race between the fault handler and any
> > mmap writers.  Such checks can be implemented locally, without hitting
> > any global locks, which results in very nice scalability improvements
> > when processing concurrent faults.
> > 
> > I think the idea is ready for prime time, and a patchset has been proposed,
> > but it is not getting much traction yet. I suspect we will need to discuss
> > the idea in person to figure out the next steps.
> 
> There is a lot of interest in this.  I disagree with Michel's approach
> in that he wants to use seqlocks to detect whether any modification has
> been made to the process's address space, whereas I want to use the VMA
> tree to detect whether any modification has been made to this VMA.

I see the sequence count as being the easy & safe approach, but yes it
does have limitations that can lead to unnecessary fast path aborts.
It would be nice to check the VMAs to avoid *some* of these aborts,
but I do not think that is always applicable either - I wrote about
that in https://lwn.net/ml/linux-kernel/20210430224649.GA29203@lespinasse.org/
("Thoughts about concurrency checks at the end of the page fault")
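
To make the sequence-count check concrete, here is a minimal user-space
sketch. It is not the code from the proposed patchset: struct toy_mm and
the toy_* helpers are hypothetical names invented for illustration, an
atomic_flag spinlock stands in for the page table lock, and writers are
assumed to be serialised against each other by some other means.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address space; not the kernel's mm_struct. */
struct toy_mm {
    atomic_uint seq;        /* odd while a writer is changing mappings */
    atomic_flag ptl;        /* toy spinlock standing in for the page table lock */
    int pages_installed;    /* counts faults that committed */
};

static void toy_mm_init(struct toy_mm *mm)
{
    atomic_init(&mm->seq, 0);
    atomic_flag_clear(&mm->ptl);
    mm->pages_installed = 0;
}

/* Writers (mmap, munmap, ...) bump the count before and after modifying. */
static void toy_write_begin(struct toy_mm *mm)
{
    unsigned int old = atomic_fetch_add(&mm->seq, 1);
    assert((old & 1) == 0);     /* assumption: only one writer at a time */
}

static void toy_write_end(struct toy_mm *mm)
{
    unsigned int old = atomic_fetch_add(&mm->seq, 1);
    assert((old & 1) == 1);
}

/*
 * Returns true if the fault committed speculatively, false if the caller
 * must retry on the slow path. racing_writer simulates a writer running
 * while the new page is being prepared.
 */
static bool toy_speculative_fault(struct toy_mm *mm, bool racing_writer)
{
    unsigned int start = atomic_load(&mm->seq);
    bool ok;

    if (start & 1)
        return false;           /* a writer is in progress: abort early */

    /* ... the new page would be prepared here, with no mm-wide lock held ... */
    if (racing_writer) {
        toy_write_begin(mm);
        toy_write_end(mm);
    }

    /* Last-moment check, done after taking the (toy) page table lock. */
    while (atomic_flag_test_and_set(&mm->ptl))
        ;                       /* spin until the lock is free */
    ok = (atomic_load(&mm->seq) == start);
    if (ok)
        mm->pages_installed++;  /* "insert" the page */
    atomic_flag_clear(&mm->ptl);
    return ok;
}
```

Because the counter in this sketch covers the whole toy address space, a
writer touching an unrelated range still forces the fault to abort. That
is the kind of unnecessary fast path abort mentioned above.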

> > II - Fine grained MM locking
> > 
> > A major limitation of the current mmap lock design is that it covers a
> > process's entire address space. In threaded applications, it is common
> > for threads to issue concurrent requests for non-overlapping parts of
> > the process address space - for example, one thread might be mmaping
> > new memory while another releases a different range, and a third might
> > fault within its own address range too. The current mmap lock design
> > does not take the non-overlapping ranges into consideration, and
> > consequently serialises the 3 above requests rather than letting them
> > proceed in parallel.
> > 
> > A lot of work has gone into mitigating the problem by reducing
> > the mmap lock hold times (for example, dropping the mmap lock during
> > page faults that hit disk, or lowering to a read lock during longer
> > mmap/munmap/populate operations). But this approach is hitting its
> > limits, and I think it would be better to fix the core of the problem
> > by making the mmap lock capable of allowing concurrent non-overlapping
> > operations.
> > 
> > I would like to propose an approach that:
> > - separates the mmap lock into two separate locks, one that is only
> >   held for short periods of time to protect mm-wide data structures
> >   (including the vma tree), and another that functions as a range lock
> >   and can be held for longer periods of time;
> > - allows for incremental conversion from the current code to being
> >   aware of locking ranges.
> > 
> > I have been maintaining a prototype for this, which has been shared
> > with a small set of people. The main holdup is with page fault
> > performance; in order to allow non-overlapping writers to proceed
> > while some page faults are in progress, the prototype needs to
> > maintain a shared structure holding addresses for each pending page
> > fault. Updating this shared structure gets very expensive in high
> > concurrency page fault benchmarks, though it seems quite unnoticeable
> > in macro benchmarks I have looked at.
> 
> Here I have larger disagreements with Michel.  I do not believe the
> range lock is a useful tool for this problem.

Regardless of any proposed solution, do you agree that most of the cases
where mmap lock blocking happens are between non-overlapping memory
operations that could conceivably be handled concurrently?

In other words - do you believe that range locks would be too slow to
be a useful solution, or is it that you do not think they would
actually solve the issue?
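
As an illustration of what "non-overlapping" means for a range lock, here
is a minimal single-threaded sketch of the conflict rule. It is not the
prototype mentioned above: struct toy_range_lock and the toy_range_*
helpers are hypothetical names, the fixed-size table is an arbitrary
simplification, and a real implementation would guard the table with a
short-held lock and block rather than fail.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RANGES 16       /* arbitrary capacity for the sketch */

/* One held range, half-open: [start, end). */
struct toy_range {
    unsigned long start, end;
    bool write;                 /* true for exclusive, false for shared */
    bool used;
};

/* Hypothetical range lock: just a table of currently held ranges. */
struct toy_range_lock {
    struct toy_range held[TOY_MAX_RANGES];
};

static bool toy_overlaps(const struct toy_range *r,
                         unsigned long start, unsigned long end)
{
    return r->start < end && start < r->end;
}

/*
 * Try to take [start, end). Returns a slot index on success, or -1 if a
 * conflicting range is held or the table is full. Two ranges conflict
 * only if they overlap and at least one of them is a write.
 */
static int toy_range_trylock(struct toy_range_lock *rl,
                             unsigned long start, unsigned long end,
                             bool write)
{
    int free_slot = -1;

    assert(start < end);
    for (int i = 0; i < TOY_MAX_RANGES; i++) {
        const struct toy_range *r = &rl->held[i];

        if (!r->used) {
            if (free_slot < 0)
                free_slot = i;  /* remember the first free slot */
            continue;
        }
        if (toy_overlaps(r, start, end) && (write || r->write))
            return -1;          /* overlapping and not both readers */
    }
    if (free_slot < 0)
        return -1;              /* table full */
    rl->held[free_slot] = (struct toy_range){ start, end, write, true };
    return free_slot;
}

static void toy_range_unlock(struct toy_range_lock *rl, int slot)
{
    assert(slot >= 0 && slot < TOY_MAX_RANGES && rl->held[slot].used);
    rl->held[slot].used = false;
}
```

With a zero-initialised struct toy_range_lock, write locks on
[0x1000, 0x2000) and [0x3000, 0x4000) can both be held at once, while a
request for [0x1800, 0x3800) is refused until the overlapping holders
unlock.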

> The two topics above seem large enough, but there are other important
> users of the mmap_sem that also hit contention.  /proc/$pid/maps, smaps
> and similar files hit priority inversion problems which have been reduced,
> but not solved.

Yes - I do think this would be worth discussing too. Not sure if that is
a separate topic, or if this should be brought under a larger theme.

Generally - I think there are many issues people have with mmap
locking, and it's been really hard to make progress addressing these -
even when prototype solutions exist - due to a lack of consensus (many
of the people involved have different ideas as to which of the issues
are important to them). But I think that's what makes this an important
topic to be discussed?

--
Michel "walken" Lespinasse
