From: Eric B Munson <emunson@akamai.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>, Jonathan Corbet <corbet@lwn.net>,
linux-alpha@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mips@linux-mips.org, linux-parisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org,
linux-xtensa@linux-xtensa.org, dri-devel@lists.freedesktop.org,
linux-mm@kvack.org, linux-arch@vger.kernel.org,
linux-api@vger.kernel.org
Subject: Re: [PATCH V4 4/6] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it
Date: Wed, 22 Jul 2015 14:43:43 -0400
Message-ID: <20150722184343.GA2351@akamai.com>
In-Reply-To: <55AF6A73.1080500@suse.cz>
On Wed, 22 Jul 2015, Vlastimil Babka wrote:
> On 07/21/2015 09:59 PM, Eric B Munson wrote:
> >The cost of faulting in all memory to be locked can be very high when
> >working with large mappings. If only portions of the mapping will be
> >used this can incur a high penalty for locking.
> >
> >For the example of a large file, this is the usage pattern for a large
> >statistical language model (probably applies to other statistical or
> >graphical models as well). For the security example, any application transacting
> >in data that cannot be swapped out (credit card data, medical records,
> >etc).
> >
> >This patch introduces the ability to request that pages are not
> >pre-faulted, but are placed on the unevictable LRU when they are finally
> >faulted in. This can be done an area at a time via the
> >mlock2(MLOCK_ONFAULT) or the mlockall(MCL_ONFAULT) system calls. These
> >calls can be undone via munlock2(MLOCK_ONFAULT) or
> >munlockall2(MCL_ONFAULT).
> >
> >Applying the VM_LOCKONFAULT flag to a mapping with pages that are
> >already present required the addition of a function in gup.c to pin all
> >pages which are present in an address range. It borrows heavily from
> >__mm_populate().
> >
> >To keep accounting checks out of the page fault path, users are billed
> >for the entire mapping lock as if MLOCK_LOCKED was used.
>
> Hi,
>
> I think you should include a complete description of which
> transitions for vma states and mlock2/munlock2 flags applied on them
> are valid and what they do. It will also help with the manpages.
> You explained some to Jon in the last thread, but I think there
> should be a canonical description in changelog (if not also
> Documentation, if mlock is covered there).
>
> For example the scenario Jon asked, what happens after a
> mlock2(MLOCK_ONFAULT) followed by mlock2(MLOCK_LOCKED), and that the
> answer is "nothing". Your promised code comment for
> apply_vma_flags() doesn't suffice IMHO (and I'm not sure it's there,
> anyway?).
I missed adding that comment to the code; it will be there in V5, along
with the description in the changelog.
>
> But the more I think about the scenario and your new VM_LOCKONFAULT
> vma flag, it seems awkward to me. Why should munlocking at all care
> if the vma was mlocked with MLOCK_LOCKED or MLOCK_ONFAULT? In either
> case the result is that all pages currently populated are munlocked.
> So the flags for munlock2 should be unnecessary.
Say a user has a large area of interleaved MLOCK_LOCKED and
MLOCK_ONFAULT mappings and they want to unlock only the ones with
MLOCK_LOCKED. With the current implementation, this is possible in a
single system call that spans the entire region. With your suggestion,
the user would have to know which regions were locked with MLOCK_LOCKED
and call munlock() on each of them. IMO, the way munlock2() works
better mirrors the way munlock() currently works when called on a large
area of interleaved locked and unlocked areas.
>
> I also think VM_LOCKONFAULT is unnecessary. VM_LOCKED should be
> enough - see how you had to handle the new flag in all places that
> had to handle the old flag? I think the information whether mlock
> was supposed to fault the whole vma is obsolete at the moment mlock
> returns. VM_LOCKED should be enough for both modes, and the flag to
> mlock2 could just control whether the pre-faulting is done.
>
> So what should be IMHO enough:
> - munlock can stay without flags
> - mlock2 has only one new flag MLOCK_ONFAULT. If specified,
> pre-faulting is not done, just set VM_LOCKED and mlock pages already
> present.
> - same with mmap(MAP_LOCKONFAULT) (need to define what happens when
> both MAP_LOCKED and MAP_LOCKONFAULT are specified).
>
> Now mlockall(MCL_FUTURE) muddles the situation in that it stores the
> information for future VMA's in current->mm->def_flags, and this
> def_flags would need to distinguish VM_LOCKED with population and
> without. But that could be still solvable without introducing a new
> vma flag everywhere.
With you right up until that last paragraph. I have been staring at
this a while and I cannot come up with a way to handle
mlockall(MCL_ONFAULT) without introducing a new vm flag. It doesn't
have to be VM_LOCKONFAULT; we could use the model that Michal Hocko
suggested with something like VM_FAULTPOPULATE. However, we can't
really use this flag anywhere except the mlock code because we have to
be able to distinguish between a caller that wants MLOCK_LOCKED together
with whatever control VM_FAULTPOPULATE might grant outside of mlock, and
a caller that wants MLOCK_ONFAULT. That was a long way of saying we need
an extra vma flag regardless. However, if that flag only controls
whether mlock pre-populates, it would work, and it would do away with
most of the places I had to touch to handle VM_LOCKONFAULT properly.

I picked VM_LOCKONFAULT because it is explicit about what it is for and
there is little risk of someone coming along in 5 years and saying "why
not overload this flag to do this other thing completely unrelated to
mlock?". A flag for controlling speculative population is more likely to
be overloaded outside of mlock().

If you have a sane way of handling mlockall(MCL_ONFAULT) without a new
VMA flag, I am happy to give it a try, but I haven't been able to come
up with one that doesn't have its own gremlins.