* [PATCH v7 0/6] Allow user to request memory to be locked on page fault
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Shuah Khan, Michal Hocko, Michael Kerrisk,
Vlastimil Babka, Jonathan Corbet, Ralf Baechle, Andrea Arcangeli,
linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev,
sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated. For large mappings where the entire area is not necessary
this is not ideal. Instead of forcing all locked pages to be present
when they are allocated, this set creates a middle ground. Pages are
marked to be placed on the unevictable LRU (locked) when they are first
used, but they are not faulted in by the mlock call.
This series introduces a new mlock() system call that takes a flags
argument along with the start address and size. This flags argument
gives the caller the ability to request memory be locked in the
traditional way, or to be locked after the page is faulted in. A new
MCL flag is added to mirror the lock on fault behavior from mlock() in
mlockall().
There are two main use cases that this set covers. The first is the
security focussed mlock case. A buffer is needed that cannot be written
to swap. The maximum size is known, but on average the memory used is
significantly less than this maximum. With lock on fault, the buffer
is guaranteed to never be paged out without consuming the maximum size
every time such a buffer is created.
The second use case is focussed on performance. Portions of a large
file are needed and we want to keep the used portions in memory once
accessed. This is the case for large graphical models where the path
through the graph is not known until run time. The entire graph is
unlikely to be used in a given invocation, but once a node has been
used it needs to stay resident for further processing. Given these
constraints we have a number of options. We can potentially waste a
large amount of memory by mlocking the entire region (this can also
cause a significant stall at startup as the entire file is read in).
We can mlock every page as we access them without tracking if the page
is already resident but this introduces large overhead for each access.
The third option is mapping the entire region with PROT_NONE and using
a signal handler for SIGSEGV to mprotect(PROT_READ) and mlock() the
needed page. Doing this page at a time adds a significant performance
penalty. Batching can be used to mitigate this overhead, but in order
to safely avoid trying to mprotect pages outside of the mapping, the
boundaries of each mapping to be used in this way must be tracked and
available to the signal handler. This is precisely what the mm system
in the kernel should already be doing.
For mlock(MLOCK_ONFAULT) the user is charged against RLIMIT_MEMLOCK as
if mlock(MLOCK_LOCKED) or mmap(MAP_LOCKED) was used, so when the VMA is
created not when the pages are faulted in. For mlockall(MCL_ONFAULT)
the user is charged as if MCL_FUTURE was used. This decision was made
to keep the accounting checks out of the page fault path.
To illustrate the benefit of this set I wrote a test program that mmaps
a 5 GB file filled with random data and then makes 15,000,000 accesses
to random addresses in that mapping. The test program was run 20 times
for each setup. Results are reported for two program portions, setup
and execution. The setup phase is calling mmap and optionally mlock on
the entire region. For most experiments this is trivial, but it
highlights the cost of faulting in the entire region. Results are
averages across the 20 runs in milliseconds.
mmap with mlock(MLOCK_LOCKED) on entire range:
Setup avg: 8228.666
Processing avg: 8274.257
mmap with mlock(MLOCK_LOCKED) before each access:
Setup avg: 0.113
Processing avg: 90993.552
mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt
to change the permissions, after upping the sysctl significantly I get:
Setup avg: 0.058
Processing avg: 69488.073
mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg: 0.068
Processing avg: 38204.116
mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg: 0.044
Processing avg: 29671.180
mmap with mlock(MLOCK_ONFAULT) on entire range:
Setup avg: 0.189
Processing avg: 17904.899
The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping. The
first step covers the page that caused the fault as we know that it will
be possible to lock. The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow. There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handeler in a globally accessible structure, but I could
not find it. It should be noted that with a large enough batch size
this two step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.
These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise mlock(MLOCK_ONFAULT) is significantly faster.
The performance cost of these patches are minimal on the two benchmarks
I have tested (stream and kernbench). The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.
Avg throughput in MB/s from stream using 1000000 element arrays
Test 4.2-rc1 4.2-rc1+lock-on-fault
Copy: 10,566.5 10,421
Scale: 10,685 10,503.5
Add: 12,044.1 11,814.2
Triad: 12,064.8 11,846.3
Kernbench optimal load
4.2-rc1 4.2-rc1+lock-on-fault
Elapsed Time 78.453 78.991
User Time 64.2395 65.2355
System Time 9.7335 9.7085
Context Switches 22211.5 22412.1
Sleeps 14965.3 14956.1
---
Changes from V6:
* Bump the x86 system call number to avoid collision with userfaultfd
* Fix FOLL_POPULATE and FOLL_MLOCK usage when mmap is called with
MAP_POPULATE
* Add documentation for the proc smaps change
* checkpatch fixes
Changes from V5:
Drop MLOCK_LOCKED flag
* MLOCK_ONFAULT and MCL_ONFAULT are treated as a modifier to other locking
operations, mirroring the relationship between VM_LOCKED and
VM_LOCKONFAULT
* Drop mmap flag and related tests
* Fix clearing of MCL_CURRENT when mlockall is called with MCL_FUTURE,
mlockall behavoir now matches the old behavior WRT to ordering
Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
Adjust tests to match
Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile
Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.
Eric B Munson (6):
mm: mlock: Refactor mlock, munlock, and munlockall code
mm: mlock: Add new mlock system call
mm: Introduce VM_LOCKONFAULT
mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
selftests: vm: Add tests for lock on fault
mips: Add entry for new mlock2 syscall
Documentation/filesystems/proc.txt | 1 +
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 6 +
arch/mips/include/uapi/asm/unistd.h | 15 +-
arch/mips/kernel/scall32-o32.S | 1 +
arch/mips/kernel/scall64-64.S | 1 +
arch/mips/kernel/scall64-n32.S | 1 +
arch/mips/kernel/scall64-o32.S | 1 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/include/uapi/asm/mman.h | 6 +
drivers/gpu/drm/drm_vm.c | 8 +-
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 2 +
include/linux/syscalls.h | 2 +
include/uapi/asm-generic/mman-common.h | 5 +
include/uapi/asm-generic/mman.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +-
kernel/sys_ni.c | 1 +
mm/debug.c | 1 +
mm/gup.c | 10 +-
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 4 +-
mm/mlock.c | 87 +++-
mm/mmap.c | 2 +-
mm/rmap.c | 6 +-
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/mlock2-tests.c | 661 ++++++++++++++++++++++++++++
tools/testing/selftests/vm/on-fault-limit.c | 47 ++
tools/testing/selftests/vm/run_vmtests | 22 +
35 files changed, 872 insertions(+), 41 deletions(-)
create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
create mode 100644 tools/testing/selftests/vm/on-fault-limit.c
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
--
1.9.1
^ permalink raw reply [flat|nested] 87+ messages in thread* [PATCH v7 0/6] Allow user to request memory to be locked on page fault
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Shuah Khan, Michal Hocko, Michael Kerrisk,
Vlastimil Babka, Jonathan Corbet, Ralf Baechle, Andrea Arcangeli,
linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev,
sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated. For large mappings where the entire area is not necessary
this is not ideal. Instead of forcing all locked pages to be present
when they are allocated, this set creates a middle ground. Pages are
marked to be placed on the unevictable LRU (locked) when they are first
used, but they are not faulted in by the mlock call.
This series introduces a new mlock() system call that takes a flags
argument along with the start address and size. This flags argument
gives the caller the ability to request memory be locked in the
traditional way, or to be locked after the page is faulted in. A new
MCL flag is added to mirror the lock on fault behavior from mlock() in
mlockall().
There are two main use cases that this set covers. The first is the
security focussed mlock case. A buffer is needed that cannot be written
to swap. The maximum size is known, but on average the memory used is
significantly less than this maximum. With lock on fault, the buffer
is guaranteed to never be paged out without consuming the maximum size
every time such a buffer is created.
The second use case is focussed on performance. Portions of a large
file are needed and we want to keep the used portions in memory once
accessed. This is the case for large graphical models where the path
through the graph is not known until run time. The entire graph is
unlikely to be used in a given invocation, but once a node has been
used it needs to stay resident for further processing. Given these
constraints we have a number of options. We can potentially waste a
large amount of memory by mlocking the entire region (this can also
cause a significant stall at startup as the entire file is read in).
We can mlock every page as we access them without tracking if the page
is already resident but this introduces large overhead for each access.
The third option is mapping the entire region with PROT_NONE and using
a signal handler for SIGSEGV to mprotect(PROT_READ) and mlock() the
needed page. Doing this page at a time adds a significant performance
penalty. Batching can be used to mitigate this overhead, but in order
to safely avoid trying to mprotect pages outside of the mapping, the
boundaries of each mapping to be used in this way must be tracked and
available to the signal handler. This is precisely what the mm system
in the kernel should already be doing.
For mlock(MLOCK_ONFAULT) the user is charged against RLIMIT_MEMLOCK as
if mlock(MLOCK_LOCKED) or mmap(MAP_LOCKED) was used, so when the VMA is
created not when the pages are faulted in. For mlockall(MCL_ONFAULT)
the user is charged as if MCL_FUTURE was used. This decision was made
to keep the accounting checks out of the page fault path.
To illustrate the benefit of this set I wrote a test program that mmaps
a 5 GB file filled with random data and then makes 15,000,000 accesses
to random addresses in that mapping. The test program was run 20 times
for each setup. Results are reported for two program portions, setup
and execution. The setup phase is calling mmap and optionally mlock on
the entire region. For most experiments this is trivial, but it
highlights the cost of faulting in the entire region. Results are
averages across the 20 runs in milliseconds.
mmap with mlock(MLOCK_LOCKED) on entire range:
Setup avg: 8228.666
Processing avg: 8274.257
mmap with mlock(MLOCK_LOCKED) before each access:
Setup avg: 0.113
Processing avg: 90993.552
mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt
to change the permissions, after upping the sysctl significantly I get:
Setup avg: 0.058
Processing avg: 69488.073
mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg: 0.068
Processing avg: 38204.116
mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg: 0.044
Processing avg: 29671.180
mmap with mlock(MLOCK_ONFAULT) on entire range:
Setup avg: 0.189
Processing avg: 17904.899
The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping. The
first step covers the page that caused the fault as we know that it will
be possible to lock. The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow. There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handeler in a globally accessible structure, but I could
not find it. It should be noted that with a large enough batch size
this two step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.
These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise mlock(MLOCK_ONFAULT) is significantly faster.
The performance cost of these patches are minimal on the two benchmarks
I have tested (stream and kernbench). The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.
Avg throughput in MB/s from stream using 1000000 element arrays
Test 4.2-rc1 4.2-rc1+lock-on-fault
Copy: 10,566.5 10,421
Scale: 10,685 10,503.5
Add: 12,044.1 11,814.2
Triad: 12,064.8 11,846.3
Kernbench optimal load
4.2-rc1 4.2-rc1+lock-on-fault
Elapsed Time 78.453 78.991
User Time 64.2395 65.2355
System Time 9.7335 9.7085
Context Switches 22211.5 22412.1
Sleeps 14965.3 14956.1
---
Changes from V6:
* Bump the x86 system call number to avoid collision with userfaultfd
* Fix FOLL_POPULATE and FOLL_MLOCK usage when mmap is called with
MAP_POPULATE
* Add documentation for the proc smaps change
* checkpatch fixes
Changes from V5:
Drop MLOCK_LOCKED flag
* MLOCK_ONFAULT and MCL_ONFAULT are treated as a modifier to other locking
operations, mirroring the relationship between VM_LOCKED and
VM_LOCKONFAULT
* Drop mmap flag and related tests
* Fix clearing of MCL_CURRENT when mlockall is called with MCL_FUTURE,
mlockall behavoir now matches the old behavior WRT to ordering
Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
Adjust tests to match
Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile
Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.
Eric B Munson (6):
mm: mlock: Refactor mlock, munlock, and munlockall code
mm: mlock: Add new mlock system call
mm: Introduce VM_LOCKONFAULT
mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
selftests: vm: Add tests for lock on fault
mips: Add entry for new mlock2 syscall
Documentation/filesystems/proc.txt | 1 +
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 6 +
arch/mips/include/uapi/asm/unistd.h | 15 +-
arch/mips/kernel/scall32-o32.S | 1 +
arch/mips/kernel/scall64-64.S | 1 +
arch/mips/kernel/scall64-n32.S | 1 +
arch/mips/kernel/scall64-o32.S | 1 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/include/uapi/asm/mman.h | 6 +
drivers/gpu/drm/drm_vm.c | 8 +-
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 2 +
include/linux/syscalls.h | 2 +
include/uapi/asm-generic/mman-common.h | 5 +
include/uapi/asm-generic/mman.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +-
kernel/sys_ni.c | 1 +
mm/debug.c | 1 +
mm/gup.c | 10 +-
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 4 +-
mm/mlock.c | 87 +++-
mm/mmap.c | 2 +-
mm/rmap.c | 6 +-
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/mlock2-tests.c | 661 ++++++++++++++++++++++++++++
tools/testing/selftests/vm/on-fault-limit.c | 47 ++
tools/testing/selftests/vm/run_vmtests | 22 +
35 files changed, 872 insertions(+), 41 deletions(-)
create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
create mode 100644 tools/testing/selftests/vm/on-fault-limit.c
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
--
1.9.1
^ permalink raw reply [flat|nested] 87+ messages in thread* [PATCH v7 1/6] mm: mlock: Refactor mlock, munlock, and munlockall code
2015-08-09 5:22 ` Eric B Munson
@ 2015-08-09 5:22 ` Eric B Munson
-1 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Kirill A. Shutemov,
linux-mm, linux-kernel
Extending the mlock system call is very difficult because it currently
does not take a flags argument. A later patch in this set will extend
mlock to support a middle ground between pages that are locked and
faulted in immediately and unlocked pages. To pave the way for the new
system call, the code needs some reorganization so that all the actual
entry point handles is checking input and translating to VMA flags.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
mm/mlock.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index 6fd2cf1..5692ee5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -553,7 +553,8 @@ out:
return ret;
}
-static int do_mlock(unsigned long start, size_t len, int on)
+static int apply_vma_lock_flags(unsigned long start, size_t len,
+ vm_flags_t flags)
{
unsigned long nstart, end, tmp;
struct vm_area_struct * vma, * prev;
@@ -575,14 +576,11 @@ static int do_mlock(unsigned long start, size_t len, int on)
prev = vma;
for (nstart = start ; ; ) {
- vm_flags_t newflags;
-
- /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
+ vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
- newflags = vma->vm_flags & ~VM_LOCKED;
- if (on)
- newflags |= VM_LOCKED;
+ newflags |= flags;
+ /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
@@ -604,7 +602,7 @@ static int do_mlock(unsigned long start, size_t len, int on)
return error;
}
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
+static int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
{
unsigned long locked;
unsigned long lock_limit;
@@ -628,7 +626,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
/* check against resource limits */
if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
- error = do_mlock(start, len, 1);
+ error = apply_vma_lock_flags(start, len, flags);
up_write(¤t->mm->mmap_sem);
if (error)
@@ -640,6 +638,11 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return 0;
}
+SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
+{
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
@@ -648,13 +651,13 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
start &= PAGE_MASK;
down_write(¤t->mm->mmap_sem);
- ret = do_mlock(start, len, 0);
+ ret = apply_vma_lock_flags(start, len, 0);
up_write(¤t->mm->mmap_sem);
return ret;
}
-static int do_mlockall(int flags)
+static int apply_mlockall_flags(int flags)
{
struct vm_area_struct * vma, * prev = NULL;
@@ -662,6 +665,7 @@ static int do_mlockall(int flags)
current->mm->def_flags |= VM_LOCKED;
else
current->mm->def_flags &= ~VM_LOCKED;
+
if (flags == MCL_FUTURE)
goto out;
@@ -703,7 +707,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
capable(CAP_IPC_LOCK))
- ret = do_mlockall(flags);
+ ret = apply_mlockall_flags(flags);
up_write(¤t->mm->mmap_sem);
if (!ret && (flags & MCL_CURRENT))
mm_populate(0, TASK_SIZE);
@@ -716,7 +720,7 @@ SYSCALL_DEFINE0(munlockall)
int ret;
down_write(¤t->mm->mmap_sem);
- ret = do_mlockall(0);
+ ret = apply_mlockall_flags(0);
up_write(¤t->mm->mmap_sem);
return ret;
}
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 1/6] mm: mlock: Refactor mlock, munlock, and munlockall code
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Kirill A. Shutemov,
linux-mm, linux-kernel
Extending the mlock system call is very difficult because it currently
does not take a flags argument. A later patch in this set will extend
mlock to support a middle ground between pages that are locked and
faulted in immediately and unlocked pages. To pave the way for the new
system call, the code needs some reorganization so that all the actual
entry point handles is checking input and translating to VMA flags.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
mm/mlock.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index 6fd2cf1..5692ee5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -553,7 +553,8 @@ out:
return ret;
}
-static int do_mlock(unsigned long start, size_t len, int on)
+static int apply_vma_lock_flags(unsigned long start, size_t len,
+ vm_flags_t flags)
{
unsigned long nstart, end, tmp;
struct vm_area_struct * vma, * prev;
@@ -575,14 +576,11 @@ static int do_mlock(unsigned long start, size_t len, int on)
prev = vma;
for (nstart = start ; ; ) {
- vm_flags_t newflags;
-
- /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
+ vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
- newflags = vma->vm_flags & ~VM_LOCKED;
- if (on)
- newflags |= VM_LOCKED;
+ newflags |= flags;
+ /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
@@ -604,7 +602,7 @@ static int do_mlock(unsigned long start, size_t len, int on)
return error;
}
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
+static int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
{
unsigned long locked;
unsigned long lock_limit;
@@ -628,7 +626,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
/* check against resource limits */
if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
- error = do_mlock(start, len, 1);
+ error = apply_vma_lock_flags(start, len, flags);
up_write(¤t->mm->mmap_sem);
if (error)
@@ -640,6 +638,11 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return 0;
}
+SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
+{
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
@@ -648,13 +651,13 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
start &= PAGE_MASK;
down_write(¤t->mm->mmap_sem);
- ret = do_mlock(start, len, 0);
+ ret = apply_vma_lock_flags(start, len, 0);
up_write(¤t->mm->mmap_sem);
return ret;
}
-static int do_mlockall(int flags)
+static int apply_mlockall_flags(int flags)
{
struct vm_area_struct * vma, * prev = NULL;
@@ -662,6 +665,7 @@ static int do_mlockall(int flags)
current->mm->def_flags |= VM_LOCKED;
else
current->mm->def_flags &= ~VM_LOCKED;
+
if (flags == MCL_FUTURE)
goto out;
@@ -703,7 +707,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
capable(CAP_IPC_LOCK))
- ret = do_mlockall(flags);
+ ret = apply_mlockall_flags(flags);
up_write(¤t->mm->mmap_sem);
if (!ret && (flags & MCL_CURRENT))
mm_populate(0, TASK_SIZE);
@@ -716,7 +720,7 @@ SYSCALL_DEFINE0(munlockall)
int ret;
down_write(¤t->mm->mmap_sem);
- ret = do_mlockall(0);
+ ret = apply_mlockall_flags(0);
up_write(¤t->mm->mmap_sem);
return ret;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* Re: [PATCH v7 1/6] mm: mlock: Refactor mlock, munlock, and munlockall code
2015-08-09 5:22 ` Eric B Munson
@ 2015-08-12 9:42 ` Michal Hocko
-1 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:42 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Kirill A. Shutemov, linux-mm,
linux-kernel
On Sun 09-08-15 01:22:51, Eric B Munson wrote:
> Extending the mlock system call is very difficult because it currently
> does not take a flags argument. A later patch in this set will extend
> mlock to support a middle ground between pages that are locked and
> faulted in immediately and unlocked pages. To pave the way for the new
> system call, the code needs some reorganization so that all the actual
> entry point handles is checking input and translating to VMA flags.
OK, I find the reorganized code more readable.
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/mlock.c | 30 +++++++++++++++++-------------
> 1 file changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 6fd2cf1..5692ee5 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -553,7 +553,8 @@ out:
> return ret;
> }
>
> -static int do_mlock(unsigned long start, size_t len, int on)
> +static int apply_vma_lock_flags(unsigned long start, size_t len,
> + vm_flags_t flags)
> {
> unsigned long nstart, end, tmp;
> struct vm_area_struct * vma, * prev;
> @@ -575,14 +576,11 @@ static int do_mlock(unsigned long start, size_t len, int on)
> prev = vma;
>
> for (nstart = start ; ; ) {
> - vm_flags_t newflags;
> -
> - /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
> + vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
>
> - newflags = vma->vm_flags & ~VM_LOCKED;
> - if (on)
> - newflags |= VM_LOCKED;
> + newflags |= flags;
>
> + /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
> tmp = vma->vm_end;
> if (tmp > end)
> tmp = end;
> @@ -604,7 +602,7 @@ static int do_mlock(unsigned long start, size_t len, int on)
> return error;
> }
>
> -SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> +static int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
> {
> unsigned long locked;
> unsigned long lock_limit;
> @@ -628,7 +626,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
>
> /* check against resource limits */
> if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
> - error = do_mlock(start, len, 1);
> + error = apply_vma_lock_flags(start, len, flags);
>
> up_write(¤t->mm->mmap_sem);
> if (error)
> @@ -640,6 +638,11 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return 0;
> }
>
> +SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> +{
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> @@ -648,13 +651,13 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> start &= PAGE_MASK;
>
> down_write(¤t->mm->mmap_sem);
> - ret = do_mlock(start, len, 0);
> + ret = apply_vma_lock_flags(start, len, 0);
> up_write(¤t->mm->mmap_sem);
>
> return ret;
> }
>
> -static int do_mlockall(int flags)
> +static int apply_mlockall_flags(int flags)
> {
> struct vm_area_struct * vma, * prev = NULL;
>
> @@ -662,6 +665,7 @@ static int do_mlockall(int flags)
> current->mm->def_flags |= VM_LOCKED;
> else
> current->mm->def_flags &= ~VM_LOCKED;
> +
> if (flags == MCL_FUTURE)
> goto out;
>
> @@ -703,7 +707,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
>
> if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
> capable(CAP_IPC_LOCK))
> - ret = do_mlockall(flags);
> + ret = apply_mlockall_flags(flags);
> up_write(¤t->mm->mmap_sem);
> if (!ret && (flags & MCL_CURRENT))
> mm_populate(0, TASK_SIZE);
> @@ -716,7 +720,7 @@ SYSCALL_DEFINE0(munlockall)
> int ret;
>
> down_write(¤t->mm->mmap_sem);
> - ret = do_mlockall(0);
> + ret = apply_mlockall_flags(0);
> up_write(¤t->mm->mmap_sem);
> return ret;
> }
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 87+ messages in thread* Re: [PATCH v7 1/6] mm: mlock: Refactor mlock, munlock, and munlockall code
@ 2015-08-12 9:42 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:42 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Kirill A. Shutemov, linux-mm,
linux-kernel
On Sun 09-08-15 01:22:51, Eric B Munson wrote:
> Extending the mlock system call is very difficult because it currently
> does not take a flags argument. A later patch in this set will extend
> mlock to support a middle ground between pages that are locked and
> faulted in immediately and unlocked pages. To pave the way for the new
> system call, the code needs some reorganization so that all the actual
> entry point handles is checking input and translating to VMA flags.
OK, I find the reorganized code more readable.
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/mlock.c | 30 +++++++++++++++++-------------
> 1 file changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 6fd2cf1..5692ee5 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -553,7 +553,8 @@ out:
> return ret;
> }
>
> -static int do_mlock(unsigned long start, size_t len, int on)
> +static int apply_vma_lock_flags(unsigned long start, size_t len,
> + vm_flags_t flags)
> {
> unsigned long nstart, end, tmp;
> struct vm_area_struct * vma, * prev;
> @@ -575,14 +576,11 @@ static int do_mlock(unsigned long start, size_t len, int on)
> prev = vma;
>
> for (nstart = start ; ; ) {
> - vm_flags_t newflags;
> -
> - /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
> + vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
>
> - newflags = vma->vm_flags & ~VM_LOCKED;
> - if (on)
> - newflags |= VM_LOCKED;
> + newflags |= flags;
>
> + /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
> tmp = vma->vm_end;
> if (tmp > end)
> tmp = end;
> @@ -604,7 +602,7 @@ static int do_mlock(unsigned long start, size_t len, int on)
> return error;
> }
>
> -SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> +static int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
> {
> unsigned long locked;
> unsigned long lock_limit;
> @@ -628,7 +626,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
>
> /* check against resource limits */
> if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
> - error = do_mlock(start, len, 1);
> + error = apply_vma_lock_flags(start, len, flags);
>
> up_write(¤t->mm->mmap_sem);
> if (error)
> @@ -640,6 +638,11 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return 0;
> }
>
> +SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> +{
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> @@ -648,13 +651,13 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> start &= PAGE_MASK;
>
> down_write(¤t->mm->mmap_sem);
> - ret = do_mlock(start, len, 0);
> + ret = apply_vma_lock_flags(start, len, 0);
> up_write(¤t->mm->mmap_sem);
>
> return ret;
> }
>
> -static int do_mlockall(int flags)
> +static int apply_mlockall_flags(int flags)
> {
> struct vm_area_struct * vma, * prev = NULL;
>
> @@ -662,6 +665,7 @@ static int do_mlockall(int flags)
> current->mm->def_flags |= VM_LOCKED;
> else
> current->mm->def_flags &= ~VM_LOCKED;
> +
> if (flags == MCL_FUTURE)
> goto out;
>
> @@ -703,7 +707,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
>
> if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
> capable(CAP_IPC_LOCK))
> - ret = do_mlockall(flags);
> + ret = apply_mlockall_flags(flags);
> up_write(¤t->mm->mmap_sem);
> if (!ret && (flags & MCL_CURRENT))
> mm_populate(0, TASK_SIZE);
> @@ -716,7 +720,7 @@ SYSCALL_DEFINE0(munlockall)
> int ret;
>
> down_write(¤t->mm->mmap_sem);
> - ret = do_mlockall(0);
> + ret = apply_mlockall_flags(0);
> up_write(¤t->mm->mmap_sem);
> return ret;
> }
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v7 2/6] mm: mlock: Add new mlock system call
2015-08-09 5:22 ` Eric B Munson
` (2 preceding siblings ...)
(?)
@ 2015-08-09 5:22 ` Eric B Munson
-1 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added. mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux-cris-kernel@axis.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 4 +++-
kernel/sys_ni.c | 1 +
mm/mlock.c | 8 ++++++++
6 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+360 i386 mlock2 sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+324 common mlock2 sys_mlock2
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
#undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
cond_syscall(sys_munlock);
cond_syscall(sys_mlockall);
cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
cond_syscall(sys_mincore);
cond_syscall(sys_madvise);
cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+ if (flags)
+ return -EINVAL;
+
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added. mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux-cris-kernel@axis.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 4 +++-
kernel/sys_ni.c | 1 +
mm/mlock.c | 8 ++++++++
6 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+360 i386 mlock2 sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+324 common mlock2 sys_mlock2
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
#undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
cond_syscall(sys_munlock);
cond_syscall(sys_mlockall);
cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
cond_syscall(sys_mincore);
cond_syscall(sys_madvise);
cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+ if (flags)
+ return -EINVAL;
+
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added. mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux-cris-kernel@axis.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 4 +++-
kernel/sys_ni.c | 1 +
mm/mlock.c | 8 ++++++++
6 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+360 i386 mlock2 sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+324 common mlock2 sys_mlock2
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
#undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
cond_syscall(sys_munlock);
cond_syscall(sys_mlockall);
cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
cond_syscall(sys_mincore);
cond_syscall(sys_madvise);
cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+ if (flags)
+ return -EINVAL;
+
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: linux-arm-kernel
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added. mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha at vger.kernel.org
Cc: linux-kernel at vger.kernel.org
Cc: linux-arm-kernel at lists.infradead.org
Cc: adi-buildroot-devel at lists.sourceforge.net
Cc: linux-cris-kernel at axis.com
Cc: linux-ia64 at vger.kernel.org
Cc: linux-m68k at lists.linux-m68k.org
Cc: linux-am33-list at redhat.com
Cc: linux-parisc at vger.kernel.org
Cc: linuxppc-dev at lists.ozlabs.org
Cc: linux-s390 at vger.kernel.org
Cc: linux-sh at vger.kernel.org
Cc: sparclinux at vger.kernel.org
Cc: linux-xtensa at linux-xtensa.org
Cc: linux-api at vger.kernel.org
Cc: linux-arch at vger.kernel.org
Cc: linux-mm at kvack.org
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 4 +++-
kernel/sys_ni.c | 1 +
mm/mlock.c | 8 ++++++++
6 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+360 i386 mlock2 sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+324 common mlock2 sys_mlock2
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
#undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
cond_syscall(sys_munlock);
cond_syscall(sys_mlockall);
cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
cond_syscall(sys_mincore);
cond_syscall(sys_madvise);
cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+ if (flags)
+ return -EINVAL;
+
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added. mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux-cris-kernel@axis.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 4 +++-
kernel/sys_ni.c | 1 +
mm/mlock.c | 8 ++++++++
6 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+360 i386 mlock2 sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+324 common mlock2 sys_mlock2
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
#undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
cond_syscall(sys_munlock);
cond_syscall(sys_mlockall);
cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
cond_syscall(sys_mincore);
cond_syscall(sys_madvise);
cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+ if (flags)
+ return -EINVAL;
+
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added. mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux-cris-kernel@axis.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 4 +++-
kernel/sys_ni.c | 1 +
mm/mlock.c | 8 ++++++++
6 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+360 i386 mlock2 sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+324 common mlock2 sys_mlock2
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
#undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
cond_syscall(sys_munlock);
cond_syscall(sys_mlockall);
cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
cond_syscall(sys_mincore);
cond_syscall(sys_madvise);
cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+ if (flags)
+ return -EINVAL;
+
+ return do_mlock(start, len, VM_LOCKED);
+}
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* Re: [PATCH v7 2/6] mm: mlock: Add new mlock system call
2015-08-09 5:22 ` Eric B Munson
` (3 preceding siblings ...)
(?)
@ 2015-08-12 9:45 ` Michal Hocko
-1 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:45 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
On Sun 09-08-15 01:22:52, Eric B Munson wrote:
> With the refactored mlock code, introduce a new system call for mlock.
> The new call will allow the user to specify what lock states are being
> added. mlock2 is trivial at the moment, but a follow on patch will add
> a new mlock state making it useful.
Looks good to me
Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: adi-buildroot-devel@lists.sourceforge.net
> Cc: linux-cris-kernel@axis.com
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-am33-list@redhat.com
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
> Cc: linux-api@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 4 +++-
> kernel/sys_ni.c | 1 +
> mm/mlock.c | 8 ++++++++
> 6 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f..8e06da6 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> 358 i386 execveat sys_execveat stub32_execveat
> +360 i386 mlock2 sys_mlock2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 9ef32d5..67601e7 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> 322 64 execveat stub_execveat
> +324 common mlock2 sys_mlock2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b45c45b..56a3d59 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
> const char __user *const __user *argv,
> const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..14a6013 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_mlock2 282
> +__SYSCALL(__NR_mlock2, sys_mlock2)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7995ef5..4818b71 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
> cond_syscall(sys_munlock);
> cond_syscall(sys_mlockall);
> cond_syscall(sys_munlockall);
> +cond_syscall(sys_mlock2);
> cond_syscall(sys_mincore);
> cond_syscall(sys_madvise);
> cond_syscall(sys_mremap);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 5692ee5..3094f27 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return do_mlock(start, len, VM_LOCKED);
> }
>
> +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> +{
> + if (flags)
> + return -EINVAL;
> +
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread* Re: [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-12 9:45 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:45 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
On Sun 09-08-15 01:22:52, Eric B Munson wrote:
> With the refactored mlock code, introduce a new system call for mlock.
> The new call will allow the user to specify what lock states are being
> added. mlock2 is trivial at the moment, but a follow on patch will add
> a new mlock state making it useful.
Looks good to me
Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: adi-buildroot-devel@lists.sourceforge.net
> Cc: linux-cris-kernel@axis.com
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-am33-list@redhat.com
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
> Cc: linux-api@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 4 +++-
> kernel/sys_ni.c | 1 +
> mm/mlock.c | 8 ++++++++
> 6 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f..8e06da6 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> 358 i386 execveat sys_execveat stub32_execveat
> +360 i386 mlock2 sys_mlock2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 9ef32d5..67601e7 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> 322 64 execveat stub_execveat
> +324 common mlock2 sys_mlock2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b45c45b..56a3d59 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
> const char __user *const __user *argv,
> const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..14a6013 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_mlock2 282
> +__SYSCALL(__NR_mlock2, sys_mlock2)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7995ef5..4818b71 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
> cond_syscall(sys_munlock);
> cond_syscall(sys_mlockall);
> cond_syscall(sys_munlockall);
> +cond_syscall(sys_mlock2);
> cond_syscall(sys_mincore);
> cond_syscall(sys_madvise);
> cond_syscall(sys_mremap);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 5692ee5..3094f27 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return do_mlock(start, len, VM_LOCKED);
> }
>
> +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> +{
> + if (flags)
> + return -EINVAL;
> +
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 87+ messages in thread* Re: [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-12 9:45 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:45 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
On Sun 09-08-15 01:22:52, Eric B Munson wrote:
> With the refactored mlock code, introduce a new system call for mlock.
> The new call will allow the user to specify what lock states are being
> added. mlock2 is trivial at the moment, but a follow on patch will add
> a new mlock state making it useful.
Looks good to me
Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: adi-buildroot-devel@lists.sourceforge.net
> Cc: linux-cris-kernel@axis.com
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-am33-list@redhat.com
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
> Cc: linux-api@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 4 +++-
> kernel/sys_ni.c | 1 +
> mm/mlock.c | 8 ++++++++
> 6 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f..8e06da6 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> 358 i386 execveat sys_execveat stub32_execveat
> +360 i386 mlock2 sys_mlock2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 9ef32d5..67601e7 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> 322 64 execveat stub_execveat
> +324 common mlock2 sys_mlock2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b45c45b..56a3d59 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
> const char __user *const __user *argv,
> const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..14a6013 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_mlock2 282
> +__SYSCALL(__NR_mlock2, sys_mlock2)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7995ef5..4818b71 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
> cond_syscall(sys_munlock);
> cond_syscall(sys_mlockall);
> cond_syscall(sys_munlockall);
> +cond_syscall(sys_mlock2);
> cond_syscall(sys_mincore);
> cond_syscall(sys_madvise);
> cond_syscall(sys_mremap);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 5692ee5..3094f27 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return do_mlock(start, len, VM_LOCKED);
> }
>
> +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> +{
> + if (flags)
> + return -EINVAL;
> +
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread* [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-12 9:45 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:45 UTC (permalink / raw)
To: linux-arm-kernel
On Sun 09-08-15 01:22:52, Eric B Munson wrote:
> With the refactored mlock code, introduce a new system call for mlock.
> The new call will allow the user to specify what lock states are being
> added. mlock2 is trivial at the moment, but a follow on patch will add
> a new mlock state making it useful.
Looks good to me
Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-alpha at vger.kernel.org
> Cc: linux-kernel at vger.kernel.org
> Cc: linux-arm-kernel at lists.infradead.org
> Cc: adi-buildroot-devel at lists.sourceforge.net
> Cc: linux-cris-kernel at axis.com
> Cc: linux-ia64 at vger.kernel.org
> Cc: linux-m68k at lists.linux-m68k.org
> Cc: linux-am33-list at redhat.com
> Cc: linux-parisc at vger.kernel.org
> Cc: linuxppc-dev at lists.ozlabs.org
> Cc: linux-s390 at vger.kernel.org
> Cc: linux-sh at vger.kernel.org
> Cc: sparclinux at vger.kernel.org
> Cc: linux-xtensa at linux-xtensa.org
> Cc: linux-api at vger.kernel.org
> Cc: linux-arch at vger.kernel.org
> Cc: linux-mm at kvack.org
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 4 +++-
> kernel/sys_ni.c | 1 +
> mm/mlock.c | 8 ++++++++
> 6 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f..8e06da6 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> 358 i386 execveat sys_execveat stub32_execveat
> +360 i386 mlock2 sys_mlock2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 9ef32d5..67601e7 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> 322 64 execveat stub_execveat
> +324 common mlock2 sys_mlock2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b45c45b..56a3d59 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
> const char __user *const __user *argv,
> const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..14a6013 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_mlock2 282
> +__SYSCALL(__NR_mlock2, sys_mlock2)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7995ef5..4818b71 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
> cond_syscall(sys_munlock);
> cond_syscall(sys_mlockall);
> cond_syscall(sys_munlockall);
> +cond_syscall(sys_mlock2);
> cond_syscall(sys_mincore);
> cond_syscall(sys_madvise);
> cond_syscall(sys_mremap);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 5692ee5..3094f27 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return do_mlock(start, len, VM_LOCKED);
> }
>
> +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> +{
> + if (flags)
> + return -EINVAL;
> +
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread* Re: [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-12 9:45 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:45 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
On Sun 09-08-15 01:22:52, Eric B Munson wrote:
> With the refactored mlock code, introduce a new system call for mlock.
> The new call will allow the user to specify what lock states are being
> added. mlock2 is trivial at the moment, but a follow on patch will add
> a new mlock state making it useful.
Looks good to me
Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: adi-buildroot-devel@lists.sourceforge.net
> Cc: linux-cris-kernel@axis.com
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-am33-list@redhat.com
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
> Cc: linux-api@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 4 +++-
> kernel/sys_ni.c | 1 +
> mm/mlock.c | 8 ++++++++
> 6 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f..8e06da6 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> 358 i386 execveat sys_execveat stub32_execveat
> +360 i386 mlock2 sys_mlock2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 9ef32d5..67601e7 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> 322 64 execveat stub_execveat
> +324 common mlock2 sys_mlock2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b45c45b..56a3d59 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
> const char __user *const __user *argv,
> const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..14a6013 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_mlock2 282
> +__SYSCALL(__NR_mlock2, sys_mlock2)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7995ef5..4818b71 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
> cond_syscall(sys_munlock);
> cond_syscall(sys_mlockall);
> cond_syscall(sys_munlockall);
> +cond_syscall(sys_mlock2);
> cond_syscall(sys_mincore);
> cond_syscall(sys_madvise);
> cond_syscall(sys_mremap);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 5692ee5..3094f27 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return do_mlock(start, len, VM_LOCKED);
> }
>
> +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> +{
> + if (flags)
> + return -EINVAL;
> +
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread* Re: [PATCH v7 2/6] mm: mlock: Add new mlock system call
@ 2015-08-12 9:45 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 9:45 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Heiko Carstens,
Geert Uytterhoeven, Catalin Marinas, Stephen Rothwell,
Guenter Roeck, Andrea Arcangeli, linux-alpha, linux-kernel,
linux-arm-kernel, adi-buildroot-devel, linux-cris-kernel,
linux-ia64, linux-m68k, linux-am33-list, linux-parisc,
linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-xtensa,
linux-api, linux-arch, linux-mm
On Sun 09-08-15 01:22:52, Eric B Munson wrote:
> With the refactored mlock code, introduce a new system call for mlock.
> The new call will allow the user to specify what lock states are being
> added. mlock2 is trivial at the moment, but a follow on patch will add
> a new mlock state making it useful.
Looks good to me
Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: adi-buildroot-devel@lists.sourceforge.net
> Cc: linux-cris-kernel@axis.com
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-am33-list@redhat.com
> Cc: linux-parisc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
> Cc: linux-api@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 4 +++-
> kernel/sys_ni.c | 1 +
> mm/mlock.c | 8 ++++++++
> 6 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f..8e06da6 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> 358 i386 execveat sys_execveat stub32_execveat
> +360 i386 mlock2 sys_mlock2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 9ef32d5..67601e7 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> 322 64 execveat stub_execveat
> +324 common mlock2 sys_mlock2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b45c45b..56a3d59 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
> const char __user *const __user *argv,
> const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..14a6013 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_mlock2 282
> +__SYSCALL(__NR_mlock2, sys_mlock2)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7995ef5..4818b71 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
> cond_syscall(sys_munlock);
> cond_syscall(sys_mlockall);
> cond_syscall(sys_munlockall);
> +cond_syscall(sys_mlock2);
> cond_syscall(sys_mincore);
> cond_syscall(sys_madvise);
> cond_syscall(sys_mremap);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 5692ee5..3094f27 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
> return do_mlock(start, len, VM_LOCKED);
> }
>
> +SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> +{
> + if (flags)
> + return -EINVAL;
> +
> + return do_mlock(start, len, VM_LOCKED);
> +}
> +
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> --
> 1.9.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT
2015-08-09 5:22 ` Eric B Munson
@ 2015-08-09 5:22 ` Eric B Munson
-1 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-kernel, dri-devel, linux-mm, linux-api
The cost of faulting in all memory to be locked can be very high when
working with large mappings. If only portions of the mapping will be
used this can incur a high penalty for locking.
For the example of a large file, this is the usage pattern for a large
statical language model (probably applies to other statical or graphical
models as well). For the security example, any application transacting
in data that cannot be swapped out (credit card data, medical records,
etc).
This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in. The VM_LOCKONFAULT flag will be used together with
VM_LOCKED and has no effect when set without VM_LOCKED. Setting the
VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
be added to the unevictable LRU when they are faulted or if they are
already present, but will not cause any missing pages to be faulted in.
Exposing this new lock state means that we cannot overload the meaning
of the FOLL_POPULATE flag any longer. Prior to this patch it was used
to mean that the VMA for a fault was locked. This means we need the
new FOLL_MLOCK flag to communicate the locked state of a VMA.
FOLL_POPULATE will now only control if the VMA should be populated and
in the case of VM_LOCKONFAULT, it will not be set.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-mm@kvack.org
Cc: linux-api@vger.kernel.org
---
Documentation/filesystems/proc.txt | 1 +
drivers/gpu/drm/drm_vm.c | 8 +++++++-
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 2 ++
kernel/fork.c | 2 +-
mm/debug.c | 1 +
mm/gup.c | 10 ++++++++--
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 4 ++--
mm/mlock.c | 2 +-
mm/mmap.c | 2 +-
mm/rmap.c | 6 ++++--
12 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 6f7fafd..ed21989 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -463,6 +463,7 @@ manner. The codes are the following:
rr - random read advise provided
dc - do not copy area on fork
de - do not expand area on remapping
+ lf - mark area to lock pages when faulted in, do not pre-populate
ac - area is accountable
nr - swap space is not reserved for the area
ht - area uses huge tlb pages
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index aab49ee..103a5f6 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -699,9 +699,15 @@ int drm_vma_info(struct seq_file *m, void *data)
(void *)(unsigned long)virt_to_phys(high_memory));
list_for_each_entry(pt, &dev->vmalist, head) {
+ char lock_flag = '-';
+
vma = pt->vma;
if (!vma)
continue;
+ if (vma->vm_flags & VM_LOCKONFAULT)
+ lock_flag = 'f';
+ else if (vma->vm_flags & VM_LOCKED)
+ lock_flag = 'l';
seq_printf(m,
"\n%5d 0x%pK-0x%pK %c%c%c%c%c%c 0x%08lx000",
pt->pid,
@@ -710,7 +716,7 @@ int drm_vma_info(struct seq_file *m, void *data)
vma->vm_flags & VM_WRITE ? 'w' : '-',
vma->vm_flags & VM_EXEC ? 'x' : '-',
vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
- vma->vm_flags & VM_LOCKED ? 'l' : '-',
+ lock_flag,
vma->vm_flags & VM_IO ? 'i' : '-',
vma->vm_pgoff);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ca1e091..8dcc297 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -585,6 +585,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_RAND_READ)] = "rr",
[ilog2(VM_DONTCOPY)] = "dc",
[ilog2(VM_DONTEXPAND)] = "de",
+ [ilog2(VM_LOCKONFAULT)] = "lf",
[ilog2(VM_ACCOUNT)] = "ac",
[ilog2(VM_NORESERVE)] = "nr",
[ilog2(VM_HUGETLB)] = "ht",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e872f9..d6e1637 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -136,6 +136,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_DONTCOPY 0x00020000 /* Do not copy this vma on fork */
#define VM_DONTEXPAND 0x00040000 /* Cannot expand with mremap() */
+#define VM_LOCKONFAULT 0x00080000 /* Lock the pages covered when they are faulted in */
#define VM_ACCOUNT 0x00100000 /* Is a VM accounted object */
#define VM_NORESERVE 0x00200000 /* should the VM suppress accounting */
#define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */
@@ -2043,6 +2044,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
#define FOLL_NUMA 0x200 /* force NUMA hinting page fault */
#define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */
#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */
+#define FOLL_MLOCK 0x1000 /* lock present pages */
typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/kernel/fork.c b/kernel/fork.c
index dbd9b8d..a949228 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -454,7 +454,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
tmp->vm_mm = mm;
if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
- tmp->vm_flags &= ~VM_LOCKED;
+ tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
tmp->vm_next = tmp->vm_prev = NULL;
file = tmp->vm_file;
if (file) {
diff --git a/mm/debug.c b/mm/debug.c
index 76089dd..25176bb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -121,6 +121,7 @@ static const struct trace_print_flags vmaflags_names[] = {
{VM_GROWSDOWN, "growsdown" },
{VM_PFNMAP, "pfnmap" },
{VM_DENYWRITE, "denywrite" },
+ {VM_LOCKONFAULT, "lockonfault" },
{VM_LOCKED, "locked" },
{VM_IO, "io" },
{VM_SEQ_READ, "seqread" },
diff --git a/mm/gup.c b/mm/gup.c
index 6297f6b..dce6ccd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -92,7 +92,7 @@ retry:
*/
mark_page_accessed(page);
}
- if ((flags & FOLL_POPULATE) && (vma->vm_flags & VM_LOCKED)) {
+ if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
/*
* The preliminary mapping check is mainly to avoid the
* pointless overhead of lock_page on the ZERO_PAGE
@@ -265,6 +265,9 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
unsigned int fault_flags = 0;
int ret;
+ /* mlock all present pages, but do not fault in new pages */
+ if ((*flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK)
+ return -ENOENT;
/* For mm_populate(), just skip the stack guard page. */
if ((*flags & FOLL_POPULATE) &&
(stack_guard_page_start(vma, address) ||
@@ -850,7 +853,10 @@ long populate_vma_page_range(struct vm_area_struct *vma,
VM_BUG_ON_VMA(end > vma->vm_end, vma);
VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
- gup_flags = FOLL_TOUCH | FOLL_POPULATE;
+ gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK;
+ if (vma->vm_flags & VM_LOCKONFAULT)
+ gup_flags &= ~FOLL_POPULATE;
+
/*
* We want to touch writable mappings with a write fault in order
* to break COW, except for shared mappings because these don't COW
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 097c7a4..cba783e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1238,7 +1238,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
pmd, _pmd, 1))
update_mmu_cache_pmd(vma, addr, pmd);
}
- if ((flags & FOLL_POPULATE) && (vma->vm_flags & VM_LOCKED)) {
+ if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
if (page->mapping && trylock_page(page)) {
lru_add_drain();
if (page->mapping)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a8c3087..4ed9e93 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3764,8 +3764,8 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
unsigned long s_end = sbase + PUD_SIZE;
/* Allow segments to share if only one is marked locked */
- unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
- unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
+ unsigned long vm_flags = vma->vm_flags & ~(VM_LOCKED|VM_LOCKONFAULT);
+ unsigned long svm_flags = svma->vm_flags & ~(VM_LOCKED|VM_LOCKONFAULT);
/*
* match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 3094f27..029a75b 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -422,7 +422,7 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
void munlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- vma->vm_flags &= ~VM_LOCKED;
+ vma->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
while (start < end) {
struct page *page = NULL;
diff --git a/mm/mmap.c b/mm/mmap.c
index aa632ad..bdbefc3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1651,7 +1651,7 @@ out:
vma == get_gate_vma(current->mm)))
mm->locked_vm += (len >> PAGE_SHIFT);
else
- vma->vm_flags &= ~VM_LOCKED;
+ vma->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
}
if (file)
diff --git a/mm/rmap.c b/mm/rmap.c
index 171b687..14ce002 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -744,7 +744,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (vma->vm_flags & VM_LOCKED) {
spin_unlock(ptl);
- pra->vm_flags |= VM_LOCKED;
+ pra->vm_flags |=
+ (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
return SWAP_FAIL; /* To break the loop */
}
@@ -765,7 +766,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (vma->vm_flags & VM_LOCKED) {
pte_unmap_unlock(pte, ptl);
- pra->vm_flags |= VM_LOCKED;
+ pra->vm_flags |=
+ (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
return SWAP_FAIL; /* To break the loop */
}
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-kernel, dri-devel, linux-mm, linux-api
The cost of faulting in all memory to be locked can be very high when
working with large mappings. If only portions of the mapping will be
used this can incur a high penalty for locking.
For the example of a large file, this is the usage pattern for a large
statical language model (probably applies to other statical or graphical
models as well). For the security example, any application transacting
in data that cannot be swapped out (credit card data, medical records,
etc).
This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in. The VM_LOCKONFAULT flag will be used together with
VM_LOCKED and has no effect when set without VM_LOCKED. Setting the
VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
be added to the unevictable LRU when they are faulted or if they are
already present, but will not cause any missing pages to be faulted in.
Exposing this new lock state means that we cannot overload the meaning
of the FOLL_POPULATE flag any longer. Prior to this patch it was used
to mean that the VMA for a fault was locked. This means we need the
new FOLL_MLOCK flag to communicate the locked state of a VMA.
FOLL_POPULATE will now only control if the VMA should be populated and
in the case of VM_LOCKONFAULT, it will not be set.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-mm@kvack.org
Cc: linux-api@vger.kernel.org
---
Documentation/filesystems/proc.txt | 1 +
drivers/gpu/drm/drm_vm.c | 8 +++++++-
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 2 ++
kernel/fork.c | 2 +-
mm/debug.c | 1 +
mm/gup.c | 10 ++++++++--
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 4 ++--
mm/mlock.c | 2 +-
mm/mmap.c | 2 +-
mm/rmap.c | 6 ++++--
12 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 6f7fafd..ed21989 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -463,6 +463,7 @@ manner. The codes are the following:
rr - random read advise provided
dc - do not copy area on fork
de - do not expand area on remapping
+ lf - mark area to lock pages when faulted in, do not pre-populate
ac - area is accountable
nr - swap space is not reserved for the area
ht - area uses huge tlb pages
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index aab49ee..103a5f6 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -699,9 +699,15 @@ int drm_vma_info(struct seq_file *m, void *data)
(void *)(unsigned long)virt_to_phys(high_memory));
list_for_each_entry(pt, &dev->vmalist, head) {
+ char lock_flag = '-';
+
vma = pt->vma;
if (!vma)
continue;
+ if (vma->vm_flags & VM_LOCKONFAULT)
+ lock_flag = 'f';
+ else if (vma->vm_flags & VM_LOCKED)
+ lock_flag = 'l';
seq_printf(m,
"\n%5d 0x%pK-0x%pK %c%c%c%c%c%c 0x%08lx000",
pt->pid,
@@ -710,7 +716,7 @@ int drm_vma_info(struct seq_file *m, void *data)
vma->vm_flags & VM_WRITE ? 'w' : '-',
vma->vm_flags & VM_EXEC ? 'x' : '-',
vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
- vma->vm_flags & VM_LOCKED ? 'l' : '-',
+ lock_flag,
vma->vm_flags & VM_IO ? 'i' : '-',
vma->vm_pgoff);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ca1e091..8dcc297 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -585,6 +585,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_RAND_READ)] = "rr",
[ilog2(VM_DONTCOPY)] = "dc",
[ilog2(VM_DONTEXPAND)] = "de",
+ [ilog2(VM_LOCKONFAULT)] = "lf",
[ilog2(VM_ACCOUNT)] = "ac",
[ilog2(VM_NORESERVE)] = "nr",
[ilog2(VM_HUGETLB)] = "ht",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e872f9..d6e1637 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -136,6 +136,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_DONTCOPY 0x00020000 /* Do not copy this vma on fork */
#define VM_DONTEXPAND 0x00040000 /* Cannot expand with mremap() */
+#define VM_LOCKONFAULT 0x00080000 /* Lock the pages covered when they are faulted in */
#define VM_ACCOUNT 0x00100000 /* Is a VM accounted object */
#define VM_NORESERVE 0x00200000 /* should the VM suppress accounting */
#define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */
@@ -2043,6 +2044,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
#define FOLL_NUMA 0x200 /* force NUMA hinting page fault */
#define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */
#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */
+#define FOLL_MLOCK 0x1000 /* lock present pages */
typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/kernel/fork.c b/kernel/fork.c
index dbd9b8d..a949228 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -454,7 +454,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
tmp->vm_mm = mm;
if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
- tmp->vm_flags &= ~VM_LOCKED;
+ tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
tmp->vm_next = tmp->vm_prev = NULL;
file = tmp->vm_file;
if (file) {
diff --git a/mm/debug.c b/mm/debug.c
index 76089dd..25176bb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -121,6 +121,7 @@ static const struct trace_print_flags vmaflags_names[] = {
{VM_GROWSDOWN, "growsdown" },
{VM_PFNMAP, "pfnmap" },
{VM_DENYWRITE, "denywrite" },
+ {VM_LOCKONFAULT, "lockonfault" },
{VM_LOCKED, "locked" },
{VM_IO, "io" },
{VM_SEQ_READ, "seqread" },
diff --git a/mm/gup.c b/mm/gup.c
index 6297f6b..dce6ccd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -92,7 +92,7 @@ retry:
*/
mark_page_accessed(page);
}
- if ((flags & FOLL_POPULATE) && (vma->vm_flags & VM_LOCKED)) {
+ if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
/*
* The preliminary mapping check is mainly to avoid the
* pointless overhead of lock_page on the ZERO_PAGE
@@ -265,6 +265,9 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
unsigned int fault_flags = 0;
int ret;
+ /* mlock all present pages, but do not fault in new pages */
+ if ((*flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK)
+ return -ENOENT;
/* For mm_populate(), just skip the stack guard page. */
if ((*flags & FOLL_POPULATE) &&
(stack_guard_page_start(vma, address) ||
@@ -850,7 +853,10 @@ long populate_vma_page_range(struct vm_area_struct *vma,
VM_BUG_ON_VMA(end > vma->vm_end, vma);
VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
- gup_flags = FOLL_TOUCH | FOLL_POPULATE;
+ gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK;
+ if (vma->vm_flags & VM_LOCKONFAULT)
+ gup_flags &= ~FOLL_POPULATE;
+
/*
* We want to touch writable mappings with a write fault in order
* to break COW, except for shared mappings because these don't COW
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 097c7a4..cba783e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1238,7 +1238,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
pmd, _pmd, 1))
update_mmu_cache_pmd(vma, addr, pmd);
}
- if ((flags & FOLL_POPULATE) && (vma->vm_flags & VM_LOCKED)) {
+ if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
if (page->mapping && trylock_page(page)) {
lru_add_drain();
if (page->mapping)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a8c3087..4ed9e93 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3764,8 +3764,8 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
unsigned long s_end = sbase + PUD_SIZE;
/* Allow segments to share if only one is marked locked */
- unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
- unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
+ unsigned long vm_flags = vma->vm_flags & ~(VM_LOCKED|VM_LOCKONFAULT);
+ unsigned long svm_flags = svma->vm_flags & ~(VM_LOCKED|VM_LOCKONFAULT);
/*
* match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 3094f27..029a75b 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -422,7 +422,7 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
void munlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- vma->vm_flags &= ~VM_LOCKED;
+ vma->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
while (start < end) {
struct page *page = NULL;
diff --git a/mm/mmap.c b/mm/mmap.c
index aa632ad..bdbefc3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1651,7 +1651,7 @@ out:
vma == get_gate_vma(current->mm)))
mm->locked_vm += (len >> PAGE_SHIFT);
else
- vma->vm_flags &= ~VM_LOCKED;
+ vma->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
}
if (file)
diff --git a/mm/rmap.c b/mm/rmap.c
index 171b687..14ce002 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -744,7 +744,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (vma->vm_flags & VM_LOCKED) {
spin_unlock(ptl);
- pra->vm_flags |= VM_LOCKED;
+ pra->vm_flags |=
+ (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
return SWAP_FAIL; /* To break the loop */
}
@@ -765,7 +766,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (vma->vm_flags & VM_LOCKED) {
pte_unmap_unlock(pte, ptl);
- pra->vm_flags |= VM_LOCKED;
+ pra->vm_flags |=
+ (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
return SWAP_FAIL; /* To break the loop */
}
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT
2015-08-09 5:22 ` Eric B Munson
@ 2015-08-12 11:59 ` Michal Hocko
-1 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 11:59 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-kernel, dri-devel, linux-mm, linux-api
On Sun 09-08-15 01:22:53, Eric B Munson wrote:
> The cost of faulting in all memory to be locked can be very high when
> working with large mappings. If only portions of the mapping will be
> used this can incur a high penalty for locking.
>
> For the example of a large file, this is the usage pattern for a large
> statical language model (probably applies to other statical or graphical
> models as well). For the security example, any application transacting
> in data that cannot be swapped out (credit card data, medical records,
> etc).
>
> This patch introduces the ability to request that pages are not
> pre-faulted, but are placed on the unevictable LRU when they are finally
> faulted in. The VM_LOCKONFAULT flag will be used together with
> VM_LOCKED and has no effect when set without VM_LOCKED.
I do not like this very much to be honest. We have only few bits
left there and it seems this is not really necessary. I thought that
LOCKONFAULT acts as a modifier to the mlock call to tell whether to
poppulate or not. The only place we have to persist it is
mlockall(MCL_FUTURE) AFAICS. And this can be handled by an additional
field in the mm_struct. This could be handled at __mm_populate level.
So unless I am missing something this would be much more easier
in the end we no new bit in VM flags would be necessary.
This would obviously mean that the LOCKONFAULT couldn't be exported to
the userspace but is this really necessary?
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT
@ 2015-08-12 11:59 ` Michal Hocko
0 siblings, 0 replies; 87+ messages in thread
From: Michal Hocko @ 2015-08-12 11:59 UTC (permalink / raw)
To: Eric B Munson
Cc: Andrew Morton, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-kernel, dri-devel, linux-mm, linux-api
On Sun 09-08-15 01:22:53, Eric B Munson wrote:
> The cost of faulting in all memory to be locked can be very high when
> working with large mappings. If only portions of the mapping will be
> used this can incur a high penalty for locking.
>
> For the example of a large file, this is the usage pattern for a large
> statical language model (probably applies to other statical or graphical
> models as well). For the security example, any application transacting
> in data that cannot be swapped out (credit card data, medical records,
> etc).
>
> This patch introduces the ability to request that pages are not
> pre-faulted, but are placed on the unevictable LRU when they are finally
> faulted in. The VM_LOCKONFAULT flag will be used together with
> VM_LOCKED and has no effect when set without VM_LOCKED.
I do not like this very much to be honest. We have only few bits
left there and it seems this is not really necessary. I thought that
LOCKONFAULT acts as a modifier to the mlock call to tell whether to
poppulate or not. The only place we have to persist it is
mlockall(MCL_FUTURE) AFAICS. And this can be handled by an additional
field in the mm_struct. This could be handled at __mm_populate level.
So unless I am missing something this would be much more easier
in the end we no new bit in VM flags would be necessary.
This would obviously mean that the LOCKONFAULT couldn't be exported to
the userspace but is this really necessary?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT
2015-08-12 11:59 ` Michal Hocko
(?)
@ 2015-08-19 21:33 ` Eric B Munson
[not found] ` <20150819213345.GB4536-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org>
-1 siblings, 1 reply; 87+ messages in thread
From: Eric B Munson @ 2015-08-19 21:33 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-kernel, dri-devel, linux-mm, linux-api
[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]
On Wed, 12 Aug 2015, Michal Hocko wrote:
> On Sun 09-08-15 01:22:53, Eric B Munson wrote:
> > The cost of faulting in all memory to be locked can be very high when
> > working with large mappings. If only portions of the mapping will be
> > used this can incur a high penalty for locking.
> >
> > For the example of a large file, this is the usage pattern for a large
> > statical language model (probably applies to other statical or graphical
> > models as well). For the security example, any application transacting
> > in data that cannot be swapped out (credit card data, medical records,
> > etc).
> >
> > This patch introduces the ability to request that pages are not
> > pre-faulted, but are placed on the unevictable LRU when they are finally
> > faulted in. The VM_LOCKONFAULT flag will be used together with
> > VM_LOCKED and has no effect when set without VM_LOCKED.
>
> I do not like this very much to be honest. We have only few bits
> left there and it seems this is not really necessary. I thought that
> LOCKONFAULT acts as a modifier to the mlock call to tell whether to
> poppulate or not. The only place we have to persist it is
> mlockall(MCL_FUTURE) AFAICS. And this can be handled by an additional
> field in the mm_struct. This could be handled at __mm_populate level.
> So unless I am missing something this would be much more easier
> in the end we no new bit in VM flags would be necessary.
>
> This would obviously mean that the LOCKONFAULT couldn't be exported to
> the userspace but is this really necessary?
Sorry for the latency here, I was on vacation and am now at plumbers.
I am not sure that growing the mm_struct by another flags field instead
of using available bits in the vm_flags is the right choice. After this
patch, we still have 3 free bits on 32 bit architectures (2 after the
userfaultfd set IIRC). The group which asked for this feature here
wants the ability to distinguish between LOCKED and LOCKONFAULT regions
and without the VMA flag there isn't a way to do that.
Do we know that these last two open flags are needed right now or is
this speculation that they will be and that none of the other VMA flags
can be reclaimed?
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v7 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-alpha, linux-kernel, linux-mips,
linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-arch,
linux-api, linux-mm
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created. This patch adds the ability to set
this state via the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags. When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT. When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior
is maintained after adding MCL_ONFAULT. If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked. This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears for VMA flags on all VMAs and in the
mm->def_flags field.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/alpha/include/uapi/asm/mman.h | 3 ++
arch/mips/include/uapi/asm/mman.h | 6 ++++
arch/parisc/include/uapi/asm/mman.h | 3 ++
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/xtensa/include/uapi/asm/mman.h | 6 ++++
include/uapi/asm-generic/mman-common.h | 5 ++++
include/uapi/asm-generic/mman.h | 1 +
mm/mlock.c | 53 +++++++++++++++++++++++++---------
10 files changed, 67 insertions(+), 13 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
#define MCL_CURRENT 8192 /* lock all currently mapped pages */
#define MCL_FUTURE 16384 /* lock all additions to address space */
+#define MCL_ONFAULT 32768 /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..9765896 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..63ee13f 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -36,6 +36,7 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
#endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0..360944e 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -74,6 +74,12 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ddc3b36..a74dd84 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -25,6 +25,11 @@
# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
#endif
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
+
#define MS_ASYNC 1 /* sync memory asynchronously */
#define MS_INVALIDATE 2 /* invalidate the caches */
#define MS_SYNC 4 /* synchronous memory sync */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index e9fe6fd..7162cd4 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -17,5 +17,6 @@
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
#endif /* __ASM_GENERIC_MMAN_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index 029a75b..7e1b8c5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -506,7 +506,8 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm))
- goto out; /* don't set VM_LOCKED, don't count */
+ /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
+ goto out;
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
@@ -576,7 +577,8 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
prev = vma;
for (nstart = start ; ; ) {
- vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
+ vm_flags_t newflags =
+ vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
newflags |= flags;
@@ -645,10 +647,15 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
{
- if (flags)
+ vm_flags_t vm_flags = VM_LOCKED;
+
+ if (flags & ~MLOCK_ONFAULT)
return -EINVAL;
- return do_mlock(start, len, VM_LOCKED);
+ if (flags & MLOCK_ONFAULT)
+ vm_flags |= VM_LOCKONFAULT;
+
+ return do_mlock(start, len, vm_flags);
}
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
@@ -665,24 +672,43 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
return ret;
}
+/*
+ * Take the MCL_* flags passed into mlockall (or 0 if called from munlockall)
+ * and translate into the appropriate modifications to mm->def_flags and/or the
+ * flags for all current VMAs.
+ *
+ * There are a couple of subtleties with this. If mlockall() is called multiple
+ * times with different flags, the values do not necessarily stack. If mlockall
+ * is called once including the MCL_FUTURE flag and then a second time without
+ * it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
+ */
static int apply_mlockall_flags(int flags)
{
struct vm_area_struct * vma, * prev = NULL;
+ vm_flags_t to_add = 0;
- if (flags & MCL_FUTURE)
+ current->mm->def_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+ if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;
- else
- current->mm->def_flags &= ~VM_LOCKED;
- if (flags == MCL_FUTURE)
- goto out;
+ if (flags & MCL_ONFAULT)
+ current->mm->def_flags |= VM_LOCKONFAULT;
+
+ if (!(flags & MCL_CURRENT))
+ goto out;
+ }
+
+ if (flags & MCL_CURRENT) {
+ to_add |= VM_LOCKED;
+ if (flags & MCL_ONFAULT)
+ to_add |= VM_LOCKONFAULT;
+ }
for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
vm_flags_t newflags;
- newflags = vma->vm_flags & ~VM_LOCKED;
- if (flags & MCL_CURRENT)
- newflags |= VM_LOCKED;
+ newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
+ newflags |= to_add;
/* Ignore errors */
mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
@@ -697,7 +723,8 @@ SYSCALL_DEFINE1(mlockall, int, flags)
unsigned long lock_limit;
int ret = -EINVAL;
- if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
+ if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) ||
+ flags == MCL_ONFAULT)
goto out;
ret = -EPERM;
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-alpha, linux-kernel, linux-mips,
linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-arch,
linux-api, linux-mm
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created. This patch adds the ability to set
this state via the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags. When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT. When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior
is maintained after adding MCL_ONFAULT. If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked. This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears for VMA flags on all VMAs and in the
mm->def_flags field.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/alpha/include/uapi/asm/mman.h | 3 ++
arch/mips/include/uapi/asm/mman.h | 6 ++++
arch/parisc/include/uapi/asm/mman.h | 3 ++
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/xtensa/include/uapi/asm/mman.h | 6 ++++
include/uapi/asm-generic/mman-common.h | 5 ++++
include/uapi/asm-generic/mman.h | 1 +
mm/mlock.c | 53 +++++++++++++++++++++++++---------
10 files changed, 67 insertions(+), 13 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
#define MCL_CURRENT 8192 /* lock all currently mapped pages */
#define MCL_FUTURE 16384 /* lock all additions to address space */
+#define MCL_ONFAULT 32768 /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..9765896 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..63ee13f 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -36,6 +36,7 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
#endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0..360944e 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -74,6 +74,12 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ddc3b36..a74dd84 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -25,6 +25,11 @@
# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
#endif
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
+
#define MS_ASYNC 1 /* sync memory asynchronously */
#define MS_INVALIDATE 2 /* invalidate the caches */
#define MS_SYNC 4 /* synchronous memory sync */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index e9fe6fd..7162cd4 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -17,5 +17,6 @@
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
#endif /* __ASM_GENERIC_MMAN_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index 029a75b..7e1b8c5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -506,7 +506,8 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
if (newflags = vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
is_vm_hugetlb_page(vma) || vma = get_gate_vma(current->mm))
- goto out; /* don't set VM_LOCKED, don't count */
+ /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
+ goto out;
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
@@ -576,7 +577,8 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
prev = vma;
for (nstart = start ; ; ) {
- vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
+ vm_flags_t newflags + vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
newflags |= flags;
@@ -645,10 +647,15 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
{
- if (flags)
+ vm_flags_t vm_flags = VM_LOCKED;
+
+ if (flags & ~MLOCK_ONFAULT)
return -EINVAL;
- return do_mlock(start, len, VM_LOCKED);
+ if (flags & MLOCK_ONFAULT)
+ vm_flags |= VM_LOCKONFAULT;
+
+ return do_mlock(start, len, vm_flags);
}
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
@@ -665,24 +672,43 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
return ret;
}
+/*
+ * Take the MCL_* flags passed into mlockall (or 0 if called from munlockall)
+ * and translate into the appropriate modifications to mm->def_flags and/or the
+ * flags for all current VMAs.
+ *
+ * There are a couple of subtleties with this. If mlockall() is called multiple
+ * times with different flags, the values do not necessarily stack. If mlockall
+ * is called once including the MCL_FUTURE flag and then a second time without
+ * it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
+ */
static int apply_mlockall_flags(int flags)
{
struct vm_area_struct * vma, * prev = NULL;
+ vm_flags_t to_add = 0;
- if (flags & MCL_FUTURE)
+ current->mm->def_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+ if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;
- else
- current->mm->def_flags &= ~VM_LOCKED;
- if (flags = MCL_FUTURE)
- goto out;
+ if (flags & MCL_ONFAULT)
+ current->mm->def_flags |= VM_LOCKONFAULT;
+
+ if (!(flags & MCL_CURRENT))
+ goto out;
+ }
+
+ if (flags & MCL_CURRENT) {
+ to_add |= VM_LOCKED;
+ if (flags & MCL_ONFAULT)
+ to_add |= VM_LOCKONFAULT;
+ }
for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
vm_flags_t newflags;
- newflags = vma->vm_flags & ~VM_LOCKED;
- if (flags & MCL_CURRENT)
- newflags |= VM_LOCKED;
+ newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
+ newflags |= to_add;
/* Ignore errors */
mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
@@ -697,7 +723,8 @@ SYSCALL_DEFINE1(mlockall, int, flags)
unsigned long lock_limit;
int ret = -EINVAL;
- if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
+ if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) ||
+ flags = MCL_ONFAULT)
goto out;
ret = -EPERM;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, Vlastimil Babka, Jonathan Corbet,
Kirill A. Shutemov, linux-alpha, linux-kernel, linux-mips,
linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-arch,
linux-api, linux-mm
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created. This patch adds the ability to set
this state via the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags. When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT. When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior
is maintained after adding MCL_ONFAULT. If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked. This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears for VMA flags on all VMAs and in the
mm->def_flags field.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
---
arch/alpha/include/uapi/asm/mman.h | 3 ++
arch/mips/include/uapi/asm/mman.h | 6 ++++
arch/parisc/include/uapi/asm/mman.h | 3 ++
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/xtensa/include/uapi/asm/mman.h | 6 ++++
include/uapi/asm-generic/mman-common.h | 5 ++++
include/uapi/asm-generic/mman.h | 1 +
mm/mlock.c | 53 +++++++++++++++++++++++++---------
10 files changed, 67 insertions(+), 13 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
#define MCL_CURRENT 8192 /* lock all currently mapped pages */
#define MCL_FUTURE 16384 /* lock all additions to address space */
+#define MCL_ONFAULT 32768 /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..9765896 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..63ee13f 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -36,6 +36,7 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
#endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0..360944e 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -74,6 +74,12 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ddc3b36..a74dd84 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -25,6 +25,11 @@
# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
#endif
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
+
#define MS_ASYNC 1 /* sync memory asynchronously */
#define MS_INVALIDATE 2 /* invalidate the caches */
#define MS_SYNC 4 /* synchronous memory sync */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index e9fe6fd..7162cd4 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -17,5 +17,6 @@
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
#endif /* __ASM_GENERIC_MMAN_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index 029a75b..7e1b8c5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -506,7 +506,8 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm))
- goto out; /* don't set VM_LOCKED, don't count */
+ /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
+ goto out;
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
@@ -576,7 +577,8 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
prev = vma;
for (nstart = start ; ; ) {
- vm_flags_t newflags = vma->vm_flags & ~VM_LOCKED;
+ vm_flags_t newflags =
+ vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
newflags |= flags;
@@ -645,10 +647,15 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
{
- if (flags)
+ vm_flags_t vm_flags = VM_LOCKED;
+
+ if (flags & ~MLOCK_ONFAULT)
return -EINVAL;
- return do_mlock(start, len, VM_LOCKED);
+ if (flags & MLOCK_ONFAULT)
+ vm_flags |= VM_LOCKONFAULT;
+
+ return do_mlock(start, len, vm_flags);
}
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
@@ -665,24 +672,43 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
return ret;
}
+/*
+ * Take the MCL_* flags passed into mlockall (or 0 if called from munlockall)
+ * and translate into the appropriate modifications to mm->def_flags and/or the
+ * flags for all current VMAs.
+ *
+ * There are a couple of subtleties with this. If mlockall() is called multiple
+ * times with different flags, the values do not necessarily stack. If mlockall
+ * is called once including the MCL_FUTURE flag and then a second time without
+ * it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
+ */
static int apply_mlockall_flags(int flags)
{
struct vm_area_struct * vma, * prev = NULL;
+ vm_flags_t to_add = 0;
- if (flags & MCL_FUTURE)
+ current->mm->def_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+ if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;
- else
- current->mm->def_flags &= ~VM_LOCKED;
- if (flags == MCL_FUTURE)
- goto out;
+ if (flags & MCL_ONFAULT)
+ current->mm->def_flags |= VM_LOCKONFAULT;
+
+ if (!(flags & MCL_CURRENT))
+ goto out;
+ }
+
+ if (flags & MCL_CURRENT) {
+ to_add |= VM_LOCKED;
+ if (flags & MCL_ONFAULT)
+ to_add |= VM_LOCKONFAULT;
+ }
for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
vm_flags_t newflags;
- newflags = vma->vm_flags & ~VM_LOCKED;
- if (flags & MCL_CURRENT)
- newflags |= VM_LOCKED;
+ newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
+ newflags |= to_add;
/* Ignore errors */
mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
@@ -697,7 +723,8 @@ SYSCALL_DEFINE1(mlockall, int, flags)
unsigned long lock_limit;
int ret = -EINVAL;
- if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
+ if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) ||
+ flags == MCL_ONFAULT)
goto out;
ret = -EPERM;
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v7 5/6] selftests: vm: Add tests for lock on fault
2015-08-09 5:22 ` Eric B Munson
@ 2015-08-09 5:22 ` Eric B Munson
-1 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Shuah Khan, Michal Hocko, Vlastimil Babka,
Jonathan Corbet, linux-mm, linux-kernel, linux-api
Test the mmap() flag, and the mlockall() flag. These tests ensure that
pages are not faulted in until they are accessed, that the pages are
unevictable once faulted in, and that VMA splitting and merging works
with the new VM flag. The second test ensures that mlock limits are
respected. Note that the limit test needs to be run a normal user.
Also add tests to use the new mlock2 family of system calls.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org
---
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/mlock2-tests.c | 661 ++++++++++++++++++++++++++++
tools/testing/selftests/vm/on-fault-limit.c | 47 ++
tools/testing/selftests/vm/run_vmtests | 22 +
4 files changed, 732 insertions(+)
create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
create mode 100644 tools/testing/selftests/vm/on-fault-limit.c
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 231b9a0..71a4e9f 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -6,6 +6,8 @@ BINARIES += hugepage-mmap
BINARIES += hugepage-shm
BINARIES += hugetlbfstest
BINARIES += map_hugetlb
+BINARIES += mlock2-tests
+BINARIES += on-fault-limit
BINARIES += thuge-gen
BINARIES += transhuge-stress
diff --git a/tools/testing/selftests/vm/mlock2-tests.c b/tools/testing/selftests/vm/mlock2-tests.c
new file mode 100644
index 0000000..c49122b
--- /dev/null
+++ b/tools/testing/selftests/vm/mlock2-tests.c
@@ -0,0 +1,661 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#ifndef MLOCK_ONFAULT
+#define MLOCK_ONFAULT 1
+#endif
+
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT (MCL_FUTURE << 1)
+#endif
+
+static int mlock2_(void *start, size_t len, int flags)
+{
+#ifdef __NR_mlock2
+ return syscall(__NR_mlock2, start, len, flags);
+#else
+ errno = ENOSYS;
+ return -1;
+#endif
+}
+
+struct vm_boundaries {
+ unsigned long start;
+ unsigned long end;
+};
+
+static int get_vm_area(unsigned long addr, struct vm_boundaries *area)
+{
+ FILE *file;
+ int ret = 1;
+ char line[1024] = {0};
+ char *end_addr;
+ char *stop;
+ unsigned long start;
+ unsigned long end;
+
+ if (!area)
+ return ret;
+
+ file = fopen("/proc/self/maps", "r");
+ if (!file) {
+ perror("fopen");
+ return ret;
+ }
+
+ memset(area, 0, sizeof(struct vm_boundaries));
+
+ while(fgets(line, 1024, file)) {
+ end_addr = strchr(line, '-');
+ if (!end_addr) {
+ printf("cannot parse /proc/self/maps\n");
+ goto out;
+ }
+ *end_addr = '\0';
+ end_addr++;
+ stop = strchr(end_addr, ' ');
+ if (!stop) {
+ printf("cannot parse /proc/self/maps\n");
+ goto out;
+ }
+ stop = '\0';
+
+ sscanf(line, "%lx", &start);
+ sscanf(end_addr, "%lx", &end);
+
+ if (start <= addr && end > addr) {
+ area->start = start;
+ area->end = end;
+ ret = 0;
+ goto out;
+ }
+ }
+out:
+ fclose(file);
+ return ret;
+}
+
+static unsigned long get_pageflags(unsigned long addr)
+{
+ FILE *file;
+ unsigned long pfn;
+ unsigned long offset;
+
+ file = fopen("/proc/self/pagemap", "r");
+ if (!file) {
+ perror("fopen pagemap");
+ _exit(1);
+ }
+
+ offset = addr / getpagesize() * sizeof(unsigned long);
+ if (fseek(file, offset, SEEK_SET)) {
+ perror("fseek pagemap");
+ _exit(1);
+ }
+
+ if (fread(&pfn, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread pagemap");
+ _exit(1);
+ }
+
+ fclose(file);
+ return pfn;
+}
+
+static unsigned long get_kpageflags(unsigned long pfn)
+{
+ unsigned long flags;
+ FILE *file;
+
+ file = fopen("/proc/kpageflags", "r");
+ if (!file) {
+ perror("fopen kpageflags");
+ _exit(1);
+ }
+
+ if (fseek(file, pfn * sizeof(unsigned long), SEEK_SET)) {
+ perror("fseek kpageflags");
+ _exit(1);
+ }
+
+ if (fread(&flags, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread kpageflags");
+ _exit(1);
+ }
+
+ fclose(file);
+ return flags;
+}
+
+#define VMFLAGS "VmFlags:"
+
+static bool find_flag(FILE *file, const char *vmflag)
+{
+ char *line = NULL;
+ char *flags;
+ size_t size = 0;
+ bool ret = false;
+
+ while (getline(&line, &size, file) > 0) {
+ if (!strstr(line, VMFLAGS)) {
+ free(line);
+ line = NULL;
+ size = 0;
+ continue;
+ }
+
+ flags = line + strlen(VMFLAGS);
+ ret = (strstr(flags, vmflag) != NULL);
+ goto out;
+ }
+
+out:
+ free(line);
+ return ret;
+}
+
+static bool is_vmflag_set(unsigned long addr, const char *vmflag)
+{
+ FILE *file;
+ char *line = NULL;
+ size_t size = 0;
+ bool ret = false;
+ unsigned long start, end;
+ char perms[5];
+ unsigned long offset;
+ char dev[32];
+ unsigned long inode;
+ char path[BUFSIZ];
+
+ file = fopen("/proc/self/smaps", "r");
+ if (!file) {
+ perror("fopen smaps");
+ _exit(1);
+ }
+
+ while (getline(&line, &size, file) > 0) {
+ if (sscanf(line, "%lx-%lx %s %lx %s %lu %s\n",
+ &start, &end, perms, &offset, dev, &inode, path) < 6)
+ goto next;
+
+ if (start <= addr && addr < end) {
+ ret = find_flag(file, vmflag);
+ goto out;
+ }
+
+next:
+ free(line);
+ line = NULL;
+ size = 0;
+ }
+
+out:
+ free(line);
+ fclose(file);
+ return ret;
+}
+
+#define PRESENT_BIT 0x8000000000000000
+#define PFN_MASK 0x007FFFFFFFFFFFFF
+#define UNEVICTABLE_BIT (1UL << 18)
+
+#define LOCKED "lo"
+#define LOCKEDONFAULT "lf"
+
+static int lock_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+
+ /* Both pages should be present */
+ if (((page1_flags & PRESENT_BIT) == 0) ||
+ ((page2_flags & PRESENT_BIT) == 0)) {
+ printf("Failed to make both pages present\n");
+ return 1;
+ }
+
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+ page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+
+ /* Both pages should be unevictable */
+ if (((page1_flags & UNEVICTABLE_BIT) == 0) ||
+ ((page2_flags & UNEVICTABLE_BIT) == 0)) {
+ printf("Failed to make both pages unevictable\n");
+ return 1;
+ }
+
+ if (!is_vmflag_set((unsigned long)map, LOCKED) ||
+ !is_vmflag_set((unsigned long)map + page_size, LOCKED)) {
+ printf("VMA flag %s is missing\n", LOCKED);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int unlock_lock_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+ page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+
+ if ((page1_flags & UNEVICTABLE_BIT) || (page2_flags & UNEVICTABLE_BIT)) {
+ printf("A page is still marked unevictable after unlock\n");
+ return 1;
+ }
+
+ if (is_vmflag_set((unsigned long)map, LOCKED) ||
+ is_vmflag_set((unsigned long)map + page_size, LOCKED)) {
+ printf("VMA flag %s is still set after unlock\n", LOCKED);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int test_mlock_lock()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("test_mlock_locked mmap");
+ goto out;
+ }
+
+ if (mlock2_(map, 2 * page_size, 0)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock2(0)");
+ goto unmap;
+ }
+
+ if (lock_check(map))
+ goto unmap;
+
+ /* Now unlock and recheck attributes */
+ if (munlock(map, 2 * page_size)) {
+ perror("munlock()");
+ goto unmap;
+ }
+
+ ret = unlock_lock_check(map);
+
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ return ret;
+}
+
+static int onfault_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+
+ /* Neither page should be present */
+ if ((page1_flags & PRESENT_BIT) || (page2_flags & PRESENT_BIT)) {
+ printf("Pages were made present by MLOCK_ONFAULT\n");
+ return 1;
+ }
+
+ *map = 'a';
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+
+ /* Only page 1 should be present */
+ if ((page1_flags & PRESENT_BIT) == 0) {
+ printf("Page 1 is not present after fault\n");
+ return 1;
+ } else if (page2_flags & PRESENT_BIT) {
+ printf("Page 2 was made present\n");
+ return 1;
+ }
+
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+
+ /* Page 1 should be unevictable */
+ if ((page1_flags & UNEVICTABLE_BIT) == 0) {
+ printf("Failed to make faulted page unevictable\n");
+ return 1;
+ }
+
+ if (!is_vmflag_set((unsigned long)map, LOCKEDONFAULT) ||
+ !is_vmflag_set((unsigned long)map + page_size, LOCKEDONFAULT)) {
+ printf("VMA flag %s is missing\n", LOCKEDONFAULT);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int unlock_onfault_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+
+ if (page1_flags & UNEVICTABLE_BIT) {
+ printf("Page 1 is still marked unevictable after unlock\n");
+ return 1;
+ }
+
+ if (is_vmflag_set((unsigned long)map, LOCKEDONFAULT) ||
+ is_vmflag_set((unsigned long)map + page_size, LOCKEDONFAULT)) {
+ printf("VMA flag %s is still set after unlock\n", LOCKEDONFAULT);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int test_mlock_onfault()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("test_mlock_locked mmap");
+ goto out;
+ }
+
+ if (mlock2_(map, 2 * page_size, MLOCK_ONFAULT)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock2(MLOCK_ONFAULT)");
+ goto unmap;
+ }
+
+ if (onfault_check(map))
+ goto unmap;
+
+ /* Now unlock and recheck attributes */
+ if (munlock(map, 2 * page_size)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("munlock()");
+ goto unmap;
+ }
+
+ ret = unlock_onfault_check(map);
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ return ret;
+}
+
+static int test_lock_onfault_of_present()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("test_mlock_locked mmap");
+ goto out;
+ }
+
+ *map = 'a';
+
+ if (mlock2_(map, 2 * page_size, MLOCK_ONFAULT)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock2(MLOCK_ONFAULT)");
+ goto unmap;
+ }
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+ page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+
+ /* Page 1 should be unevictable */
+ if ((page1_flags & UNEVICTABLE_BIT) == 0) {
+ printf("Failed to make present page unevictable\n");
+ goto unmap;
+ }
+
+ if (!is_vmflag_set((unsigned long)map, LOCKEDONFAULT) ||
+ !is_vmflag_set((unsigned long)map + page_size, LOCKEDONFAULT)) {
+ printf("VMA flag %s is missing for one of the pages\n", LOCKEDONFAULT);
+ goto unmap;
+ }
+ ret = 0;
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ return ret;
+}
+
+static int test_munlockall()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+
+ if (map == MAP_FAILED) {
+ perror("test_munlockall mmap");
+ goto out;
+ }
+
+ if (mlockall(MCL_CURRENT)) {
+ perror("mlockall(MCL_CURRENT)");
+ goto out;
+ }
+
+ if (lock_check(map))
+ goto unmap;
+
+ if (munlockall()) {
+ perror("munlockall()");
+ goto unmap;
+ }
+
+ if (unlock_lock_check(map))
+ goto unmap;
+
+ munmap(map, 2 * page_size);
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+
+ if (map == MAP_FAILED) {
+ perror("test_munlockall second mmap");
+ goto out;
+ }
+
+ if (mlockall(MCL_CURRENT | MCL_ONFAULT)) {
+ perror("mlockall(MCL_CURRENT | MCL_ONFAULT)");
+ goto unmap;
+ }
+
+ if (onfault_check(map))
+ goto unmap;
+
+ if (munlockall()) {
+ perror("munlockall()");
+ goto unmap;
+ }
+
+ if (unlock_onfault_check(map))
+ goto unmap;
+
+ if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+ perror("mlockall(MCL_CURRENT | MCL_FUTURE)");
+ goto out;
+ }
+
+ if (lock_check(map))
+ goto unmap;
+
+ if (munlockall()) {
+ perror("munlockall()");
+ goto unmap;
+ }
+
+ ret = unlock_lock_check(map);
+
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ munlockall();
+ return ret;
+}
+
+static int test_vma_management(bool call_mlock)
+{
+ int ret = 1;
+ void *map;
+ unsigned long page_size = getpagesize();
+ struct vm_boundaries page1;
+ struct vm_boundaries page2;
+ struct vm_boundaries page3;
+
+ map = mmap(NULL, 3 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("mmap()");
+ return ret;
+ }
+
+ if (call_mlock && mlock2_(map, 3 * page_size, MLOCK_ONFAULT)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock(ONFAULT)\n");
+ goto out;
+ }
+
+ if (get_vm_area((unsigned long)map, &page1) ||
+ get_vm_area((unsigned long)map + page_size, &page2) ||
+ get_vm_area((unsigned long)map + page_size * 2, &page3)) {
+ printf("couldn't find mapping in /proc/self/maps\n");
+ goto out;
+ }
+
+ /*
+ * Before we unlock a portion, we need to that all three pages are in
+ * the same VMA. If they are not we abort this test (Note that this is
+ * not a failure)
+ */
+ if (page1.start != page2.start || page2.start != page3.start) {
+ printf("VMAs are not merged to start, aborting test\n");
+ ret = 0;
+ goto out;
+ }
+
+ if (munlock(map + page_size, page_size)) {
+ perror("munlock()");
+ goto out;
+ }
+
+ if (get_vm_area((unsigned long)map, &page1) ||
+ get_vm_area((unsigned long)map + page_size, &page2) ||
+ get_vm_area((unsigned long)map + page_size * 2, &page3)) {
+ printf("couldn't find mapping in /proc/self/maps\n");
+ goto out;
+ }
+
+ /* All three VMAs should be different */
+ if (page1.start == page2.start || page2.start == page3.start) {
+ printf("failed to split VMA for munlock\n");
+ goto out;
+ }
+
+ /* Now unlock the first and third page and check the VMAs again */
+ if (munlock(map, page_size * 3)) {
+ perror("munlock()");
+ goto out;
+ }
+
+ if (get_vm_area((unsigned long)map, &page1) ||
+ get_vm_area((unsigned long)map + page_size, &page2) ||
+ get_vm_area((unsigned long)map + page_size * 2, &page3)) {
+ printf("couldn't find mapping in /proc/self/maps\n");
+ goto out;
+ }
+
+ /* Now all three VMAs should be the same */
+ if (page1.start != page2.start || page2.start != page3.start) {
+ printf("failed to merge VMAs after munlock\n");
+ goto out;
+ }
+
+ ret = 0;
+out:
+ munmap(map, 3 * page_size);
+ return ret;
+}
+
+static int test_mlockall(int (test_function)(bool call_mlock))
+{
+ int ret = 1;
+
+ if (mlockall(MCL_CURRENT | MCL_ONFAULT | MCL_FUTURE)) {
+ perror("mlockall");
+ return ret;
+ }
+
+ ret = test_function(false);
+ munlockall();
+ return ret;
+}
+
+int main(char **argv, int argc)
+{
+ int ret = 0;
+ ret += test_mlock_lock();
+ ret += test_mlock_onfault();
+ ret += test_munlockall();
+ ret += test_lock_onfault_of_present();
+ ret += test_vma_management(true);
+ ret += test_mlockall(test_vma_management);
+ return ret;
+}
+
diff --git a/tools/testing/selftests/vm/on-fault-limit.c b/tools/testing/selftests/vm/on-fault-limit.c
new file mode 100644
index 0000000..245accc
--- /dev/null
+++ b/tools/testing/selftests/vm/on-fault-limit.c
@@ -0,0 +1,47 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT (MCL_FUTURE << 1)
+#endif
+
+static int test_limit(void)
+{
+ int ret = 1;
+ struct rlimit lims;
+ void *map;
+
+ if (getrlimit(RLIMIT_MEMLOCK, &lims)) {
+ perror("getrlimit");
+ return ret;
+ }
+
+ if (mlockall(MCL_CURRENT | MCL_ONFAULT | MCL_FUTURE)) {
+ perror("mlockall");
+ return ret;
+ }
+
+ map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, 0, 0);
+ if (map != MAP_FAILED)
+ printf("mmap should have failed, but didn't\n");
+ else {
+ ret = 0;
+ munmap(map, 2 * lims.rlim_max);
+ }
+
+ munlockall();
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ int ret = 0;
+
+ ret += test_limit();
+ return ret;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index 49ece11..877ca04a 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -102,4 +102,26 @@ else
echo "[PASS]"
fi
+echo "--------------------"
+echo "running on-fault-limit"
+echo "--------------------"
+sudo -u nobody ./on-fault-limit
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
+echo "--------------------"
+echo "running mlock2-tests"
+echo "--------------------"
+./mlock2-tests
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
exit $exitcode
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 87+ messages in thread* [PATCH v7 5/6] selftests: vm: Add tests for lock on fault
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Shuah Khan, Michal Hocko, Vlastimil Babka,
Jonathan Corbet, linux-mm, linux-kernel, linux-api
Test the mmap() flag, and the mlockall() flag. These tests ensure that
pages are not faulted in until they are accessed, that the pages are
unevictable once faulted in, and that VMA splitting and merging works
with the new VM flag. The second test ensures that mlock limits are
respected. Note that the limit test needs to be run a normal user.
Also add tests to use the new mlock2 family of system calls.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org
---
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/mlock2-tests.c | 661 ++++++++++++++++++++++++++++
tools/testing/selftests/vm/on-fault-limit.c | 47 ++
tools/testing/selftests/vm/run_vmtests | 22 +
4 files changed, 732 insertions(+)
create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
create mode 100644 tools/testing/selftests/vm/on-fault-limit.c
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 231b9a0..71a4e9f 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -6,6 +6,8 @@ BINARIES += hugepage-mmap
BINARIES += hugepage-shm
BINARIES += hugetlbfstest
BINARIES += map_hugetlb
+BINARIES += mlock2-tests
+BINARIES += on-fault-limit
BINARIES += thuge-gen
BINARIES += transhuge-stress
diff --git a/tools/testing/selftests/vm/mlock2-tests.c b/tools/testing/selftests/vm/mlock2-tests.c
new file mode 100644
index 0000000..c49122b
--- /dev/null
+++ b/tools/testing/selftests/vm/mlock2-tests.c
@@ -0,0 +1,661 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#ifndef MLOCK_ONFAULT
+#define MLOCK_ONFAULT 1
+#endif
+
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT (MCL_FUTURE << 1)
+#endif
+
+static int mlock2_(void *start, size_t len, int flags)
+{
+#ifdef __NR_mlock2
+ return syscall(__NR_mlock2, start, len, flags);
+#else
+ errno = ENOSYS;
+ return -1;
+#endif
+}
+
+struct vm_boundaries {
+ unsigned long start;
+ unsigned long end;
+};
+
+static int get_vm_area(unsigned long addr, struct vm_boundaries *area)
+{
+ FILE *file;
+ int ret = 1;
+ char line[1024] = {0};
+ char *end_addr;
+ char *stop;
+ unsigned long start;
+ unsigned long end;
+
+ if (!area)
+ return ret;
+
+ file = fopen("/proc/self/maps", "r");
+ if (!file) {
+ perror("fopen");
+ return ret;
+ }
+
+ memset(area, 0, sizeof(struct vm_boundaries));
+
+ while(fgets(line, 1024, file)) {
+ end_addr = strchr(line, '-');
+ if (!end_addr) {
+ printf("cannot parse /proc/self/maps\n");
+ goto out;
+ }
+ *end_addr = '\0';
+ end_addr++;
+ stop = strchr(end_addr, ' ');
+ if (!stop) {
+ printf("cannot parse /proc/self/maps\n");
+ goto out;
+ }
+ stop = '\0';
+
+ sscanf(line, "%lx", &start);
+ sscanf(end_addr, "%lx", &end);
+
+ if (start <= addr && end > addr) {
+ area->start = start;
+ area->end = end;
+ ret = 0;
+ goto out;
+ }
+ }
+out:
+ fclose(file);
+ return ret;
+}
+
+static unsigned long get_pageflags(unsigned long addr)
+{
+ FILE *file;
+ unsigned long pfn;
+ unsigned long offset;
+
+ file = fopen("/proc/self/pagemap", "r");
+ if (!file) {
+ perror("fopen pagemap");
+ _exit(1);
+ }
+
+ offset = addr / getpagesize() * sizeof(unsigned long);
+ if (fseek(file, offset, SEEK_SET)) {
+ perror("fseek pagemap");
+ _exit(1);
+ }
+
+ if (fread(&pfn, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread pagemap");
+ _exit(1);
+ }
+
+ fclose(file);
+ return pfn;
+}
+
+static unsigned long get_kpageflags(unsigned long pfn)
+{
+ unsigned long flags;
+ FILE *file;
+
+ file = fopen("/proc/kpageflags", "r");
+ if (!file) {
+ perror("fopen kpageflags");
+ _exit(1);
+ }
+
+ if (fseek(file, pfn * sizeof(unsigned long), SEEK_SET)) {
+ perror("fseek kpageflags");
+ _exit(1);
+ }
+
+ if (fread(&flags, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread kpageflags");
+ _exit(1);
+ }
+
+ fclose(file);
+ return flags;
+}
+
+#define VMFLAGS "VmFlags:"
+
+static bool find_flag(FILE *file, const char *vmflag)
+{
+ char *line = NULL;
+ char *flags;
+ size_t size = 0;
+ bool ret = false;
+
+ while (getline(&line, &size, file) > 0) {
+ if (!strstr(line, VMFLAGS)) {
+ free(line);
+ line = NULL;
+ size = 0;
+ continue;
+ }
+
+ flags = line + strlen(VMFLAGS);
+ ret = (strstr(flags, vmflag) != NULL);
+ goto out;
+ }
+
+out:
+ free(line);
+ return ret;
+}
+
+static bool is_vmflag_set(unsigned long addr, const char *vmflag)
+{
+ FILE *file;
+ char *line = NULL;
+ size_t size = 0;
+ bool ret = false;
+ unsigned long start, end;
+ char perms[5];
+ unsigned long offset;
+ char dev[32];
+ unsigned long inode;
+ char path[BUFSIZ];
+
+ file = fopen("/proc/self/smaps", "r");
+ if (!file) {
+ perror("fopen smaps");
+ _exit(1);
+ }
+
+ while (getline(&line, &size, file) > 0) {
+ if (sscanf(line, "%lx-%lx %s %lx %s %lu %s\n",
+ &start, &end, perms, &offset, dev, &inode, path) < 6)
+ goto next;
+
+ if (start <= addr && addr < end) {
+ ret = find_flag(file, vmflag);
+ goto out;
+ }
+
+next:
+ free(line);
+ line = NULL;
+ size = 0;
+ }
+
+out:
+ free(line);
+ fclose(file);
+ return ret;
+}
+
+#define PRESENT_BIT 0x8000000000000000
+#define PFN_MASK 0x007FFFFFFFFFFFFF
+#define UNEVICTABLE_BIT (1UL << 18)
+
+#define LOCKED "lo"
+#define LOCKEDONFAULT "lf"
+
+static int lock_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+
+ /* Both pages should be present */
+ if (((page1_flags & PRESENT_BIT) == 0) ||
+ ((page2_flags & PRESENT_BIT) == 0)) {
+ printf("Failed to make both pages present\n");
+ return 1;
+ }
+
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+ page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+
+ /* Both pages should be unevictable */
+ if (((page1_flags & UNEVICTABLE_BIT) == 0) ||
+ ((page2_flags & UNEVICTABLE_BIT) == 0)) {
+ printf("Failed to make both pages unevictable\n");
+ return 1;
+ }
+
+ if (!is_vmflag_set((unsigned long)map, LOCKED) ||
+ !is_vmflag_set((unsigned long)map + page_size, LOCKED)) {
+ printf("VMA flag %s is missing\n", LOCKED);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int unlock_lock_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+ page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+
+ if ((page1_flags & UNEVICTABLE_BIT) || (page2_flags & UNEVICTABLE_BIT)) {
+ printf("A page is still marked unevictable after unlock\n");
+ return 1;
+ }
+
+ if (is_vmflag_set((unsigned long)map, LOCKED) ||
+ is_vmflag_set((unsigned long)map + page_size, LOCKED)) {
+ printf("VMA flag %s is still set after unlock\n", LOCKED);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int test_mlock_lock()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("test_mlock_locked mmap");
+ goto out;
+ }
+
+ if (mlock2_(map, 2 * page_size, 0)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock2(0)");
+ goto unmap;
+ }
+
+ if (lock_check(map))
+ goto unmap;
+
+ /* Now unlock and recheck attributes */
+ if (munlock(map, 2 * page_size)) {
+ perror("munlock()");
+ goto unmap;
+ }
+
+ ret = unlock_lock_check(map);
+
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ return ret;
+}
+
+static int onfault_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+
+ /* Neither page should be present */
+ if ((page1_flags & PRESENT_BIT) || (page2_flags & PRESENT_BIT)) {
+ printf("Pages were made present by MLOCK_ONFAULT\n");
+ return 1;
+ }
+
+ *map = 'a';
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+
+ /* Only page 1 should be present */
+ if ((page1_flags & PRESENT_BIT) == 0) {
+ printf("Page 1 is not present after fault\n");
+ return 1;
+ } else if (page2_flags & PRESENT_BIT) {
+ printf("Page 2 was made present\n");
+ return 1;
+ }
+
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+
+ /* Page 1 should be unevictable */
+ if ((page1_flags & UNEVICTABLE_BIT) == 0) {
+ printf("Failed to make faulted page unevictable\n");
+ return 1;
+ }
+
+ if (!is_vmflag_set((unsigned long)map, LOCKEDONFAULT) ||
+ !is_vmflag_set((unsigned long)map + page_size, LOCKEDONFAULT)) {
+ printf("VMA flag %s is missing\n", LOCKEDONFAULT);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int unlock_onfault_check(char *map)
+{
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+
+ if (page1_flags & UNEVICTABLE_BIT) {
+ printf("Page 1 is still marked unevictable after unlock\n");
+ return 1;
+ }
+
+ if (is_vmflag_set((unsigned long)map, LOCKEDONFAULT) ||
+ is_vmflag_set((unsigned long)map + page_size, LOCKEDONFAULT)) {
+ printf("VMA flag %s is still set after unlock\n", LOCKEDONFAULT);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int test_mlock_onfault()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("test_mlock_locked mmap");
+ goto out;
+ }
+
+ if (mlock2_(map, 2 * page_size, MLOCK_ONFAULT)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock2(MLOCK_ONFAULT)");
+ goto unmap;
+ }
+
+ if (onfault_check(map))
+ goto unmap;
+
+ /* Now unlock and recheck attributes */
+ if (munlock(map, 2 * page_size)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("munlock()");
+ goto unmap;
+ }
+
+ ret = unlock_onfault_check(map);
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ return ret;
+}
+
+static int test_lock_onfault_of_present()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("test_mlock_locked mmap");
+ goto out;
+ }
+
+ *map = 'a';
+
+ if (mlock2_(map, 2 * page_size, MLOCK_ONFAULT)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock2(MLOCK_ONFAULT)");
+ goto unmap;
+ }
+
+ page1_flags = get_pageflags((unsigned long)map);
+ page2_flags = get_pageflags((unsigned long)map + page_size);
+ page1_flags = get_kpageflags(page1_flags & PFN_MASK);
+ page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+
+ /* Page 1 should be unevictable */
+ if ((page1_flags & UNEVICTABLE_BIT) == 0) {
+ printf("Failed to make present page unevictable\n");
+ goto unmap;
+ }
+
+ if (!is_vmflag_set((unsigned long)map, LOCKEDONFAULT) ||
+ !is_vmflag_set((unsigned long)map + page_size, LOCKEDONFAULT)) {
+ printf("VMA flag %s is missing for one of the pages\n", LOCKEDONFAULT);
+ goto unmap;
+ }
+ ret = 0;
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ return ret;
+}
+
+static int test_munlockall()
+{
+ char *map;
+ int ret = 1;
+ unsigned long page1_flags;
+ unsigned long page2_flags;
+ unsigned long page_size = getpagesize();
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+
+ if (map == MAP_FAILED) {
+ perror("test_munlockall mmap");
+ goto out;
+ }
+
+ if (mlockall(MCL_CURRENT)) {
+ perror("mlockall(MCL_CURRENT)");
+ goto out;
+ }
+
+ if (lock_check(map))
+ goto unmap;
+
+ if (munlockall()) {
+ perror("munlockall()");
+ goto unmap;
+ }
+
+ if (unlock_lock_check(map))
+ goto unmap;
+
+ munmap(map, 2 * page_size);
+
+ map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+
+ if (map == MAP_FAILED) {
+ perror("test_munlockall second mmap");
+ goto out;
+ }
+
+ if (mlockall(MCL_CURRENT | MCL_ONFAULT)) {
+ perror("mlockall(MCL_CURRENT | MCL_ONFAULT)");
+ goto unmap;
+ }
+
+ if (onfault_check(map))
+ goto unmap;
+
+ if (munlockall()) {
+ perror("munlockall()");
+ goto unmap;
+ }
+
+ if (unlock_onfault_check(map))
+ goto unmap;
+
+ if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+ perror("mlockall(MCL_CURRENT | MCL_FUTURE)");
+ goto out;
+ }
+
+ if (lock_check(map))
+ goto unmap;
+
+ if (munlockall()) {
+ perror("munlockall()");
+ goto unmap;
+ }
+
+ ret = unlock_lock_check(map);
+
+unmap:
+ munmap(map, 2 * page_size);
+out:
+ munlockall();
+ return ret;
+}
+
+static int test_vma_management(bool call_mlock)
+{
+ int ret = 1;
+ void *map;
+ unsigned long page_size = getpagesize();
+ struct vm_boundaries page1;
+ struct vm_boundaries page2;
+ struct vm_boundaries page3;
+
+ map = mmap(NULL, 3 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (map == MAP_FAILED) {
+ perror("mmap()");
+ return ret;
+ }
+
+ if (call_mlock && mlock2_(map, 3 * page_size, MLOCK_ONFAULT)) {
+ if (errno == ENOSYS) {
+ printf("Cannot call new mlock family, skipping test\n");
+ _exit(0);
+ }
+ perror("mlock(ONFAULT)\n");
+ goto out;
+ }
+
+ if (get_vm_area((unsigned long)map, &page1) ||
+ get_vm_area((unsigned long)map + page_size, &page2) ||
+ get_vm_area((unsigned long)map + page_size * 2, &page3)) {
+ printf("couldn't find mapping in /proc/self/maps\n");
+ goto out;
+ }
+
+ /*
+ * Before we unlock a portion, we need to that all three pages are in
+ * the same VMA. If they are not we abort this test (Note that this is
+ * not a failure)
+ */
+ if (page1.start != page2.start || page2.start != page3.start) {
+ printf("VMAs are not merged to start, aborting test\n");
+ ret = 0;
+ goto out;
+ }
+
+ if (munlock(map + page_size, page_size)) {
+ perror("munlock()");
+ goto out;
+ }
+
+ if (get_vm_area((unsigned long)map, &page1) ||
+ get_vm_area((unsigned long)map + page_size, &page2) ||
+ get_vm_area((unsigned long)map + page_size * 2, &page3)) {
+ printf("couldn't find mapping in /proc/self/maps\n");
+ goto out;
+ }
+
+ /* All three VMAs should be different */
+ if (page1.start == page2.start || page2.start == page3.start) {
+ printf("failed to split VMA for munlock\n");
+ goto out;
+ }
+
+ /* Now unlock the first and third page and check the VMAs again */
+ if (munlock(map, page_size * 3)) {
+ perror("munlock()");
+ goto out;
+ }
+
+ if (get_vm_area((unsigned long)map, &page1) ||
+ get_vm_area((unsigned long)map + page_size, &page2) ||
+ get_vm_area((unsigned long)map + page_size * 2, &page3)) {
+ printf("couldn't find mapping in /proc/self/maps\n");
+ goto out;
+ }
+
+ /* Now all three VMAs should be the same */
+ if (page1.start != page2.start || page2.start != page3.start) {
+ printf("failed to merge VMAs after munlock\n");
+ goto out;
+ }
+
+ ret = 0;
+out:
+ munmap(map, 3 * page_size);
+ return ret;
+}
+
+static int test_mlockall(int (test_function)(bool call_mlock))
+{
+ int ret = 1;
+
+ if (mlockall(MCL_CURRENT | MCL_ONFAULT | MCL_FUTURE)) {
+ perror("mlockall");
+ return ret;
+ }
+
+ ret = test_function(false);
+ munlockall();
+ return ret;
+}
+
+int main(char **argv, int argc)
+{
+ int ret = 0;
+ ret += test_mlock_lock();
+ ret += test_mlock_onfault();
+ ret += test_munlockall();
+ ret += test_lock_onfault_of_present();
+ ret += test_vma_management(true);
+ ret += test_mlockall(test_vma_management);
+ return ret;
+}
+
diff --git a/tools/testing/selftests/vm/on-fault-limit.c b/tools/testing/selftests/vm/on-fault-limit.c
new file mode 100644
index 0000000..245accc
--- /dev/null
+++ b/tools/testing/selftests/vm/on-fault-limit.c
@@ -0,0 +1,47 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT (MCL_FUTURE << 1)
+#endif
+
+static int test_limit(void)
+{
+ int ret = 1;
+ struct rlimit lims;
+ void *map;
+
+ if (getrlimit(RLIMIT_MEMLOCK, &lims)) {
+ perror("getrlimit");
+ return ret;
+ }
+
+ if (mlockall(MCL_CURRENT | MCL_ONFAULT | MCL_FUTURE)) {
+ perror("mlockall");
+ return ret;
+ }
+
+ map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, 0, 0);
+ if (map != MAP_FAILED)
+ printf("mmap should have failed, but didn't\n");
+ else {
+ ret = 0;
+ munmap(map, 2 * lims.rlim_max);
+ }
+
+ munlockall();
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ int ret = 0;
+
+ ret += test_limit();
+ return ret;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index 49ece11..877ca04a 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -102,4 +102,26 @@ else
echo "[PASS]"
fi
+echo "--------------------"
+echo "running on-fault-limit"
+echo "--------------------"
+sudo -u nobody ./on-fault-limit
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
+echo "--------------------"
+echo "running mlock2-tests"
+echo "--------------------"
+./mlock2-tests
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
exit $exitcode
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v7 6/6] mips: Add entry for new mlock2 syscall
2015-08-09 5:22 ` Eric B Munson
@ 2015-08-09 5:22 ` Eric B Munson
-1 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Ralf Baechle, linux-mips, linux-api, linux-arch,
linux-mm, linux-kernel
A previous commit introduced the new mlock2 syscall, add entries for the
MIPS architecture.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
arch/mips/include/uapi/asm/unistd.h | 15 +++++++++------
arch/mips/kernel/scall32-o32.S | 1 +
arch/mips/kernel/scall64-64.S | 1 +
arch/mips/kernel/scall64-n32.S | 1 +
arch/mips/kernel/scall64-o32.S | 1 +
5 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
index c03088f..d0bdfaa 100644
--- a/arch/mips/include/uapi/asm/unistd.h
+++ b/arch/mips/include/uapi/asm/unistd.h
@@ -377,16 +377,17 @@
#define __NR_memfd_create (__NR_Linux + 354)
#define __NR_bpf (__NR_Linux + 355)
#define __NR_execveat (__NR_Linux + 356)
+#define __NR_mlock2 (__NR_Linux + 357)
/*
* Offset of the last Linux o32 flavoured syscall
*/
-#define __NR_Linux_syscalls 356
+#define __NR_Linux_syscalls 357
#endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
#define __NR_O32_Linux 4000
-#define __NR_O32_Linux_syscalls 356
+#define __NR_O32_Linux_syscalls 357
#if _MIPS_SIM == _MIPS_SIM_ABI64
@@ -711,16 +712,17 @@
#define __NR_memfd_create (__NR_Linux + 314)
#define __NR_bpf (__NR_Linux + 315)
#define __NR_execveat (__NR_Linux + 316)
+#define __NR_mlock2 (__NR_Linux + 317)
/*
* Offset of the last Linux 64-bit flavoured syscall
*/
-#define __NR_Linux_syscalls 316
+#define __NR_Linux_syscalls 317
#endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
#define __NR_64_Linux 5000
-#define __NR_64_Linux_syscalls 316
+#define __NR_64_Linux_syscalls 317
#if _MIPS_SIM == _MIPS_SIM_NABI32
@@ -1049,15 +1051,16 @@
#define __NR_memfd_create (__NR_Linux + 318)
#define __NR_bpf (__NR_Linux + 319)
#define __NR_execveat (__NR_Linux + 320)
+#define __NR_mlock2 (__NR_Linux + 321)
/*
* Offset of the last N32 flavoured syscall
*/
-#define __NR_Linux_syscalls 320
+#define __NR_Linux_syscalls 321
#endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
#define __NR_N32_Linux 6000
-#define __NR_N32_Linux_syscalls 320
+#define __NR_N32_Linux_syscalls 321
#endif /* _UAPI_ASM_UNISTD_H */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 4cc1350..b0b377a 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -599,3 +599,4 @@ EXPORT(sys_call_table)
PTR sys_memfd_create
PTR sys_bpf /* 4355 */
PTR sys_execveat
+ PTR sys_mlock2
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index ad4d4463..97aaf51 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -436,4 +436,5 @@ EXPORT(sys_call_table)
PTR sys_memfd_create
PTR sys_bpf /* 5315 */
PTR sys_execveat
+ PTR sys_mlock2
.size sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 446cc65..e36f21e 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -429,4 +429,5 @@ EXPORT(sysn32_call_table)
PTR sys_memfd_create
PTR sys_bpf
PTR compat_sys_execveat /* 6320 */
+ PTR sys_mlock2
.size sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index f543ff4..7a8b2df 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -584,4 +584,5 @@ EXPORT(sys32_call_table)
PTR sys_memfd_create
PTR sys_bpf /* 4355 */
PTR compat_sys_execveat
+ PTR sys_mlock2
.size sys32_call_table,.-sys32_call_table
--
1.9.1
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v7 6/6] mips: Add entry for new mlock2 syscall
@ 2015-08-09 5:22 ` Eric B Munson
0 siblings, 0 replies; 87+ messages in thread
From: Eric B Munson @ 2015-08-09 5:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric B Munson, Ralf Baechle, linux-mips, linux-api, linux-arch,
linux-mm, linux-kernel
A previous commit introduced the new mlock2 syscall, add entries for the
MIPS architecture.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
arch/mips/include/uapi/asm/unistd.h | 15 +++++++++------
arch/mips/kernel/scall32-o32.S | 1 +
arch/mips/kernel/scall64-64.S | 1 +
arch/mips/kernel/scall64-n32.S | 1 +
arch/mips/kernel/scall64-o32.S | 1 +
5 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
index c03088f..d0bdfaa 100644
--- a/arch/mips/include/uapi/asm/unistd.h
+++ b/arch/mips/include/uapi/asm/unistd.h
@@ -377,16 +377,17 @@
#define __NR_memfd_create (__NR_Linux + 354)
#define __NR_bpf (__NR_Linux + 355)
#define __NR_execveat (__NR_Linux + 356)
+#define __NR_mlock2 (__NR_Linux + 357)
/*
* Offset of the last Linux o32 flavoured syscall
*/
-#define __NR_Linux_syscalls 356
+#define __NR_Linux_syscalls 357
#endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
#define __NR_O32_Linux 4000
-#define __NR_O32_Linux_syscalls 356
+#define __NR_O32_Linux_syscalls 357
#if _MIPS_SIM == _MIPS_SIM_ABI64
@@ -711,16 +712,17 @@
#define __NR_memfd_create (__NR_Linux + 314)
#define __NR_bpf (__NR_Linux + 315)
#define __NR_execveat (__NR_Linux + 316)
+#define __NR_mlock2 (__NR_Linux + 317)
/*
* Offset of the last Linux 64-bit flavoured syscall
*/
-#define __NR_Linux_syscalls 316
+#define __NR_Linux_syscalls 317
#endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
#define __NR_64_Linux 5000
-#define __NR_64_Linux_syscalls 316
+#define __NR_64_Linux_syscalls 317
#if _MIPS_SIM == _MIPS_SIM_NABI32
@@ -1049,15 +1051,16 @@
#define __NR_memfd_create (__NR_Linux + 318)
#define __NR_bpf (__NR_Linux + 319)
#define __NR_execveat (__NR_Linux + 320)
+#define __NR_mlock2 (__NR_Linux + 321)
/*
* Offset of the last N32 flavoured syscall
*/
-#define __NR_Linux_syscalls 320
+#define __NR_Linux_syscalls 321
#endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
#define __NR_N32_Linux 6000
-#define __NR_N32_Linux_syscalls 320
+#define __NR_N32_Linux_syscalls 321
#endif /* _UAPI_ASM_UNISTD_H */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 4cc1350..b0b377a 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -599,3 +599,4 @@ EXPORT(sys_call_table)
PTR sys_memfd_create
PTR sys_bpf /* 4355 */
PTR sys_execveat
+ PTR sys_mlock2
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index ad4d4463..97aaf51 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -436,4 +436,5 @@ EXPORT(sys_call_table)
PTR sys_memfd_create
PTR sys_bpf /* 5315 */
PTR sys_execveat
+ PTR sys_mlock2
.size sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 446cc65..e36f21e 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -429,4 +429,5 @@ EXPORT(sysn32_call_table)
PTR sys_memfd_create
PTR sys_bpf
PTR compat_sys_execveat /* 6320 */
+ PTR sys_mlock2
.size sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index f543ff4..7a8b2df 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -584,4 +584,5 @@ EXPORT(sys32_call_table)
PTR sys_memfd_create
PTR sys_bpf /* 4355 */
PTR compat_sys_execveat
+ PTR sys_mlock2
.size sys32_call_table,.-sys32_call_table
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 87+ messages in thread