* [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

From: Eric B Munson
Date: 2015-06-10 13:26 UTC
To: Andrew Morton
Cc: Eric B Munson, Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha,
    linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux,
    linux-xtensa, linux-mm, linux-arch, linux-api

mlock() allows a user to control page-out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary
this is not ideal.  This series introduces new flags for mmap() and
mlockall() that allow a user to specify that the covered area should not
be paged out, but only after the memory has been used the first time.

There are two main use cases that this set covers.  The first is the
security-focused mlock case: a buffer is needed that cannot be written
to swap.  The maximum size is known, but on average the memory used is
significantly less than this maximum.  With lock on fault, the buffer is
guaranteed never to be paged out, without consuming the maximum size
every time such a buffer is created.

The second use case is focused on performance.  Portions of a large file
are needed and we want to keep the used portions in memory once
accessed.  This is the case for large graphical models where the path
through the graph is not known until run time.  The entire graph is
unlikely to be used in a given invocation, but once a node has been used
it needs to stay resident for further processing.

Given these constraints we have a number of options.  We can potentially
waste a large amount of memory by mlocking the entire region (this can
also cause a significant stall at startup as the entire file is read
in).
We can mlock each page as we access it, without tracking whether the
page is already resident, but this introduces a large overhead for each
access.  The third option is mapping the entire region with PROT_NONE
and using a signal handler for SIGSEGV to mprotect(PROT_READ) and
mlock() the needed page.  Doing this a page at a time adds a significant
performance penalty.  Batching can be used to mitigate this overhead,
but in order to safely avoid trying to mprotect pages outside of the
mapping, the boundaries of each mapping used in this way must be tracked
and available to the signal handler.  This is precisely the bookkeeping
that the mm subsystem in the kernel already does.

For mmap(MAP_LOCKONFAULT) the user is charged against RLIMIT_MEMLOCK as
if MAP_LOCKED was used, i.e. when the VMA is created, not when the pages
are faulted in.  For mlockall(MCL_ON_FAULT) the user is charged as if
MCL_FUTURE was used.  This decision was made to keep the accounting
checks out of the page fault path.

To illustrate the benefit of this patch I wrote a test program that
mmaps a 5 GB file filled with random data and then makes 15,000,000
accesses to random addresses in that mapping.  The test program was run
20 times for each setup.  Results are reported for two program phases,
setup and execution.  The setup phase calls mmap and optionally mlock on
the entire region.  For most experiments this is trivial, but it
highlights the cost of faulting in the entire region.  Results are
averages across the 20 runs, in milliseconds.
mmap with MAP_LOCKED:
  Setup avg:      11821.193
  Processing avg:  3404.286

mmap with mlock() before each access:
  Setup avg:          0.054
  Processing avg: 34263.201

mmap with PROT_NONE and signal handler, batch size of 1 page:
  (With the default value of max_map_count this gets ENOMEM as I attempt
  to change the permissions; after raising the sysctl significantly I
  get:)
  Setup avg:          0.050
  Processing avg: 67690.625

mmap with PROT_NONE and signal handler, batch size of 8 pages:
  Setup avg:          0.098
  Processing avg: 37344.197

mmap with PROT_NONE and signal handler, batch size of 16 pages:
  Setup avg:          0.0548
  Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
  Setup avg:          0.073
  Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping.  The
first step covers the page that caused the fault, as we know that it
will be possible to lock.  The second step speculatively tries to mlock
and mprotect the (batch size - 1) pages that follow.  There may be a
clever way to avoid this without having the program track each mapping
covered by this handler in a globally accessible structure, but I could
not find it.  It should be noted that with a large enough batch size
this two-step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to fault it in at once; otherwise
MAP_LOCKONFAULT is significantly faster.

The performance cost of these patches is minimal on the two benchmarks I
have tested (stream and kernbench).  The following are the average
values across 20 runs of each benchmark, after a warmup run whose
results were discarded.
Avg throughput in MB/s from stream using 1,000,000 element arrays

 Test    4.1-rc2    4.1-rc2+lock-on-fault
 Copy:   10,979.08  10,917.34
 Scale:  11,094.45  11,023.01
 Add:    12,487.29  12,388.65
 Triad:  12,505.77  12,418.78

Kernbench optimal load

                   4.1-rc2  4.1-rc2+lock-on-fault
 Elapsed Time      71.046   71.324
 User Time         62.117   62.352
 System Time        8.926    8.969
 Context Switches  14531.9  14542.5
 Sleeps            14935.9  14939

Eric B Munson (3):
  Add mmap flag to request pages are locked after page fault
  Add mlockall flag for locking pages on fault
  Add tests for lock on fault

 arch/alpha/include/uapi/asm/mman.h          |   2 +
 arch/mips/include/uapi/asm/mman.h           |   2 +
 arch/parisc/include/uapi/asm/mman.h         |   2 +
 arch/powerpc/include/uapi/asm/mman.h        |   2 +
 arch/sparc/include/uapi/asm/mman.h          |   2 +
 arch/tile/include/uapi/asm/mman.h           |   2 +
 arch/xtensa/include/uapi/asm/mman.h         |   2 +
 include/linux/mm.h                          |   1 +
 include/linux/mman.h                        |   3 +-
 include/uapi/asm-generic/mman.h             |   2 +
 mm/mlock.c                                  |  13 ++-
 mm/mmap.c                                   |   4 +-
 mm/swap.c                                   |   3 +-
 tools/testing/selftests/vm/Makefile         |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145 ++++++++++++++++++++++++++++
 tools/testing/selftests/vm/on-fault-limit.c |  47 +++++++++
 tools/testing/selftests/vm/run_vmtests      |  23 +++++
 17 files changed, 254 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
-- 
1.9.1
* [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

From: Eric B Munson
Date: 2015-06-10 13:26 UTC
To: Andrew Morton
Cc: Eric B Munson, Michal Hocko, linux-alpha, linux-kernel, linux-mips,
    linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm,
    linux-arch, linux-api

The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statistical language model (probably applies to other statistical or
graphical models as well).  For the security example, any application
transacting in data that cannot be swapped out (credit card data,
medical records, etc).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.  To keep accounting checks out of the page fault path, users
are billed for the entire mapping lock as if MAP_LOCKED was used.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org
Cc: linux-api@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h    | 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h    | 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h                   | 1 +
 include/linux/mman.h                 | 3 ++-
 include/uapi/asm-generic/mman.h      | 1 +
 mm/mmap.c                            | 4 ++--
 mm/swap.c                            | 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x200000	/* Lock pages after they are faulted in, do not prefault */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 
 #define MS_SYNC		1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..ec04eaf 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x8000		/* Lock pages after they are faulted in, do not prefault */
 
 
 /*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0..42d43cc 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0755b9f..3e31457 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -126,6 +126,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 
+#define VM_LOCKONFAULT	0x00001000	/* Lock the pages covered when they are faulted in */
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 16373c8..437264b 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -86,7 +86,8 @@ calc_vm_flag_bits(unsigned long flags)
 {
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
-	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
+	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
+	       _calc_vm_trans(flags, MAP_LOCKONFAULT,VM_LOCKONFAULT);
 }
 
 unsigned long vm_commit_limit(void);
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index e9fe6fd..fc4e586 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
 
diff --git a/mm/mmap.c b/mm/mmap.c
index bb50cac..ba1a6bf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1233,7 +1233,7 @@ static inline int mlock_future_check(struct mm_struct *mm,
 	unsigned long locked, lock_limit;
 
 	/* mlock MCL_FUTURE? */
-	if (flags & VM_LOCKED) {
+	if (flags & (VM_LOCKED | VM_LOCKONFAULT)) {
 		locked = len >> PAGE_SHIFT;
 		locked += mm->locked_vm;
 		lock_limit = rlimit(RLIMIT_MEMLOCK);
@@ -1301,7 +1301,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
-	if (flags & MAP_LOCKED)
+	if (flags & (MAP_LOCKED | MAP_LOCKONFAULT))
 		if (!can_do_mlock())
 			return -EPERM;
 
diff --git a/mm/swap.c b/mm/swap.c
index a7251a8..07c905e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -711,7 +711,8 @@ void lru_cache_add_active_or_unevictable(struct page *page,
 {
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
+	if (likely((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0) ||
+		   (vma->vm_flags & VM_SPECIAL)) {
 		SetPageActive(page);
 		lru_cache_add(page);
 		return;
-- 
1.9.1
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

From: Michal Hocko
Date: 2015-06-18 15:29 UTC
To: Eric B Munson
Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc,
    linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api

[Sorry for the late reply - I meant to answer in the previous threads
but something always preempted me from that]

On Wed 10-06-15 09:26:48, Eric B Munson wrote:
> The cost of faulting in all memory to be locked can be very high when
> working with large mappings.  If only portions of the mapping will be
> used this can incur a high penalty for locking.
>
> For the example of a large file, this is the usage pattern for a large
> statistical language model (probably applies to other statistical or
> graphical models as well).  For the security example, any application
> transacting in data that cannot be swapped out (credit card data,
> medical records, etc).

Such a use case makes some sense to me but I am not sure the way you
implement it is the right one.  This is another mlock-related flag for
mmap with a different semantic.  You do not want to prefault, but is
e.g. readahead or fault-around acceptable?  I do not see anything in
your patch to handle those...

Wouldn't it be much more reasonable and straightforward to have
MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
explicitly disallow any form of pre-faulting?  It would be usable for
other use cases than the combination with MAP_LOCKED.

> This patch introduces the ability to request that pages are not
> pre-faulted, but are placed on the unevictable LRU when they are
> finally faulted in.
>
> To keep accounting checks out of the page fault path, users are billed
> for the entire mapping lock as if MAP_LOCKED was used.
>
> [ ... remainder of patch quoted in full, trimmed ... ]

-- 
Michal Hocko
SUSE Labs
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

From: Eric B Munson
Date: 2015-06-18 20:30 UTC
To: Michal Hocko
Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc,
    linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api

On Thu, 18 Jun 2015, Michal Hocko wrote:

> [Sorry for the late reply - I meant to answer in the previous threads
> but something always preempted me from that]
>
> On Wed 10-06-15 09:26:48, Eric B Munson wrote:
> > The cost of faulting in all memory to be locked can be very high when
> > working with large mappings.  If only portions of the mapping will be
> > used this can incur a high penalty for locking.
> >
> > For the example of a large file, this is the usage pattern for a large
> > statistical language model (probably applies to other statistical or
> > graphical models as well).  For the security example, any application
> > transacting in data that cannot be swapped out (credit card data,
> > medical records, etc).
>
> Such a use case makes some sense to me but I am not sure the way you
> implement it is the right one.  This is another mlock-related flag for
> mmap with a different semantic.  You do not want to prefault, but is
> e.g. readahead or fault-around acceptable?  I do not see anything in
> your patch to handle those...

We haven't bumped into readahead or fault-around causing performance
problems for us.  If they cause problems for users when LOCKONFAULT is
in use then we can address them.

> Wouldn't it be much more reasonable and straightforward to have
> MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
> explicitly disallow any form of pre-faulting?  It would be usable for
> other use cases than the combination with MAP_LOCKED.
I don't see a clear case for it being more reasonable; it is one
possible way to solve the problem.  But I think it leaves us in an even
more awkward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED but
not present.  Having VM_LOCKONFAULT states that this was intentional.
If we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
longer set VM_LOCKONFAULT (unless we want to start mapping it to the
presence of two MAP_ flags), which can make detecting the MAP_LOCKED +
populate failure state harder.

If this is the preferred path for mmap(), I am fine with that.  However,
I would like to see the new system calls that Andrew mentioned (and that
I am testing patches for) go in as well.  That way we give users the
ability to request VM_LOCKONFAULT for memory allocated using something
other than mmap.

> > This patch introduces the ability to request that pages are not
> > pre-faulted, but are placed on the unevictable LRU when they are
> > finally faulted in.
> >
> > To keep accounting checks out of the page fault path, users are billed
> > for the entire mapping lock as if MAP_LOCKED was used.
> >
> > [ ... remainder of patch quoted in full, trimmed ... ]
*/ > > - if (flags & VM_LOCKED) { > > + if (flags & (VM_LOCKED | VM_LOCKONFAULT)) { > > locked = len >> PAGE_SHIFT; > > locked += mm->locked_vm; > > lock_limit = rlimit(RLIMIT_MEMLOCK); > > @@ -1301,7 +1301,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, > > vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | > > mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; > > > > - if (flags & MAP_LOCKED) > > + if (flags & (MAP_LOCKED | MAP_LOCKONFAULT)) > > if (!can_do_mlock()) > > return -EPERM; > > > > diff --git a/mm/swap.c b/mm/swap.c > > index a7251a8..07c905e 100644 > > --- a/mm/swap.c > > +++ b/mm/swap.c > > @@ -711,7 +711,8 @@ void lru_cache_add_active_or_unevictable(struct page *page, > > { > > VM_BUG_ON_PAGE(PageLRU(page), page); > > > > - if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) { > > + if (likely((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0) || > > + (vma->vm_flags & VM_SPECIAL)) { > > SetPageActive(page); > > lru_cache_add(page); > > return; > > -- > > 1.9.1 > > > > -- > Michal Hocko > SUSE Labs [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
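From userspace, the proposed flag would be exercised roughly as below. This is a sketch only: MAP_LOCKONFAULT never entered mainline headers, so the value is taken from the asm-generic hunk in this patch, and kernels without the patch historically ignore unknown mmap flag bits on the legacy (non-MAP_SHARED_VALIDATE) path, so the call degrades to a plain anonymous mapping there.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_LOCKONFAULT
#define MAP_LOCKONFAULT 0x80000 /* value from the asm-generic hunk above; not in mainline */
#endif

/* Map len bytes of anonymous memory, requesting (on a patched kernel) that
 * pages be locked only as they are first faulted in, with no prefaulting. */
static void *map_lock_on_fault(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

As with MAP_LOCKED, the caller is charged against RLIMIT_MEMLOCK for the whole mapping at mmap() time, not as pages fault in.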
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault 2015-06-18 20:30 ` Eric B Munson @ 2015-06-19 14:57 ` Michal Hocko 2015-06-19 16:43 ` Eric B Munson 0 siblings, 1 reply; 24+ messages in thread From: Michal Hocko @ 2015-06-19 14:57 UTC (permalink / raw) To: Eric B Munson Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api On Thu 18-06-15 16:30:48, Eric B Munson wrote: > On Thu, 18 Jun 2015, Michal Hocko wrote: [...] > > Wouldn't it be much more reasonable and straightforward to have > > MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would > > explicitly disallow any form of pre-faulting? It would be usable for > > other usecases than with MAP_LOCKED combination. > > I don't see a clear case for it being more reasonable, it is one > possible way to solve the problem. MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault around is all or nothing feature. Either all mappings (which support this) fault around or none. There is no way to tell the kernel that this particular mapping shouldn't fault around. I haven't seen such a request yet but we have seen requests to have a way to opt out from a global policy in the past (e.g. per-process opt out from THP). So I can imagine somebody will come with a request to opt out from any speculative operations on the mapped area in the future. > But I think it leaves us in an even > more akward state WRT VMA flags. As you noted in your fix for the > mmap() man page, one can get into a state where a VMA is VM_LOCKED, but > not present. Having VM_LOCKONFAULT states that this was intentional, if > we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no > longer set VM_LOCKONFAULT (unless we want to start mapping it to the > presence of two MAP_ flags). This can make detecting the MAP_LOCKED + > populate failure state harder. I am not sure I understand your point here. 
Could you be more specific how would you check for that and what for? >From my understanding MAP_LOCKONFAULT is essentially MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike single MAP_LOCKED unfortunately). I would love to also have MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really skeptical considering how my previous attempt to make MAP_POPULATE reasonable went. > If this is the preferred path for mmap(), I am fine with that. > However, > I would like to see the new system calls that Andrew mentioned (and that > I am testing patches for) go in as well. mlock with flags sounds like a good step but I am not sure it will make sense in the future. POSIX has screwed that and I am not sure how many applications would use it. This ship has sailed long time ago. > That way we give users the > ability to request VM_LOCKONFAULT for memory allocated using something > other than mmap. mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even without changing mlock syscall. > > > This patch introduces the ability to request that pages are not > > > pre-faulted, but are placed on the unevictable LRU when they are finally > > > faulted in. > > > > > > To keep accounting checks out of the page fault path, users are billed > > > for the entire mapping lock as if MAP_LOCKED was used. > > > > > > Signed-off-by: Eric B Munson <emunson@akamai.com> > > > Cc: Michal Hocko <mhocko@suse.cz> > > > Cc: linux-alpha@vger.kernel.org > > > Cc: linux-kernel@vger.kernel.org > > > Cc: linux-mips@linux-mips.org > > > Cc: linux-parisc@vger.kernel.org > > > Cc: linuxppc-dev@lists.ozlabs.org > > > Cc: sparclinux@vger.kernel.org > > > Cc: linux-xtensa@linux-xtensa.org > > > Cc: linux-mm@kvack.org > > > Cc: linux-arch@vger.kernel.org > > > Cc: linux-api@vger.kernel.org > > > --- [...] -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault 2015-06-19 14:57 ` Michal Hocko @ 2015-06-19 16:43 ` Eric B Munson [not found] ` <20150619164333.GD2329-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Eric B Munson @ 2015-06-19 16:43 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api [-- Attachment #1: Type: text/plain, Size: 4808 bytes --] On Fri, 19 Jun 2015, Michal Hocko wrote: > On Thu 18-06-15 16:30:48, Eric B Munson wrote: > > On Thu, 18 Jun 2015, Michal Hocko wrote: > [...] > > > Wouldn't it be much more reasonable and straightforward to have > > > MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would > > > explicitly disallow any form of pre-faulting? It would be usable for > > > other usecases than with MAP_LOCKED combination. > > > > I don't see a clear case for it being more reasonable, it is one > > possible way to solve the problem. > > MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault > around is all or nothing feature. Either all mappings (which support > this) fault around or none. There is no way to tell the kernel that > this particular mapping shouldn't fault around. I haven't seen such a > request yet but we have seen requests to have a way to opt out from > a global policy in the past (e.g. per-process opt out from THP). So > I can imagine somebody will come with a request to opt out from any > speculative operations on the mapped area in the future. > > > But I think it leaves us in an even > > more akward state WRT VMA flags. As you noted in your fix for the > > mmap() man page, one can get into a state where a VMA is VM_LOCKED, but > > not present. 
Having VM_LOCKONFAULT states that this was intentional, if > > we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no > > longer set VM_LOCKONFAULT (unless we want to start mapping it to the > > presence of two MAP_ flags). This can make detecting the MAP_LOCKED + > > populate failure state harder. > > I am not sure I understand your point here. Could you be more specific > how would you check for that and what for? My thought on detecting was that someone might want to know if they had a VMA that was VM_LOCKED but had not been made present because of a failure in mmap. We don't have a way today, but adding VM_LOCKONFAULT is at least explicit about what is happening which would make detecting the VM_LOCKED but not present state easier. This assumes that MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like it would have to. > > From my understanding MAP_LOCKONFAULT is essentially > MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike > single MAP_LOCKED unfortunately). I would love to also have > MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really > skeptical considering how my previous attempt to make MAP_POPULATE > reasonable went. Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the new MAP_LOCKONFAULT flag (or both)? If you prefer that MAP_LOCKED | MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that instead of introducing MAP_LOCKONFAULT. I went with the new flag because to date, we have a one to one mapping of MAP_* to VM_* flags. > > > If this is the preferred path for mmap(), I am fine with that. > > > However, > I would like to see the new system calls that Andrew mentioned (and that > I am testing patches for) go in as well. > mlock with flags sounds like a good step but I am not sure it will make > sense in the future. POSIX has screwed that and I am not sure how many > applications would use it. This ship has sailed long time ago.
I don't know either, but the code is the question, right? I know that we have at least one team that wants it here. > > > That way we give users the > > ability to request VM_LOCKONFAULT for memory allocated using something > > other than mmap. > > mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even > without changing mlock syscall. That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s). It doesn't cover the actual case I was asking about, which is how do I get lock on fault on malloc'd memory? > > > > > This patch introduces the ability to request that pages are not > > > > pre-faulted, but are placed on the unevictable LRU when they are finally > > > > faulted in. > > > > > > > > To keep accounting checks out of the page fault path, users are billed > > > > for the entire mapping lock as if MAP_LOCKED was used. > > > > > > > > Signed-off-by: Eric B Munson <emunson@akamai.com> > > > > Cc: Michal Hocko <mhocko@suse.cz> > > > > Cc: linux-alpha@vger.kernel.org > > > > Cc: linux-kernel@vger.kernel.org > > > > Cc: linux-mips@linux-mips.org > > > > Cc: linux-parisc@vger.kernel.org > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > Cc: sparclinux@vger.kernel.org > > > > Cc: linux-xtensa@linux-xtensa.org > > > > Cc: linux-mm@kvack.org > > > > Cc: linux-arch@vger.kernel.org > > > > Cc: linux-api@vger.kernel.org > > > > --- > [...] > -- > Michal Hocko > SUSE Labs [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
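For reference, a flags-taking mlock() of the kind discussed here was later merged as mlock2(2) in Linux 4.4, with MLOCK_ONFAULT as its only flag. A minimal sketch of lock-on-fault for an arbitrary (e.g. malloc'd) buffer via the raw syscall; the MLOCK_ONFAULT value below comes from the merged UAPI header, not from this patch set:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01 /* uapi/asm-generic/mman-common.h, merged in 4.4 */
#endif

/* Lock-on-fault an arbitrary buffer; the kernel rounds addr down and len up
 * to page boundaries.  Returns 0 on success, -1 with errno set on failure
 * (ENOSYS where the syscall is unavailable). */
static int lock_on_fault(void *addr, size_t len)
{
#ifdef SYS_mlock2
	return (int)syscall(SYS_mlock2, addr, len, MLOCK_ONFAULT);
#else
	(void)addr;
	(void)len;
	errno = ENOSYS;
	return -1;
#endif
}
```

This is exactly the capability Eric asks about: it applies the lock-on-fault semantic to memory whose mapping was created by something other than mmap().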

* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault [not found] ` <20150619164333.GD2329-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> @ 2015-06-22 12:38 ` Michal Hocko [not found] ` <20150622123826.GF4430-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Michal Hocko @ 2015-06-22 12:38 UTC (permalink / raw) To: Eric B Munson Cc: Andrew Morton, linux-alpha-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mips-6z/3iImG2C8G8FEW9MqTrA, linux-parisc-u79uwXL29TY76Z2rM5mHXA, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, sparclinux-u79uwXL29TY76Z2rM5mHXA, linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA On Fri 19-06-15 12:43:33, Eric B Munson wrote: > On Fri, 19 Jun 2015, Michal Hocko wrote: > > > On Thu 18-06-15 16:30:48, Eric B Munson wrote: > > > On Thu, 18 Jun 2015, Michal Hocko wrote: > > [...] > > > > Wouldn't it be much more reasonable and straightforward to have > > > > MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would > > > > explicitly disallow any form of pre-faulting? It would be usable for > > > > other usecases than with MAP_LOCKED combination. > > > > > > I don't see a clear case for it being more reasonable, it is one > > > possible way to solve the problem. > > > > MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault > > around is all or nothing feature. Either all mappings (which support > > this) fault around or none. There is no way to tell the kernel that > > this particular mapping shouldn't fault around. I haven't seen such a > > request yet but we have seen requests to have a way to opt out from > > a global policy in the past (e.g. per-process opt out from THP). So > > I can imagine somebody will come with a request to opt out from any > > speculative operations on the mapped area in the future. 
> > > > > But I think it leaves us in an even > > > more akward state WRT VMA flags. As you noted in your fix for the > > > mmap() man page, one can get into a state where a VMA is VM_LOCKED, but > > > not present. Having VM_LOCKONFAULT states that this was intentional, if > > > we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no > > > longer set VM_LOCKONFAULT (unless we want to start mapping it to the > > > presence of two MAP_ flags). This can make detecting the MAP_LOCKED + > > > populate failure state harder. > > > > I am not sure I understand your point here. Could you be more specific > > how would you check for that and what for? > > My thought on detecting was that someone might want to know if they had > a VMA that was VM_LOCKED but had not been made present becuase of a > failure in mmap. We don't have a way today, but adding VM_LOCKONFAULT > is at least explicit about what is happening which would make detecting > the VM_LOCKED but not present state easier. One could use /proc/<pid>/pagemap to query the residency. > This assumes that > MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like > it would have to. Yes, it would have to have a VM flag for the vma. > > From my understanding MAP_LOCKONFAULT is essentially > > MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike > > single MAP_LOCKED unfortunately). I would love to also have > > MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really > > skeptical considering how my previous attempt to make MAP_POPULATE > > reasonable went. > > Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the > new MAP_LOCKONFAULT flag (or both)? I thought the MAP_FAULTPOPULATE (or any other better name) would directly translate into VM_FAULTPOPULATE and wouldn't be tight to the locked semantic. We already have VM_LOCKED for that. 
The direct effect of the flag would be to prevent from population other than the direct page fault - including any speculative actions like fault around or read-ahead. > If you prefer that MAP_LOCKED | > MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that > instead of introducing MAP_LOCKONFAULT. I went with the new flag > because to date, we have a one to one mapping of MAP_* to VM_* flags. > > > > > > If this is the preferred path for mmap(), I am fine with that. > > > > > However, > > > I would like to see the new system calls that Andrew mentioned (and that > > > I am testing patches for) go in as well. > > > > mlock with flags sounds like a good step but I am not sure it will make > > sense in the future. POSIX has screwed that and I am not sure how many > > applications would use it. This ship has sailed long time ago. > > I don't know either, but the code is the question, right? I know that > we have at least one team that wants it here. > > > > > > That way we give users the > > > ability to request VM_LOCKONFAULT for memory allocated using something > > > other than mmap. > > > > mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even > > without changing mlock syscall. > > That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s). It > doesn't cover the actual case I was asking about, which is how do I get > lock on fault on malloc'd memory? OK I see your point now. We would indeed need a flag argument for mlock. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
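The pagemap query Michal suggests can be sketched as follows: each 64-bit entry in /proc/&lt;pid&gt;/pagemap describes one virtual page, and bit 63 is the "page present" bit (see Documentation/vm/pagemap.txt), so residency of a VM_LOCKED-but-unpopulated mapping can be checked without faulting anything in:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Query /proc/self/pagemap for the page containing addr.  The pread() does
 * not fault the page in.  Returns 1 if resident, 0 if not, -1 if pagemap is
 * unavailable (it may require privileges on some kernels). */
static int page_resident(const void *addr)
{
	long psize = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	off_t off = (off_t)((uintptr_t)addr / (uintptr_t)psize) * (off_t)sizeof(entry);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return -1;
	if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (int)((entry >> 63) & 1); /* bit 63: page present */
}
```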
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault [not found] ` <20150622123826.GF4430-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> @ 2015-06-22 14:18 ` Eric B Munson 2015-06-23 12:45 ` Vlastimil Babka 2015-06-24 8:50 ` Michal Hocko 0 siblings, 2 replies; 24+ messages in thread From: Eric B Munson @ 2015-06-22 14:18 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-alpha-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mips-6z/3iImG2C8G8FEW9MqTrA, linux-parisc-u79uwXL29TY76Z2rM5mHXA, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, sparclinux-u79uwXL29TY76Z2rM5mHXA, linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 5698 bytes --] On Mon, 22 Jun 2015, Michal Hocko wrote: > On Fri 19-06-15 12:43:33, Eric B Munson wrote: > > On Fri, 19 Jun 2015, Michal Hocko wrote: > > > > > On Thu 18-06-15 16:30:48, Eric B Munson wrote: > > > > On Thu, 18 Jun 2015, Michal Hocko wrote: > > > [...] > > > > > Wouldn't it be much more reasonable and straightforward to have > > > > > MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would > > > > > explicitly disallow any form of pre-faulting? It would be usable for > > > > > other usecases than with MAP_LOCKED combination. > > > > > > > > I don't see a clear case for it being more reasonable, it is one > > > > possible way to solve the problem. > > > > > > MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault > > > around is all or nothing feature. Either all mappings (which support > > > this) fault around or none. There is no way to tell the kernel that > > > this particular mapping shouldn't fault around. I haven't seen such a > > > request yet but we have seen requests to have a way to opt out from > > > a global policy in the past (e.g. per-process opt out from THP). 
So > > > I can imagine somebody will come with a request to opt out from any > > > speculative operations on the mapped area in the future. > > > > > > > But I think it leaves us in an even > > > > more akward state WRT VMA flags. As you noted in your fix for the > > > > mmap() man page, one can get into a state where a VMA is VM_LOCKED, but > > > > not present. Having VM_LOCKONFAULT states that this was intentional, if > > > > we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no > > > > longer set VM_LOCKONFAULT (unless we want to start mapping it to the > > > > presence of two MAP_ flags). This can make detecting the MAP_LOCKED + > > > > populate failure state harder. > > > > > > I am not sure I understand your point here. Could you be more specific > > > how would you check for that and what for? > > > > My thought on detecting was that someone might want to know if they had > > a VMA that was VM_LOCKED but had not been made present becuase of a > > failure in mmap. We don't have a way today, but adding VM_LOCKONFAULT > > is at least explicit about what is happening which would make detecting > > the VM_LOCKED but not present state easier. > > One could use /proc/<pid>/pagemap to query the residency. > > > This assumes that > > MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like > > it would have to. > > Yes, it would have to have a VM flag for the vma. > > > > From my understanding MAP_LOCKONFAULT is essentially > > > MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike > > > single MAP_LOCKED unfortunately). I would love to also have > > > MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really > > > skeptical considering how my previous attempt to make MAP_POPULATE > > > reasonable went. > > > > Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the > > new MAP_LOCKONFAULT flag (or both)? 
> > I thought the MAP_FAULTPOPULATE (or any other better name) would > directly translate into VM_FAULTPOPULATE and wouldn't be tight to the > locked semantic. We already have VM_LOCKED for that. The direct effect > of the flag would be to prevent from population other than the direct > page fault - including any speculative actions like fault around or > read-ahead. I like the ability to control other speculative population, but I am not sure about overloading it with the VM_LOCKONFAULT case. Here is my concern. If we are using VM_FAULTPOPULATE | VM_LOCKED to denote LOCKONFAULT, how can we tell the difference between someone that wants to avoid read-ahead and wants to use mlock()? This might lead to some interesting states with mlock() and munlock() that take flags. For instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by munlock(MLOCK_LOCKED) leaves the VMAs in the same state with VM_LOCKONFAULT set. If we use VM_FAULTPOPULATE, the same pair of calls would clear VM_LOCKED, but leave VM_FAULTPOPULATE. It may not matter in the end, but I am concerned about the subtleties here. > > > If you prefer that MAP_LOCKED | > > MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that > > instead of introducing MAP_LOCKONFAULT. I went with the new flag > > because to date, we have a one to one mapping of MAP_* to VM_* flags. > > > > > > > > > If this is the preferred path for mmap(), I am fine with that. > > > > > > > However, > > > > I would like to see the new system calls that Andrew mentioned (and that > > > > I am testing patches for) go in as well. > > > > > > mlock with flags sounds like a good step but I am not sure it will make > > > sense in the future. POSIX has screwed that and I am not sure how many > > > applications would use it. This ship has sailed long time ago. > > > > I don't know either, but the code is the question, right? I know that > > we have at least one team that wants it here. 
> > > > > > > > > That way we give users the > > > > ability to request VM_LOCKONFAULT for memory allocated using something > > > > other than mmap. > > > > > > mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even > > > without changing mlock syscall. > > > > That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s). It > > doesn't cover the actual case I was asking about, which is how do I get > > lock on fault on malloc'd memory? > > OK I see your point now. We would indeed need a flag argument for mlock. > -- > Michal Hocko > SUSE Labs [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault 2015-06-22 14:18 ` Eric B Munson @ 2015-06-23 12:45 ` Vlastimil Babka [not found] ` <558954DD.4060405-AlSwsSmVLrQ@public.gmane.org> 2015-06-24 8:50 ` Michal Hocko 1 sibling, 1 reply; 24+ messages in thread From: Vlastimil Babka @ 2015-06-23 12:45 UTC (permalink / raw) To: Eric B Munson, Michal Hocko Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api On 06/22/2015 04:18 PM, Eric B Munson wrote: > On Mon, 22 Jun 2015, Michal Hocko wrote: > >> On Fri 19-06-15 12:43:33, Eric B Munson wrote: >>> On Fri, 19 Jun 2015, Michal Hocko wrote: >>> >>>> On Thu 18-06-15 16:30:48, Eric B Munson wrote: >>>>> On Thu, 18 Jun 2015, Michal Hocko wrote: >>>> [...] >>>>>> Wouldn't it be much more reasonable and straightforward to have >>>>>> MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would >>>>>> explicitly disallow any form of pre-faulting? It would be usable for >>>>>> other usecases than with MAP_LOCKED combination. >>>>> >>>>> I don't see a clear case for it being more reasonable, it is one >>>>> possible way to solve the problem. >>>> >>>> MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault >>>> around is all or nothing feature. Either all mappings (which support >>>> this) fault around or none. There is no way to tell the kernel that >>>> this particular mapping shouldn't fault around. I haven't seen such a >>>> request yet but we have seen requests to have a way to opt out from >>>> a global policy in the past (e.g. per-process opt out from THP). So >>>> I can imagine somebody will come with a request to opt out from any >>>> speculative operations on the mapped area in the future. 
That sounds like something where new madvise() flag would make more sense than a new mmap flag, and conflating it with locking behavior would lead to all kinds of weird corner cases as Eric mentioned. >>>> >>>>> But I think it leaves us in an even >>>>> more akward state WRT VMA flags. As you noted in your fix for the >>>>> mmap() man page, one can get into a state where a VMA is VM_LOCKED, but >>>>> not present. Having VM_LOCKONFAULT states that this was intentional, if >>>>> we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no >>>>> longer set VM_LOCKONFAULT (unless we want to start mapping it to the >>>>> presence of two MAP_ flags). This can make detecting the MAP_LOCKED + >>>>> populate failure state harder. >>>> >>>> I am not sure I understand your point here. Could you be more specific >>>> how would you check for that and what for? >>> >>> My thought on detecting was that someone might want to know if they had >>> a VMA that was VM_LOCKED but had not been made present becuase of a >>> failure in mmap. We don't have a way today, but adding VM_LOCKONFAULT >>> is at least explicit about what is happening which would make detecting >>> the VM_LOCKED but not present state easier. >> >> One could use /proc/<pid>/pagemap to query the residency. I think that's all too much complex scenario for a little gain. If someone knows that mmap(MAP_LOCKED|MAP_POPULATE) is not perfect, he should either mlock() separately from mmap(), or fault the range manually with a for loop. Why try to detect if the corner case was hit? >> >>> This assumes that >>> MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like >>> it would have to. >> >> Yes, it would have to have a VM flag for the vma. So with your approach, VM_LOCKED flag is enough, right? The new MAP_ / MLOCK_ flags just cause setting VM_LOCKED to not fault the whole vma, but otherwise nothing changes. If that's true, I think it's better than a new vma flag. 
>> >>>> From my understanding MAP_LOCKONFAULT is essentially >>>> MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike >>>> single MAP_LOCKED unfortunately). I would love to also have >>>> MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really >>>> skeptical considering how my previous attempt to make MAP_POPULATE >>>> reasonable went. >>> >>> Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the >>> new MAP_LOCKONFAULT flag (or both)? >> >> I thought the MAP_FAULTPOPULATE (or any other better name) would >> directly translate into VM_FAULTPOPULATE and wouldn't be tight to the >> locked semantic. We already have VM_LOCKED for that. The direct effect >> of the flag would be to prevent from population other than the direct >> page fault - including any speculative actions like fault around or >> read-ahead. > > I like the ability to control other speculative population, but I am not > sure about overloading it with the VM_LOCKONFAULT case. Here is my > concern. If we are using VM_FAULTPOPULATE | VM_LOCKED to denote > LOCKONFAULT, how can we tell the difference between someone that wants > to avoid read-ahead and wants to use mlock()? This might lead to some > interesting states with mlock() and munlock() that take flags. For > instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by > munlock(MLOCK_LOCKED) leaves the VMAs in the same state with > VM_LOCKONFAULT set. If we use VM_FAULTPOPULATE, the same pair of calls > would clear VM_LOCKED, but leave VM_FAULTPOPULATE. It may not matter in > the end, but I am concerned about the subtleties here. Right. >> >>> If you prefer that MAP_LOCKED | >>> MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that >>> instead of introducing MAP_LOCKONFAULT. I went with the new flag >>> because to date, we have a one to one mapping of MAP_* to VM_* flags. >>> >>>> >>>>> If this is the preferred path for mmap(), I am fine with that. 
>>>> >>>>> However, >>>>> I would like to see the new system calls that Andrew mentioned (and that >>>>> I am testing patches for) go in as well. >>>> >>>> mlock with flags sounds like a good step but I am not sure it will make >>>> sense in the future. POSIX has screwed that and I am not sure how many >>>> applications would use it. This ship has sailed long time ago. >>> >>> I don't know either, but the code is the question, right? I know that >>> we have at least one team that wants it here. >>> >>>> >>>>> That way we give users the >>>>> ability to request VM_LOCKONFAULT for memory allocated using something >>>>> other than mmap. >>>> >>>> mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even >>>> without changing mlock syscall. >>> >>> That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s). It >>> doesn't cover the actual case I was asking about, which is how do I get >>> lock on fault on malloc'd memory? >> >> OK I see your point now. We would indeed need a flag argument for mlock. >> -- >> Michal Hocko >> SUSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
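Faulting the range manually with a for loop, as Vlastimil suggests, is straightforward; a sketch assuming a writable mapping (a read-only mapping would instead need reads through a volatile pointer):

```c
#include <stddef.h>
#include <unistd.h>

/* Fault in every page of a writable range by touching one byte per page,
 * the userspace alternative to MAP_POPULATE mentioned above. */
static void prefault_range(char *start, size_t len)
{
	size_t psize = (size_t)sysconf(_SC_PAGESIZE);
	volatile char *p = start;
	size_t off;

	for (off = 0; off < len; off += psize)
		p[off] = p[off]; /* the write fault makes the page present */
}
```

Combined with a prior mlock() of the range, this reproduces the full populate-and-lock semantic without relying on mmap(MAP_LOCKED)'s best-effort population.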
[parent not found: <558954DD.4060405-AlSwsSmVLrQ@public.gmane.org>]
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault [not found] ` <558954DD.4060405-AlSwsSmVLrQ@public.gmane.org> @ 2015-06-24 9:47 ` Michal Hocko 0 siblings, 0 replies; 24+ messages in thread From: Michal Hocko @ 2015-06-24 9:47 UTC (permalink / raw) To: Vlastimil Babka Cc: Eric B Munson, Andrew Morton, linux-alpha-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mips-6z/3iImG2C8G8FEW9MqTrA, linux-parisc-u79uwXL29TY76Z2rM5mHXA, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, sparclinux-u79uwXL29TY76Z2rM5mHXA, linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA On Tue 23-06-15 14:45:17, Vlastimil Babka wrote: > On 06/22/2015 04:18 PM, Eric B Munson wrote: > >On Mon, 22 Jun 2015, Michal Hocko wrote: > > > >>On Fri 19-06-15 12:43:33, Eric B Munson wrote: [...] > >>>My thought on detecting was that someone might want to know if they had > >>>a VMA that was VM_LOCKED but had not been made present becuase of a > >>>failure in mmap. We don't have a way today, but adding VM_LOCKONFAULT > >>>is at least explicit about what is happening which would make detecting > >>>the VM_LOCKED but not present state easier. > >> > >>One could use /proc/<pid>/pagemap to query the residency. > > I think that's all too much complex scenario for a little gain. If someone > knows that mmap(MAP_LOCKED|MAP_POPULATE) is not perfect, he should either > mlock() separately from mmap(), or fault the range manually with a for loop. > Why try to detect if the corner case was hit? No idea. I have just offered a way to do that. I do not think it is anyhow useful but who knows... I do agree that the mlock should be used for the full mlock semantic. > >>>This assumes that > >>>MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like > >>>it would have to. > >> > >>Yes, it would have to have a VM flag for the vma. 
> > So with your approach, VM_LOCKED flag is enough, right? The new MAP_ / > MLOCK_ flags just cause setting VM_LOCKED to not fault the whole vma, but > otherwise nothing changes. VM_FAULTPOPULATE would have to be sticky to prevent other speculative population of the mapping. I mean, is it OK to have a new mlock semantic (on fault) which might still populate&lock memory which hasn't been faulted directly? Who knows what kind of speculative things we will do in the future and then find out that the semantic of lock-on-fault is not usable anymore. [...] -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault 2015-06-22 14:18 ` Eric B Munson 2015-06-23 12:45 ` Vlastimil Babka @ 2015-06-24 8:50 ` Michal Hocko 2015-06-25 14:46 ` Eric B Munson 1 sibling, 1 reply; 24+ messages in thread From: Michal Hocko @ 2015-06-24 8:50 UTC (permalink / raw) To: Eric B Munson Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api On Mon 22-06-15 10:18:06, Eric B Munson wrote: > On Mon, 22 Jun 2015, Michal Hocko wrote: > > > On Fri 19-06-15 12:43:33, Eric B Munson wrote: [...] > > > Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the > > > new MAP_LOCKONFAULT flag (or both)? > > > > I thought the MAP_FAULTPOPULATE (or any other better name) would > > directly translate into VM_FAULTPOPULATE and wouldn't be tight to the > > locked semantic. We already have VM_LOCKED for that. The direct effect > > of the flag would be to prevent from population other than the direct > > page fault - including any speculative actions like fault around or > > read-ahead. > > I like the ability to control other speculative population, but I am not > sure about overloading it with the VM_LOCKONFAULT case. Here is my > concern. If we are using VM_FAULTPOPULATE | VM_LOCKED to denote > LOCKONFAULT, how can we tell the difference between someone that wants > to avoid read-ahead and wants to use mlock()? Not sure I understand. Something like? addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma [...] mlock(addr, len) # Now I want the full mlock semantic and the later to have the full mlock semantic and populate the given area regardless of VM_FAULTPOPULATE being set on the vma? This would be an interesting question because mlock man page clearly states the semantic and that is to _always_ populate or fail. So I originally thought that it would obey VM_FAULTPOPULATE but this needs a more thinking. 
> This might lead to some > interesting states with mlock() and munlock() that take flags. For > instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by > munlock(MLOCK_LOCKED) leaves the VMAs in the same state with > VM_LOCKONFAULT set. This is really confusing. Let me try to rephrase that. So you have mlock(addr, len, MLOCK_ONFAULT) munlock(addr, len, MLOCK_LOCKED) IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't that behavior strange and unexpected? First of all, munlock has traditionally dropped the lock on the address range (e.g. what should happen if you did plain old munlock(addr, len)). But even without that. You are trying to unlock something that hasn't been locked the same way. So I would expect -EINVAL at least, if the two modes should be really represented by different flags. Or did you mean the both types of lock like: mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT) mlock(addr, len, MLOCK_LOCKED) munlock(addr, len, MLOCK_LOCKED) and that should keep MLOCK_ONFAULT? This sounds even more weird to me because that means that the vma in question would be locked by two different mechanisms. MLOCK_LOCKED with the "always populate" semantic would rule out MLOCK_ONFAULT so what would be the meaning of the other flag then? Also what should regular munlock(addr, len) without flags unlock? Both? > If we use VM_FAULTPOPULATE, the same pair of calls > would clear VM_LOCKED, but leave VM_FAULTPOPULATE. It may not matter in > the end, but I am concerned about the subtleties here. This sounds like the proper behavior to me. munlock should simply always drop VM_LOCKED and the VM_FAULTPOPULATE can live its separate life. Btw. could you be more specific about semantic of m{un}lock(addr, len, flags) you want to propose? The more I think about that the more I am unclear about it, especially munlock behavior and possible flags. 
-- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault 2015-06-24 8:50 ` Michal Hocko @ 2015-06-25 14:46 ` Eric B Munson 0 siblings, 0 replies; 24+ messages in thread From: Eric B Munson @ 2015-06-25 14:46 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api [-- Attachment #1: Type: text/plain, Size: 5707 bytes --] On Wed, 24 Jun 2015, Michal Hocko wrote: > On Mon 22-06-15 10:18:06, Eric B Munson wrote: > > On Mon, 22 Jun 2015, Michal Hocko wrote: > > > > > On Fri 19-06-15 12:43:33, Eric B Munson wrote: > [...] > > > > Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the > > > > new MAP_LOCKONFAULT flag (or both)? > > > > > > I thought the MAP_FAULTPOPULATE (or any other better name) would > > > directly translate into VM_FAULTPOPULATE and wouldn't be tight to the > > > locked semantic. We already have VM_LOCKED for that. The direct effect > > > of the flag would be to prevent from population other than the direct > > > page fault - including any speculative actions like fault around or > > > read-ahead. > > > > I like the ability to control other speculative population, but I am not > > sure about overloading it with the VM_LOCKONFAULT case. Here is my > > concern. If we are using VM_FAULTPOPULATE | VM_LOCKED to denote > > LOCKONFAULT, how can we tell the difference between someone that wants > > to avoid read-ahead and wants to use mlock()? > > Not sure I understand. Something like? > addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma > [...] > mlock(addr, len) # Now I want the full mlock semantic So this leaves us without the LOCKONFAULT semantics? That is not at all what I am looking for. 
What I want is a way to express 3 possible states of a VMA WRT locking, locked (populated and all pages on the unevictable LRU), lock on fault (populated by page fault, pages that are present are on the unevictable LRU, newly faulted pages are added to same), and not locked. > > and the later to have the full mlock semantic and populate the given > area regardless of VM_FAULTPOPULATE being set on the vma? This would > be an interesting question because mlock man page clearly states the > semantic and that is to _always_ populate or fail. So I originally > thought that it would obey VM_FAULTPOPULATE but this needs a more > thinking. > > > This might lead to some > > interesting states with mlock() and munlock() that take flags. For > > instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by > > munlock(MLOCK_LOCKED) leaves the VMAs in the same state with > > VM_LOCKONFAULT set. > > This is really confusing. Let me try to rephrase that. So you have > mlock(addr, len, MLOCK_ONFAULT) > munlock(addr, len, MLOCK_LOCKED) > > IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't > that behavior strange and unexpected? First of all, munlock has > traditionally dropped the lock on the address range (e.g. what should > happen if you did plain old munlock(addr, len)). But even without > that. You are trying to unlock something that hasn't been locked the > same way. So I would expect -EINVAL at least, if the two modes should be > really represented by different flags. I would expect it to remain MLOCK_LOCKONFAULT because the user requested munlock(addr, len, MLOCK_LOCKED). It is not currently an error to unlock memory that is not locked. We do this because we do not require the user track what areas are locked. It is acceptable to have a mostly locked area with holes unlocked with a single call to munlock that spans the entire area. The same semantics should hold for munlock with flags. 
If I have an area with MLOCK_LOCKED and MLOCK_ONFAULT interleaved, it should be acceptable to clear the MLOCK_ONFAULT flag from those areas with a single munlock call that spans the area. On top of continuing with munlock semantics, the implementation would need the ability to roll back a munlock call if it failed after altering VMAs. If we have the same interleaved area as before and we go to return -EINVAL the first time we hit an area that was MLOCK_LOCKED, how do we restore the state of the VMAs we have already processed, and possibly merged/split? > > Or did you mean the both types of lock like: > mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT) > mlock(addr, len, MLOCK_LOCKED) > munlock(addr, len, MLOCK_LOCKED) > > and that should keep MLOCK_ONFAULT? > This sounds even more weird to me because that means that the vma in > question would be locked by two different mechanisms. MLOCK_LOCKED with > the "always populate" semantic would rule out MLOCK_ONFAULT so what > would be the meaning of the other flag then? Also what should regular > munlock(addr, len) without flags unlock? Both? This is indeed confusing and not what I was trying to illustrate, but since you bring it up. mlockall() currently clears all flags and then sets the new flags with each subsequent call. mlock2 would use that same behavior: if LOCKED was specified for an ONFAULT region, that region would become LOCKED and vice versa. I have the new system call set ready, I am waiting to post for rc1 so I can run the benchmarks again on a base more stable than the middle of a merge window. We should wait to hash out implementations until the code is up rather than talk past each other here. > > > If we use VM_FAULTPOPULATE, the same pair of calls > > would clear VM_LOCKED, but leave VM_FAULTPOPULATE. It may not matter in > > the end, but I am concerned about the subtleties here. > > This sounds like the proper behavior to me.
munlock should simply always > drop VM_LOCKED and the VM_FAULTPOPULATE can live its separate life. > > Btw. could you be more specific about semantic of m{un}lock(addr, len, flags) > you want to propose? The more I think about that the more I am unclear > about it, especially munlock behavior and possible flags. > -- > Michal Hocko > SUSE Labs [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* [RESEND PATCH V2 2/3] Add mlockall flag for locking pages on fault 2015-06-10 13:26 [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault Eric B Munson 2015-06-10 13:26 ` [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after " Eric B Munson @ 2015-06-10 13:26 ` Eric B Munson [not found] ` <1433942810-7852-1-git-send-email-emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 2 siblings, 0 replies; 24+ messages in thread From: Eric B Munson @ 2015-06-10 13:26 UTC (permalink / raw) To: Andrew Morton Cc: Eric B Munson, Michal Hocko, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-arch, linux-api, linux-mm Building on the previous patch, extend mlockall() to give a process a way to specify that pages should be locked when they are faulted in, but that pre-faulting is not needed. MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated in the previous patch because MCL_FUTURE will behave as if each mapping was made with MAP_LOCKED, causing the entire mapping to be faulted in when new space is allocated or mapped. MCL_ONFAULT allows the user to delay the fault-in cost of any given page until it is actually needed, but then guarantees that that page will always be resident. As with the mmap(MAP_LOCKONFAULT) case, the user is charged for the mapping against the RLIMIT_MEMLOCK when the address space is allocated, not when the page is faulted in. This decision was made to keep the accounting checks out of the page fault path.
Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: linux-alpha@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org --- arch/alpha/include/uapi/asm/mman.h | 1 + arch/mips/include/uapi/asm/mman.h | 1 + arch/parisc/include/uapi/asm/mman.h | 1 + arch/powerpc/include/uapi/asm/mman.h | 1 + arch/sparc/include/uapi/asm/mman.h | 1 + arch/tile/include/uapi/asm/mman.h | 1 + arch/xtensa/include/uapi/asm/mman.h | 1 + include/uapi/asm-generic/mman.h | 1 + mm/mlock.c | 13 +++++++++---- 9 files changed, 17 insertions(+), 4 deletions(-) diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 15e96e1..dfdaecf 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -38,6 +38,7 @@ #define MCL_CURRENT 8192 /* lock all currently mapped pages */ #define MCL_FUTURE 16384 /* lock all additions to address space */ +#define MCL_ONFAULT 32768 /* lock all pages that are faulted in */ #define MADV_NORMAL 0 /* no further special treatment */ #define MADV_RANDOM 1 /* expect random page references */ diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index 47846a5..f0705ff 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -62,6 +62,7 @@ */ #define MCL_CURRENT 1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ONFAULT 4 /* lock all pages that are faulted in */ #define MADV_NORMAL 0 /* no further special treatment */ #define MADV_RANDOM 1 /* expect random page references */ diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index 1514cd7..7c2eb85 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ 
b/arch/parisc/include/uapi/asm/mman.h @@ -32,6 +32,7 @@ #define MCL_CURRENT 1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ONFAULT 4 /* lock all pages that are faulted in */ #define MADV_NORMAL 0 /* no further special treatment */ #define MADV_RANDOM 1 /* expect random page references */ diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h index fce74fe..761137a 100644 --- a/arch/powerpc/include/uapi/asm/mman.h +++ b/arch/powerpc/include/uapi/asm/mman.h @@ -22,6 +22,7 @@ #define MCL_CURRENT 0x2000 /* lock all currently mapped pages */ #define MCL_FUTURE 0x4000 /* lock all additions to address space */ +#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h index 12425d8..dd027b8 100644 --- a/arch/sparc/include/uapi/asm/mman.h +++ b/arch/sparc/include/uapi/asm/mman.h @@ -17,6 +17,7 @@ #define MCL_CURRENT 0x2000 /* lock all currently mapped pages */ #define MCL_FUTURE 0x4000 /* lock all additions to address space */ +#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h index ec04eaf..0f7ae45 100644 --- a/arch/tile/include/uapi/asm/mman.h +++ b/arch/tile/include/uapi/asm/mman.h @@ -37,6 +37,7 @@ */ #define MCL_CURRENT 1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ONFAULT 4 /* lock all pages that are faulted in */ #endif /* _ASM_TILE_MMAN_H */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index 42d43cc..10fbbb7 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ 
b/arch/xtensa/include/uapi/asm/mman.h @@ -75,6 +75,7 @@ */ #define MCL_CURRENT 1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ONFAULT 4 /* lock all pages that are faulted in */ #define MADV_NORMAL 0 /* no further special treatment */ #define MADV_RANDOM 1 /* expect random page references */ diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h index fc4e586..7fb729b 100644 --- a/include/uapi/asm-generic/mman.h +++ b/include/uapi/asm-generic/mman.h @@ -18,5 +18,6 @@ #define MCL_CURRENT 1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ONFAULT 4 /* lock all pages that are faulted in */ #endif /* __ASM_GENERIC_MMAN_H */ diff --git a/mm/mlock.c b/mm/mlock.c index 6fd2cf1..f15547f 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -579,7 +579,7 @@ static int do_mlock(unsigned long start, size_t len, int on) /* Here we know that vma->vm_start <= nstart < vma->vm_end. */ - newflags = vma->vm_flags & ~VM_LOCKED; + newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT); if (on) newflags |= VM_LOCKED; @@ -662,13 +662,17 @@ static int do_mlockall(int flags) current->mm->def_flags |= VM_LOCKED; else current->mm->def_flags &= ~VM_LOCKED; - if (flags == MCL_FUTURE) + if (flags & MCL_ONFAULT) + current->mm->def_flags |= VM_LOCKONFAULT; + else + current->mm->def_flags &= ~VM_LOCKONFAULT; + if (flags == MCL_FUTURE || flags == MCL_ONFAULT) goto out; for (vma = current->mm->mmap; vma ; vma = prev->vm_next) { vm_flags_t newflags; - newflags = vma->vm_flags & ~VM_LOCKED; + newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT); if (flags & MCL_CURRENT) newflags |= VM_LOCKED; @@ -685,7 +689,8 @@ SYSCALL_DEFINE1(mlockall, int, flags) unsigned long lock_limit; int ret = -EINVAL; - if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE))) + if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) || + ((flags & MCL_FUTURE) && (flags & MCL_ONFAULT))) goto out; ret = 
-EPERM; -- 1.9.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
[parent not found: <1433942810-7852-1-git-send-email-emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org>]
* [RESEND PATCH V2 3/3] Add tests for lock on fault [not found] ` <1433942810-7852-1-git-send-email-emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> @ 2015-06-10 13:26 ` Eric B Munson 2015-06-10 21:59 ` [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault Andrew Morton 1 sibling, 0 replies; 24+ messages in thread From: Eric B Munson @ 2015-06-10 13:26 UTC (permalink / raw) To: Andrew Morton Cc: Eric B Munson, Shuah Khan, Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA Test the mmap() flag, the mlockall() flag, and ensure that mlock limits are respected. Note that the limit test needs to be run as a normal user. Signed-off-by: Eric B Munson <emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> Cc: Shuah Khan <shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org --- tools/testing/selftests/vm/Makefile | 8 +- tools/testing/selftests/vm/lock-on-fault.c | 145 ++++++++++++++++++++++++++++ tools/testing/selftests/vm/on-fault-limit.c | 47 +++++++++ tools/testing/selftests/vm/run_vmtests | 23 +++++ 4 files changed, 222 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/vm/lock-on-fault.c create mode 100644 tools/testing/selftests/vm/on-fault-limit.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index a5ce953..32f3d20 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -1,7 +1,13 @@ # Makefile for vm selftests CFLAGS = -Wall -BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen hugetlbfstest +BINARIES = hugepage-mmap +BINARIES += hugepage-shm +BINARIES += hugetlbfstest +BINARIES += lock-on-fault +BINARIES += map_hugetlb +BINARIES += on-fault-limit +BINARIES += thuge-gen
BINARIES += transhuge-stress all: $(BINARIES) diff --git a/tools/testing/selftests/vm/lock-on-fault.c b/tools/testing/selftests/vm/lock-on-fault.c new file mode 100644 index 0000000..4659303 --- /dev/null +++ b/tools/testing/selftests/vm/lock-on-fault.c @@ -0,0 +1,145 @@ +#include <sys/mman.h> +#include <stdio.h> +#include <unistd.h> +#include <string.h> +#include <sys/time.h> +#include <sys/resource.h> + +#ifndef MCL_ONFAULT +#define MCL_ONFAULT (MCL_FUTURE << 1) +#endif + +#define PRESENT_BIT 0x8000000000000000 +#define PFN_MASK 0x007FFFFFFFFFFFFF +#define UNEVICTABLE_BIT (1UL << 18) + +static int check_pageflags(void *map) +{ + FILE *file; + unsigned long pfn1; + unsigned long pfn2; + unsigned long offset1; + unsigned long offset2; + int ret = 1; + + file = fopen("/proc/self/pagemap", "r"); + if (!file) { + perror("fopen"); + return ret; + } + offset1 = (unsigned long)map / getpagesize() * sizeof(unsigned long); + offset2 = ((unsigned long)map + getpagesize()) / getpagesize() * sizeof(unsigned long); + if (fseek(file, offset1, SEEK_SET)) { + perror("fseek"); + goto out; + } + + if (fread(&pfn1, sizeof(unsigned long), 1, file) != 1) { + perror("fread"); + goto out; + } + + if (fseek(file, offset2, SEEK_SET)) { + perror("fseek"); + goto out; + } + + if (fread(&pfn2, sizeof(unsigned long), 1, file) != 1) { + perror("fread"); + goto out; + } + + /* pfn2 should not be present */ + if (pfn2 & PRESENT_BIT) { + printf("page map says 0x%lx\n", pfn2); + printf("present is 0x%lx\n", PRESENT_BIT); + goto out; + } + + /* pfn1 should be present */ + if ((pfn1 & PRESENT_BIT) == 0) { + printf("page map says 0x%lx\n", pfn1); + printf("present is 0x%lx\n", PRESENT_BIT); + goto out; + } + + pfn1 &= PFN_MASK; + fclose(file); + file = fopen("/proc/kpageflags", "r"); + if (!file) { + perror("fopen"); + munmap(map, 2 * getpagesize()); + return ret; + } + + if (fseek(file, pfn1 * sizeof(unsigned long), SEEK_SET)) { + perror("fseek"); + goto out; + } + + if (fread(&pfn2, sizeof(unsigned 
long), 1, file) != 1) { + perror("fread"); + goto out; + } + + /* pfn2 now contains the entry from kpageflags for the first page, the + * unevictable bit should be set */ + if ((pfn2 & UNEVICTABLE_BIT) == 0) { + printf("kpageflags says 0x%lx\n", pfn2); + printf("unevictable is 0x%lx\n", UNEVICTABLE_BIT); + goto out; + } + + ret = 0; + +out: + fclose(file); + return ret; +} + +static int test_mmap(int flags) +{ + int ret = 1; + void *map; + + map = mmap(NULL, 2 * getpagesize(), PROT_READ | PROT_WRITE, flags, 0, 0); + if (map == MAP_FAILED) { + perror("mmap()"); + return ret; + } + + /* Write something into the first page to ensure it is present */ + *(char *)map = 1; + + ret = check_pageflags(map); + + munmap(map, 2 * getpagesize()); + return ret; +} + +static int test_mlockall(void) +{ + int ret = 1; + + if (mlockall(MCL_ONFAULT)) { + perror("mlockall"); + return ret; + } + + ret = test_mmap(MAP_PRIVATE | MAP_ANONYMOUS); + munlockall(); + return ret; +} + +#ifndef MAP_LOCKONFAULT +#define MAP_LOCKONFAULT (MAP_HUGETLB << 1) +#endif + +int main(int argc, char **argv) +{ + int ret = 0; + + ret += test_mmap(MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT); + ret += test_mlockall(); + return ret; +} diff --git a/tools/testing/selftests/vm/on-fault-limit.c b/tools/testing/selftests/vm/on-fault-limit.c new file mode 100644 index 0000000..ed2a109 --- /dev/null +++ b/tools/testing/selftests/vm/on-fault-limit.c @@ -0,0 +1,47 @@ +#include <sys/mman.h> +#include <stdio.h> +#include <unistd.h> +#include <string.h> +#include <sys/time.h> +#include <sys/resource.h> + +#ifndef MCL_ONFAULT +#define MCL_ONFAULT (MCL_FUTURE << 1) +#endif + +static int test_limit(void) +{ + int ret = 1; + struct rlimit lims; + void *map; + + if (getrlimit(RLIMIT_MEMLOCK, &lims)) { + perror("getrlimit"); + return ret; + } + + if (mlockall(MCL_ONFAULT)) { + perror("mlockall"); + return ret; + } + + map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, 0, 
0); + if (map != MAP_FAILED) + printf("mmap should have failed, but didn't\n"); + else { + ret = 0; + munmap(map, 2 * lims.rlim_max); + } + + munlockall(); + return ret; +} + +int main(int argc, char **argv) +{ + int ret = 0; + + ret += test_limit(); + return ret; +} diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index c87b681..c1aecce 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -90,4 +90,27 @@ fi umount $mnt rm -rf $mnt echo $nr_hugepgs > /proc/sys/vm/nr_hugepages + +echo "--------------------" +echo "running lock-on-fault" +echo "--------------------" +./lock-on-fault +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + +echo "--------------------" +echo "running on-fault-limit" +echo "--------------------" +sudo -u nobody ./on-fault-limit +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + exit $exitcode -- 1.9.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault [not found] ` <1433942810-7852-1-git-send-email-emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 2015-06-10 13:26 ` [RESEND PATCH V2 3/3] Add tests for lock " Eric B Munson @ 2015-06-10 21:59 ` Andrew Morton 2015-06-11 19:21 ` Eric B Munson 1 sibling, 1 reply; 24+ messages in thread From: Andrew Morton @ 2015-06-10 21:59 UTC (permalink / raw) To: Eric B Munson Cc: Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mips-6z/3iImG2C8G8FEW9MqTrA, linux-parisc-u79uwXL29TY76Z2rM5mHXA, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, sparclinux-u79uwXL29TY76Z2rM5mHXA, linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson <emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> wrote: > mlock() allows a user to control page out of program memory, but this > comes at the cost of faulting in the entire mapping when it is s/mapping/locked area/ > allocated. For large mappings where the entire area is not necessary > this is not ideal. > > This series introduces new flags for mmap() and mlockall() that allow a > user to specify that the covered are should not be paged out, but only > after the memory has been used the first time. The comparison with MCL_FUTURE is hiding over in the 2/3 changelog. It's important so let's copy it here. : MCL_ONFAULT is preferrable to MCL_FUTURE for the use cases enumerated : in the previous patch becuase MCL_FUTURE will behave as if each mapping : was made with MAP_LOCKED, causing the entire mapping to be faulted in : when new space is allocated or mapped. MCL_ONFAULT allows the user to : delay the fault in cost of any given page until it is actually needed, : but then guarantees that that page will always be resident. I *think* it all looks OK. 
I'd like someone else to go over it also if poss. I guess the 2/3 changelog should have something like : munlockall() will clear MCL_ONFAULT on all vma's in the process's VM. It's pretty obvious, but the manpage delta should make this clear also. Also the changelog(s) and manpage delta should explain that munlock() clears MCL_ONFAULT. And now I'm wondering what happens if userspace does mmap(MAP_LOCKONFAULT) and later does munlock() on just part of that region. Does the vma get split? Is this tested? Should also be in the changelogs and manpage. Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested. What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-10 21:59 ` [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault Andrew Morton @ 2015-06-11 19:21 ` Eric B Munson 2015-06-11 19:34 ` Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: Eric B Munson @ 2015-06-11 19:21 UTC (permalink / raw) To: Andrew Morton Cc: Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2015 05:59 PM, Andrew Morton wrote: > On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson > <emunson@akamai.com> wrote: > >> mlock() allows a user to control page out of program memory, but >> this comes at the cost of faulting in the entire mapping when it >> is > > s/mapping/locked area/ Done. > >> allocated. For large mappings where the entire area is not >> necessary this is not ideal. >> >> This series introduces new flags for mmap() and mlockall() that >> allow a user to specify that the covered are should not be paged >> out, but only after the memory has been used the first time. > > The comparison with MCL_FUTURE is hiding over in the 2/3 changelog. > It's important so let's copy it here. > > : MCL_ONFAULT is preferrable to MCL_FUTURE for the use cases > enumerated : in the previous patch becuase MCL_FUTURE will behave > as if each mapping : was made with MAP_LOCKED, causing the entire > mapping to be faulted in : when new space is allocated or mapped. > MCL_ONFAULT allows the user to : delay the fault in cost of any > given page until it is actually needed, : but then guarantees that > that page will always be resident. Done > > I *think* it all looks OK. I'd like someone else to go over it > also if poss. > > > I guess the 2/3 changelog should have something like > > : munlockall() will clear MCL_ONFAULT on all vma's in the process's > VM. 
Done > > It's pretty obvious, but the manpage delta should make this clear > also. Done > > > Also the changelog(s) and manpage delta should explain that > munlock() clears MCL_ONFAULT. Done > > And now I'm wondering what happens if userspace does > mmap(MAP_LOCKONFAULT) and later does munlock() on just part of > that region. Does the vma get split? Is this tested? Should also > be in the changelogs and manpage. > > Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure > that even makes sense but the behaviour should be understood and > tested. I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs. > > > What's missing here is a syscall to set VM_LOCKONFAULT on an > arbitrary range of memory - mlock() for lock-on-fault. It's a > shame that mlock() didn't take a `mode' argument. Perhaps we > should add such a syscall - that would make the mmap flag unneeded > but I suppose it should be kept for symmetry. Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately. 
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-11 19:21 ` Eric B Munson @ 2015-06-11 19:34 ` Andrew Morton 2015-06-11 19:55 ` Eric B Munson ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Andrew Morton @ 2015-06-11 19:34 UTC (permalink / raw) To: Eric B Munson Cc: Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emunson@akamai.com> wrote: > > Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure > > that even makes sense but the behaviour should be understood and > > tested. > > I have extended the kselftest for lock-on-fault to try both of these > scenarios and they work as expected. The VMA is split and the VM > flags are set appropriately for the resulting VMAs. munlock() should do vma merging as well. I *think* we implemented that. More tests for you to add ;) How are you testing the vma merging and splitting, btw? Parsing the procfs files? > > What's missing here is a syscall to set VM_LOCKONFAULT on an > > arbitrary range of memory - mlock() for lock-on-fault. It's a > > shame that mlock() didn't take a `mode' argument. Perhaps we > > should add such a syscall - that would make the mmap flag unneeded > > but I suppose it should be kept for symmetry. > > Do you want such a system call as part of this set? I would need some > time to make sure I had thought through all the possible corners one > could get into with such a call, so it would delay a V3 quite a bit. > Otherwise I can send a V3 out immediately. I think the way to look at this is to pretend that mm/mlock.c doesn't exist and ask "how should we design these features". And that would be: - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT. - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
- munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared. - mlockall() and munlockall() ditto. IOW, LOCKED and LOCKONFAULT are treated identically and independently. Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much. *should* we do this? I'm thinking "yes" - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible. What do others think?
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-11 19:34 ` Andrew Morton @ 2015-06-11 19:55 ` Eric B Munson 2015-06-12 12:05 ` Vlastimil Babka 2015-06-15 14:39 ` Eric B Munson 2 siblings, 0 replies; 24+ messages in thread From: Eric B Munson @ 2015-06-11 19:55 UTC (permalink / raw) To: Andrew Morton Cc: Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/11/2015 03:34 PM, Andrew Morton wrote: > On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson > <emunson@akamai.com> wrote: > >>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not >>> sure that even makes sense but the behaviour should be >>> understood and tested. >> >> I have extended the kselftest for lock-on-fault to try both of >> these scenarios and they work as expected. The VMA is split and >> the VM flags are set appropriately for the resulting VMAs. > > munlock() should do vma merging as well. I *think* we implemented > that. More tests for you to add ;) I will add a test for this as well. But the code is in place to merge VMAs IIRC. > > How are you testing the vma merging and splitting, btw? Parsing > the profcs files? To show the VMA split happened, I dropped a printk in mlock_fixup() and the user space test simply checks that unlocked pages are not marked as unevictable. The test does not parse maps or smaps for actual VMA layout. Given that we want to check the merging of VMAs as well I will add this. > >>> What's missing here is a syscall to set VM_LOCKONFAULT on an >>> arbitrary range of memory - mlock() for lock-on-fault. It's a >>> shame that mlock() didn't take a `mode' argument. Perhaps we >>> should add such a syscall - that would make the mmap flag >>> unneeded but I suppose it should be kept for symmetry. >> >> Do you want such a system call as part of this set? 
I would need >> some time to make sure I had thought through all the possible >> corners one could get into with such a call, so it would delay a >> V3 quite a bit. Otherwise I can send a V3 out immediately. > > I think the way to look at this is to pretend that mm/mlock.c > doesn't exist and ask "how should we design these features". > > And that would be: > > - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT. > > - mlock() takes a `flags' argument. Presently that's > MLOCK_LOCKED|MLOCK_LOCKONFAULT. > > - munlock() takes a `flags' arument. > MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being > cleared. > > - mlockall() and munlockall() ditto. > > > IOW, LOCKED and LOCKEDONFAULT are treated identically and > independently. > > Now, that's how we would have designed all this on day one. And I > think we can do this now, by adding new mlock2() and munlock2() > syscalls. And we may as well deprecate the old mlock() and > munlock(), not that this matters much. > > *should* we do this? I'm thinking "yes" - it's all pretty simple > boilerplate and wrappers and such, and it gets the interface > correct, and extensible. > > What do others think? 
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-11 19:34 ` Andrew Morton 2015-06-11 19:55 ` Eric B Munson @ 2015-06-12 12:05 ` Vlastimil Babka 2015-06-15 14:43 ` Eric B Munson 2015-06-15 14:39 ` Eric B Munson 2 siblings, 1 reply; 24+ messages in thread From: Vlastimil Babka @ 2015-06-12 12:05 UTC (permalink / raw) To: Andrew Morton, Eric B Munson Cc: Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api On 06/11/2015 09:34 PM, Andrew Morton wrote: > On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emunson@akamai.com> wrote: > >>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure >>> that even makes sense but the behaviour should be understood and >>> tested. >> >> I have extended the kselftest for lock-on-fault to try both of these >> scenarios and they work as expected. The VMA is split and the VM >> flags are set appropriately for the resulting VMAs. > > munlock() should do vma merging as well. I *think* we implemented > that. More tests for you to add ;) > > How are you testing the vma merging and splitting, btw? Parsing > the profcs files? > >>> What's missing here is a syscall to set VM_LOCKONFAULT on an >>> arbitrary range of memory - mlock() for lock-on-fault. It's a >>> shame that mlock() didn't take a `mode' argument. Perhaps we >>> should add such a syscall - that would make the mmap flag unneeded >>> but I suppose it should be kept for symmetry. >> >> Do you want such a system call as part of this set? I would need some >> time to make sure I had thought through all the possible corners one >> could get into with such a call, so it would delay a V3 quite a bit. >> Otherwise I can send a V3 out immediately. > > I think the way to look at this is to pretend that mm/mlock.c doesn't > exist and ask "how should we design these features". 
> > And that would be: > > - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT. Note that the semantic of MAP_LOCKED can be subtly surprising: "mlock(2) fails if the memory range cannot get populated to guarantee that no future major faults will happen on the range. mmap(MAP_LOCKED) on the other hand silently succeeds even if the range was populated only partially." ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 ) So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While MAP_LOCKONFAULT doesn't suffer from such a problem, I wonder if that's sufficient reason not to extend mmap by new mlock() flags that can be instead applied to the VMA after mmapping, using the proposed mlock2() with flags. So I think instead we could deprecate MAP_LOCKED more prominently. I doubt the overhead of calling the extra syscall matters here? > - mlock() takes a `flags' argument. Presently that's > MLOCK_LOCKED|MLOCK_LOCKONFAULT. > > - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT > to specify which flags are being cleared. > > - mlockall() and munlockall() ditto. > > > IOW, LOCKED and LOCKONFAULT are treated identically and independently. > > Now, that's how we would have designed all this on day one. And I > think we can do this now, by adding new mlock2() and munlock2() > syscalls. And we may as well deprecate the old mlock() and munlock(), > not that this matters much. > > *should* we do this? I'm thinking "yes" - it's all pretty simple > boilerplate and wrappers and such, and it gets the interface correct, > and extensible. If the new LOCKONFAULT functionality is indeed desired (I haven't still decided myself) then I agree that would be the cleanest way. > What do others think?
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-12 12:05 ` Vlastimil Babka @ 2015-06-15 14:43 ` Eric B Munson [not found] ` <20150615144356.GB12300-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Eric B Munson @ 2015-06-15 14:43 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api [-- Attachment #1: Type: text/plain, Size: 3898 bytes --] On Fri, 12 Jun 2015, Vlastimil Babka wrote: > On 06/11/2015 09:34 PM, Andrew Morton wrote: > >On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emunson@akamai.com> wrote: > > > >>>Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure > >>>that even makes sense but the behaviour should be understood and > >>>tested. > >> > >>I have extended the kselftest for lock-on-fault to try both of these > >>scenarios and they work as expected. The VMA is split and the VM > >>flags are set appropriately for the resulting VMAs. > > > >munlock() should do vma merging as well. I *think* we implemented > >that. More tests for you to add ;) > > > >How are you testing the vma merging and splitting, btw? Parsing > >the profcs files? > > > >>>What's missing here is a syscall to set VM_LOCKONFAULT on an > >>>arbitrary range of memory - mlock() for lock-on-fault. It's a > >>>shame that mlock() didn't take a `mode' argument. Perhaps we > >>>should add such a syscall - that would make the mmap flag unneeded > >>>but I suppose it should be kept for symmetry. > >> > >>Do you want such a system call as part of this set? I would need some > >>time to make sure I had thought through all the possible corners one > >>could get into with such a call, so it would delay a V3 quite a bit. > >>Otherwise I can send a V3 out immediately. 
> > > >I think the way to look at this is to pretend that mm/mlock.c doesn't > >exist and ask "how should we design these features". > > > >And that would be: > > > >- mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT. > > Note that the semantic of MAP_LOCKED can be subtly surprising: > > "mlock(2) fails if the memory range cannot get populated to guarantee > that no future major faults will happen on the range. > mmap(MAP_LOCKED) on the other hand silently succeeds even if the > range was populated only > partially." > > ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 ) > > So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While > MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's > sufficient reason not to extend mmap by new mlock() flags that can > be instead applied to the VMA after mmapping, using the proposed > mlock2() with flags. So I think instead we could deprecate > MAP_LOCKED more prominently. I doubt the overhead of calling the > extra syscall matters here? We could talk about retiring the MAP_LOCKED flag but I suspect that would get significantly more pushback than adding a new mmap flag. Likely that the overhead does not matter in most cases, but presumably there are cases where it does (as we have a MAP_LOCKED flag today). Even with the proposed new system calls I think we should have the MAP_LOCKONFAULT for parity with MAP_LOCKED. > > >- mlock() takes a `flags' argument. Presently that's > > MLOCK_LOCKED|MLOCK_LOCKONFAULT. > > > >- munlock() takes a `flags' arument. MLOCK_LOCKED|MLOCK_LOCKONFAULT > > to specify which flags are being cleared. > > > >- mlockall() and munlockall() ditto. > > > > > >IOW, LOCKED and LOCKEDONFAULT are treated identically and independently. > > > >Now, that's how we would have designed all this on day one. And I > >think we can do this now, by adding new mlock2() and munlock2() > >syscalls. 
And we may as well deprecate the old mlock() and munlock(), > >not that this matters much. > > > >*should* we do this? I'm thinking "yes" - it's all pretty simple > >boilerplate and wrappers and such, and it gets the interface correct, > >and extensible. > > If the new LOCKONFAULT functionality is indeed desired (I haven't > still decided myself) then I agree that would be the cleanest way. Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases? > > >What do others think?
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault [not found] ` <20150615144356.GB12300-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> @ 2015-06-23 13:04 ` Vlastimil Babka 2015-06-25 14:16 ` Eric B Munson 0 siblings, 1 reply; 24+ messages in thread From: Vlastimil Babka @ 2015-06-23 13:04 UTC (permalink / raw) To: Eric B Munson Cc: Andrew Morton, Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mips-6z/3iImG2C8G8FEW9MqTrA, linux-parisc-u79uwXL29TY76Z2rM5mHXA, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, sparclinux-u79uwXL29TY76Z2rM5mHXA, linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA On 06/15/2015 04:43 PM, Eric B Munson wrote: >> Note that the semantic of MAP_LOCKED can be subtly surprising: >> >> "mlock(2) fails if the memory range cannot get populated to guarantee >> that no future major faults will happen on the range. >> mmap(MAP_LOCKED) on the other hand silently succeeds even if the >> range was populated only >> partially." >> >> ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 ) >> >> So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While >> MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's >> sufficient reason not to extend mmap by new mlock() flags that can >> be instead applied to the VMA after mmapping, using the proposed >> mlock2() with flags. So I think instead we could deprecate >> MAP_LOCKED more prominently. I doubt the overhead of calling the >> extra syscall matters here? > > We could talk about retiring the MAP_LOCKED flag but I suspect that > would get significantly more pushback than adding a new mmap flag. Oh no we can't "retire" as in remove the flag, ever. Just not continue the way of mmap() flags related to mlock(). 
> Likely that the overhead does not matter in most cases, but presumably > there are cases where it does (as we have a MAP_LOCKED flag today). > Even with the proposed new system calls I think we should have the > MAP_LOCKONFAULT for parity with MAP_LOCKED. I'm not convinced, but it's not a major issue. >> >>> - mlock() takes a `flags' argument. Presently that's >>> MLOCK_LOCKED|MLOCK_LOCKONFAULT. >>> >>> - munlock() takes a `flags' arument. MLOCK_LOCKED|MLOCK_LOCKONFAULT >>> to specify which flags are being cleared. >>> >>> - mlockall() and munlockall() ditto. >>> >>> >>> IOW, LOCKED and LOCKEDONFAULT are treated identically and independently. >>> >>> Now, that's how we would have designed all this on day one. And I >>> think we can do this now, by adding new mlock2() and munlock2() >>> syscalls. And we may as well deprecate the old mlock() and munlock(), >>> not that this matters much. >>> >>> *should* we do this? I'm thinking "yes" - it's all pretty simple >>> boilerplate and wrappers and such, and it gets the interface correct, >>> and extensible. >> >> If the new LOCKONFAULT functionality is indeed desired (I haven't >> still decided myself) then I agree that would be the cleanest way. > > Do you disagree with the use cases I have listed or do you think there > is a better way of addressing those cases? I'm somewhat sceptical about the security one. Are security sensitive buffers that large to matter? The performance one is more convincing and I don't see a better way, so OK. > >> >>> What do others think? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-23 13:04 ` Vlastimil Babka @ 2015-06-25 14:16 ` Eric B Munson 2015-06-25 14:26 ` Andy Lutomirski 0 siblings, 1 reply; 24+ messages in thread From: Eric B Munson @ 2015-06-25 14:16 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api [-- Attachment #1: Type: text/plain, Size: 2978 bytes --] On Tue, 23 Jun 2015, Vlastimil Babka wrote: > On 06/15/2015 04:43 PM, Eric B Munson wrote: > >>Note that the semantic of MAP_LOCKED can be subtly surprising: > >> > >>"mlock(2) fails if the memory range cannot get populated to guarantee > >>that no future major faults will happen on the range. > >>mmap(MAP_LOCKED) on the other hand silently succeeds even if the > >>range was populated only > >>partially." > >> > >>( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 ) > >> > >>So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While > >>MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's > >>sufficient reason not to extend mmap by new mlock() flags that can > >>be instead applied to the VMA after mmapping, using the proposed > >>mlock2() with flags. So I think instead we could deprecate > >>MAP_LOCKED more prominently. I doubt the overhead of calling the > >>extra syscall matters here? > > > >We could talk about retiring the MAP_LOCKED flag but I suspect that > >would get significantly more pushback than adding a new mmap flag. > > Oh no we can't "retire" as in remove the flag, ever. Just not > continue the way of mmap() flags related to mlock(). > > >Likely that the overhead does not matter in most cases, but presumably > >there are cases where it does (as we have a MAP_LOCKED flag today). 
> >Even with the proposed new system calls I think we should have the > >MAP_LOCKONFAULT for parity with MAP_LOCKED. > > I'm not convinced, but it's not a major issue. > > >> > >>>- mlock() takes a `flags' argument. Presently that's > >>> MLOCK_LOCKED|MLOCK_LOCKONFAULT. > >>> > >>>- munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT > >>> to specify which flags are being cleared. > >>> > >>>- mlockall() and munlockall() ditto. > >>> > >>> > >>>IOW, LOCKED and LOCKONFAULT are treated identically and independently. > >>> > >>>Now, that's how we would have designed all this on day one. And I > >>>think we can do this now, by adding new mlock2() and munlock2() > >>>syscalls. And we may as well deprecate the old mlock() and munlock(), > >>>not that this matters much. > >>> > >>>*should* we do this? I'm thinking "yes" - it's all pretty simple > >>>boilerplate and wrappers and such, and it gets the interface correct, > >>>and extensible. > >> > >>If the new LOCKONFAULT functionality is indeed desired (I haven't > >>still decided myself) then I agree that would be the cleanest way. > > > >Do you disagree with the use cases I have listed or do you think there > >is a better way of addressing those cases? > > I'm somewhat sceptical about the security one. Are security > sensitive buffers that large to matter? The performance one is more > convincing and I don't see a better way, so OK. They can be; the two that come to mind are medical images and high resolution sensor data. > > > >> > >>>What do others think?
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-25 14:16 ` Eric B Munson @ 2015-06-25 14:26 ` Andy Lutomirski 0 siblings, 0 replies; 24+ messages in thread From: Andy Lutomirski @ 2015-06-25 14:26 UTC (permalink / raw) To: Eric B Munson Cc: Vlastimil Babka, Andrew Morton, Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel@vger.kernel.org, Linux MIPS Mailing List, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm@kvack.org, linux-arch, Linux API On Thu, Jun 25, 2015 at 7:16 AM, Eric B Munson <emunson@akamai.com> wrote: > On Tue, 23 Jun 2015, Vlastimil Babka wrote: > >> On 06/15/2015 04:43 PM, Eric B Munson wrote: >> >> >> >>If the new LOCKONFAULT functionality is indeed desired (I haven't >> >>still decided myself) then I agree that would be the cleanest way. >> > >> >Do you disagree with the use cases I have listed or do you think there >> >is a better way of addressing those cases? >> >> I'm somewhat sceptical about the security one. Are security >> sensitive buffers that large to matter? The performance one is more >> convincing and I don't see a better way, so OK. > > They can be, the two that come to mind are medical images and high > resolution sensor data. I think we've been handling sensitive memory pages wrong forever. We shouldn't lock them into memory; we should flag them as sensitive and encrypt them if they're ever written out to disk. --Andy
* Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault 2015-06-11 19:34 ` Andrew Morton 2015-06-11 19:55 ` Eric B Munson 2015-06-12 12:05 ` Vlastimil Babka @ 2015-06-15 14:39 ` Eric B Munson 2 siblings, 0 replies; 24+ messages in thread From: Eric B Munson @ 2015-06-15 14:39 UTC (permalink / raw) To: Andrew Morton Cc: Shuah Khan, Michal Hocko, Michael Kerrisk, linux-alpha, linux-kernel, linux-mips, linux-parisc, linuxppc-dev, sparclinux, linux-xtensa, linux-mm, linux-arch, linux-api [-- Attachment #1: Type: text/plain, Size: 2584 bytes --] On Thu, 11 Jun 2015, Andrew Morton wrote: > On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emunson@akamai.com> wrote: > > > > Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure > > > that even makes sense but the behaviour should be understood and > > > tested. > > > > I have extended the kselftest for lock-on-fault to try both of these > > scenarios and they work as expected. The VMA is split and the VM > > flags are set appropriately for the resulting VMAs. > > munlock() should do vma merging as well. I *think* we implemented > that. More tests for you to add ;) > > How are you testing the vma merging and splitting, btw? Parsing > the profcs files? The lock-on-fault test now covers VMA splitting and merging by parsing /proc/self/maps. VMA splitting and merging works as it should with both MAP_LOCKONFAULT and MCL_ONFAULT. > > > > What's missing here is a syscall to set VM_LOCKONFAULT on an > > > arbitrary range of memory - mlock() for lock-on-fault. It's a > > > shame that mlock() didn't take a `mode' argument. Perhaps we > > > should add such a syscall - that would make the mmap flag unneeded > > > but I suppose it should be kept for symmetry. > > > > Do you want such a system call as part of this set? I would need some > > time to make sure I had thought through all the possible corners one > > could get into with such a call, so it would delay a V3 quite a bit. 
> > Otherwise I can send a V3 out immediately. > > I think the way to look at this is to pretend that mm/mlock.c doesn't > exist and ask "how should we design these features". > > And that would be: > > - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT. > > - mlock() takes a `flags' argument. Presently that's > MLOCK_LOCKED|MLOCK_LOCKONFAULT. > > - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT > to specify which flags are being cleared. > > - mlockall() and munlockall() ditto. > > > IOW, LOCKED and LOCKONFAULT are treated identically and independently. > > Now, that's how we would have designed all this on day one. And I > think we can do this now, by adding new mlock2() and munlock2() > syscalls. And we may as well deprecate the old mlock() and munlock(), > not that this matters much. > > *should* we do this? I'm thinking "yes" - it's all pretty simple > boilerplate and wrappers and such, and it gets the interface correct, > and extensible. > > What do others think? I am working on V3 which will introduce the new system calls.
end of thread, other threads:[~2015-06-25 14:46 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-06-10 13:26 [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault Eric B Munson 2015-06-10 13:26 ` [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after " Eric B Munson 2015-06-18 15:29 ` Michal Hocko 2015-06-18 20:30 ` Eric B Munson 2015-06-19 14:57 ` Michal Hocko 2015-06-19 16:43 ` Eric B Munson [not found] ` <20150619164333.GD2329-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 2015-06-22 12:38 ` Michal Hocko [not found] ` <20150622123826.GF4430-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> 2015-06-22 14:18 ` Eric B Munson 2015-06-23 12:45 ` Vlastimil Babka [not found] ` <558954DD.4060405-AlSwsSmVLrQ@public.gmane.org> 2015-06-24 9:47 ` Michal Hocko 2015-06-24 8:50 ` Michal Hocko 2015-06-25 14:46 ` Eric B Munson 2015-06-10 13:26 ` [RESEND PATCH V2 2/3] Add mlockall flag for locking pages on fault Eric B Munson [not found] ` <1433942810-7852-1-git-send-email-emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 2015-06-10 13:26 ` [RESEND PATCH V2 3/3] Add tests for lock " Eric B Munson 2015-06-10 21:59 ` [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault Andrew Morton 2015-06-11 19:21 ` Eric B Munson 2015-06-11 19:34 ` Andrew Morton 2015-06-11 19:55 ` Eric B Munson 2015-06-12 12:05 ` Vlastimil Babka 2015-06-15 14:43 ` Eric B Munson [not found] ` <20150615144356.GB12300-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> 2015-06-23 13:04 ` Vlastimil Babka 2015-06-25 14:16 ` Eric B Munson 2015-06-25 14:26 ` Andy Lutomirski 2015-06-15 14:39 ` Eric B Munson