* Re: [PATCH] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-23 1:46 ` Brian Geffon
@ 2020-01-23 3:02 ` Andy Lutomirski
-1 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2020-01-23 3:02 UTC (permalink / raw)
To: Brian Geffon
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrew Morton,
Michael S . Tsirkin, Arnd Bergmann, Sonny Rao, Minchan Kim,
Joel Fernandes, Lokesh Gidra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Yu Zhao, Jesse Barnes
> On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> MREMAP_DONTUNMAP is an additional flag that can be used with
> MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> would then tear down the old vma so subsequent accesses to the vma
> cause a segfault. However, with this new flag it will keep the old
> vma with zapping PTEs so any access to the old VMA after that point
> will result in a pagefault.
This needs a vastly better description. Perhaps:
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set, the source mapping will not be removed. Instead it will be cleared as if a brand new anonymous, private mapping had been created atomically as part of the mremap() call. If a userfaultfd was watching the source, it will continue to watch the new mapping. For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
Or is it something else?
>
> This feature will find a use in ChromeOS along with userfaultfd.
> Specifically we will want to register a VMA with userfaultfd and then
> pull it out from under a running process. By using MREMAP_DONTUNMAP we
> don't have to worry about mprotecting and then potentially racing with
> VMA permission changes from a running process.
Does this mean you yank it out but you want to replace it simultaneously?
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
Cute.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-23 3:02 ` Andy Lutomirski
0 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2020-01-23 3:02 UTC (permalink / raw)
To: Brian Geffon
Cc: linux-mm, Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
Sonny Rao, Minchan Kim, Joel Fernandes, Lokesh Gidra,
linux-kernel, linux-api, Yu Zhao, Jesse Barnes
> On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
>
> MREMAP_DONTUNMAP is an additional flag that can be used with
> MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> would then tear down the old vma so subsequent accesses to the vma
> cause a segfault. However, with this new flag it will keep the old
> vma with zapping PTEs so any access to the old VMA after that point
> will result in a pagefault.
This needs a vastly better description. Perhaps:
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set, the source mapping will not be removed. Instead it will be cleared as if a brand new anonymous, private mapping had been created atomically as part of the mremap() call. If a userfaultfd was watching the source, it will continue to watch the new mapping. For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
Or is it something else?
>
> This feature will find a use in ChromeOS along with userfaultfd.
> Specifically we will want to register a VMA with userfaultfd and then
> pull it out from under a running process. By using MREMAP_DONTUNMAP we
> don't have to worry about mprotecting and then potentially racing with
> VMA permission changes from a running process.
Does this mean you yank it out but you want to replace it simultaneously?
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
Cute.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-23 3:02 ` Andy Lutomirski
(?)
@ 2020-01-23 18:47 ` Lokesh Gidra
-1 siblings, 0 replies; 41+ messages in thread
From: Lokesh Gidra @ 2020-01-23 18:47 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Brian Geffon, linux-mm, Andrew Morton, Michael S . Tsirkin,
Arnd Bergmann, Sonny Rao, Minchan Kim, Joel Fernandes,
linux-kernel, linux-api, Yu Zhao, Jesse Barnes
[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]
On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> >
> > MREMAP_DONTUNMAP is an additional flag that can be used with
> > MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> > would then tear down the old vma so subsequent accesses to the vma
> > cause a segfault. However, with this new flag it will keep the old
> > vma with zapping PTEs so any access to the old VMA after that point
> > will result in a pagefault.
>
> This needs a vastly better description. Perhaps:
>
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
> the source mapping will not be removed. Instead it will be cleared as if a
> brand new anonymous, private mapping had been created atomically as part of
> the mremap() call. If a userfaultfd was watching the source, it will
> continue to watch the new mapping. For a mapping that is shared or not
> anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
>
> This is the exact behaviour I'm looking for.
> Or is it something else?
>
> >
> > This feature will find a use in ChromeOS along with userfaultfd.
> > Specifically we will want to register a VMA with userfaultfd and then
> > pull it out from under a running process. By using MREMAP_DONTUNMAP we
> > don't have to worry about mprotecting and then potentially racing with
> > VMA permission changes from a running process.
>
> Does this mean you yank it out but you want to replace it simultaneously?
>
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the
> physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time
> consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Cute.
[-- Attachment #2: Type: text/html, Size: 2925 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-23 3:02 ` Andy Lutomirski
(?)
(?)
@ 2020-01-23 19:02 ` Brian Geffon
-1 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-23 19:02 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-mm, Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
Sonny Rao, Minchan Kim, Joel Fernandes, Lokesh Gidra, LKML,
linux-api, Yu Zhao, Jesse Barnes
[-- Attachment #1: Type: text/plain, Size: 2378 bytes --]
Andy,
Thanks, yes, that's a much clearer description of the feature. I'll make
sure to update the description with subsequent patches and with later man
page updates.
Brian
On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> >
> > MREMAP_DONTUNMAP is an additional flag that can be used with
> > MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> > would then tear down the old vma so subsequent accesses to the vma
> > cause a segfault. However, with this new flag it will keep the old
> > vma with zapping PTEs so any access to the old VMA after that point
> > will result in a pagefault.
>
> This needs a vastly better description. Perhaps:
>
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
> the source mapping will not be removed. Instead it will be cleared as if a
> brand new anonymous, private mapping had been created atomically as part of
> the mremap() call. If a userfaultfd was watching the source, it will
> continue to watch the new mapping. For a mapping that is shared or not
> anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
>
> Or is it something else?
>
> >
> > This feature will find a use in ChromeOS along with userfaultfd.
> > Specifically we will want to register a VMA with userfaultfd and then
> > pull it out from under a running process. By using MREMAP_DONTUNMAP we
> > don't have to worry about mprotecting and then potentially racing with
> > VMA permission changes from a running process.
>
> Does this mean you yank it out but you want to replace it simultaneously?
>
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the
> physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time
> consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Cute.
[-- Attachment #2: Type: text/html, Size: 2992 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-23 3:02 ` Andy Lutomirski
` (2 preceding siblings ...)
(?)
@ 2020-01-23 19:03 ` Brian Geffon
-1 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-23 19:03 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-mm, Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
Sonny Rao, Minchan Kim, Joel Fernandes, Lokesh Gidra, LKML,
linux-api, Yu Zhao, Jesse Barnes
Andy,
Thanks, yes, that's a much clearer description of the feature. I'll
make sure to update the description with subsequent patches and with
later man page updates.
Brian
On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>
> > On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> >
> > MREMAP_DONTUNMAP is an additional flag that can be used with
> > MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> > would then tear down the old vma so subsequent accesses to the vma
> > cause a segfault. However, with this new flag it will keep the old
> > vma with zapping PTEs so any access to the old VMA after that point
> > will result in a pagefault.
>
> This needs a vastly better description. Perhaps:
>
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set, the source mapping will not be removed. Instead it will be cleared as if a brand new anonymous, private mapping had been created atomically as part of the mremap() call. If a userfaultfd was watching the source, it will continue to watch the new mapping. For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
>
> Or is it something else?
>
> >
> > This feature will find a use in ChromeOS along with userfaultfd.
> > Specifically we will want to register a VMA with userfaultfd and then
> > pull it out from under a running process. By using MREMAP_DONTUNMAP we
> > don't have to worry about mprotecting and then potentially racing with
> > VMA permission changes from a running process.
>
> Does this mean you yank it out but you want to replace it simultaneously?
>
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Cute.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-23 1:46 ` Brian Geffon
@ 2020-01-24 19:06 ` Brian Geffon
-1 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-24 19:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Michael S . Tsirkin, Brian Geffon, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. Instead it will be
cleared as if a brand new anonymous, private mapping had been created
atomically as part of the mremap() call. If a userfaultfd was watching
the source, it will continue to watch the new mapping. For a mapping
that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
also used. The final result is two equally sized VMAs where the
destination contains the PTEs of the source.
We hope to use this in Chrome OS where with userfaultfd we could write
an anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.
This feature also has a use case in Android, Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time consuming
but also opens a time window where a native thread may call mmap and
reserve the java heap's address range for its own usage. This flag
solves the problem."
Signed-off-by: Brian Geffon <bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
include/uapi/linux/mman.h | 5 +++--
mm/mremap.c | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
#include <asm/mman.h>
#include <asm-generic/hugetlb_encode.h>
-#define MREMAP_MAYMOVE 1
-#define MREMAP_FIXED 2
+#define MREMAP_MAYMOVE 1
+#define MREMAP_FIXED 2
+#define MREMAP_DONTUNMAP 4
#define OVERCOMMIT_GUESS 0
#define OVERCOMMIT_ALWAYS 1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bf97c3eb538b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long old_addr, unsigned long old_len,
unsigned long new_len, unsigned long new_addr,
- bool *locked, struct vm_userfaultfd_ctx *uf,
- struct list_head *uf_unmap)
+ bool *locked, unsigned long flags,
+ struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (unlikely(vma->vm_flags & VM_PFNMAP))
untrack_pfn_moved(vma);
+ if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+ if (vm_flags & VM_ACCOUNT)
+ vma->vm_flags |= VM_ACCOUNT;
+
+ goto out;
+ }
+
if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
/* OOM: unable to split vma, just get accounts right */
vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
vma->vm_next->vm_flags |= VM_ACCOUNT;
}
+out:
if (vm_flags & VM_LOCKED) {
mm->locked_vm += new_len >> PAGE_SHIFT;
*locked = true;
@@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
unsigned long new_addr, unsigned long new_len, bool *locked,
- struct vm_userfaultfd_ctx *uf,
+ unsigned long flags, struct vm_userfaultfd_ctx *uf,
struct list_head *uf_unmap_early,
struct list_head *uf_unmap)
{
@@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
old_len = new_len;
}
+ /*
+ * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+ * check that we can expand by old_len and vma_to_resize will handle
+ * the vma growing.
+ */
+ if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+ vma->vm_flags, old_len >> PAGE_SHIFT))) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
vma = vma_to_resize(addr, old_len, new_len, &charged);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
@@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if (IS_ERR_VALUE(ret))
goto out1;
- ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+ ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
uf_unmap);
if (!(offset_in_page(ret)))
goto out;
@@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
addr = untagged_addr(addr);
new_addr = untagged_addr(new_addr);
- if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
return ret;
if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
return ret;
+ if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+ return ret;
+
if (offset_in_page(addr))
return ret;
@@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
if (flags & MREMAP_FIXED) {
ret = mremap_to(addr, old_len, new_addr, new_len,
- &locked, &uf, &uf_unmap_early, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap_early,
+ &uf_unmap);
goto out;
}
@@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
}
ret = move_vma(vma, addr, old_len, new_len, new_addr,
- &locked, &uf, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap);
}
out:
if (offset_in_page(ret)) {
--
2.25.0.341.g760bfbb309-goog
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-24 19:06 ` Brian Geffon
0 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-24 19:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Michael S . Tsirkin, Brian Geffon, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. Instead it will be
cleared as if a brand new anonymous, private mapping had been created
atomically as part of the mremap() call. If a userfaultfd was watching
the source, it will continue to watch the new mapping. For a mapping
that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
also used. The final result is two equally sized VMAs where the
destination contains the PTEs of the source.
We hope to use this in Chrome OS where with userfaultfd we could write
an anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.
This feature also has a use case in Android, Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time consuming
but also opens a time window where a native thread may call mmap and
reserve the java heap's address range for its own usage. This flag
solves the problem."
Signed-off-by: Brian Geffon <bgeffon@google.com>
---
include/uapi/linux/mman.h | 5 +++--
mm/mremap.c | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
#include <asm/mman.h>
#include <asm-generic/hugetlb_encode.h>
-#define MREMAP_MAYMOVE 1
-#define MREMAP_FIXED 2
+#define MREMAP_MAYMOVE 1
+#define MREMAP_FIXED 2
+#define MREMAP_DONTUNMAP 4
#define OVERCOMMIT_GUESS 0
#define OVERCOMMIT_ALWAYS 1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bf97c3eb538b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long old_addr, unsigned long old_len,
unsigned long new_len, unsigned long new_addr,
- bool *locked, struct vm_userfaultfd_ctx *uf,
- struct list_head *uf_unmap)
+ bool *locked, unsigned long flags,
+ struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (unlikely(vma->vm_flags & VM_PFNMAP))
untrack_pfn_moved(vma);
+ if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+ if (vm_flags & VM_ACCOUNT)
+ vma->vm_flags |= VM_ACCOUNT;
+
+ goto out;
+ }
+
if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
/* OOM: unable to split vma, just get accounts right */
vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
vma->vm_next->vm_flags |= VM_ACCOUNT;
}
+out:
if (vm_flags & VM_LOCKED) {
mm->locked_vm += new_len >> PAGE_SHIFT;
*locked = true;
@@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
unsigned long new_addr, unsigned long new_len, bool *locked,
- struct vm_userfaultfd_ctx *uf,
+ unsigned long flags, struct vm_userfaultfd_ctx *uf,
struct list_head *uf_unmap_early,
struct list_head *uf_unmap)
{
@@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
old_len = new_len;
}
+ /*
+ * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+ * check that we can expand by old_len and vma_to_resize will handle
+ * the vma growing.
+ */
+ if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+ vma->vm_flags, old_len >> PAGE_SHIFT))) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
vma = vma_to_resize(addr, old_len, new_len, &charged);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
@@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if (IS_ERR_VALUE(ret))
goto out1;
- ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+ ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
uf_unmap);
if (!(offset_in_page(ret)))
goto out;
@@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
addr = untagged_addr(addr);
new_addr = untagged_addr(new_addr);
- if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
return ret;
if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
return ret;
+ if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+ return ret;
+
if (offset_in_page(addr))
return ret;
@@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
if (flags & MREMAP_FIXED) {
ret = mremap_to(addr, old_len, new_addr, new_len,
- &locked, &uf, &uf_unmap_early, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap_early,
+ &uf_unmap);
goto out;
}
@@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
}
ret = move_vma(vma, addr, old_len, new_len, new_addr,
- &locked, &uf, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap);
}
out:
if (offset_in_page(ret)) {
--
2.25.0.341.g760bfbb309-goog
^ permalink raw reply related [flat|nested] 41+ messages in thread[parent not found: <20200124190625.257659-1-bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-24 19:06 ` Brian Geffon
@ 2020-01-26 5:16 ` Nathan Chancellor
-1 siblings, 0 replies; 41+ messages in thread
From: Nathan Chancellor @ 2020-01-26 5:16 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes,
clang-built-linux-/JYPxA39Uh5TLH3MbocFFw
Hi Brian,
On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> also used. The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
>
> We hope to use this in Chrome OS where with userfaultfd we could write
> an anonymous mapping to disk without having to STOP the process or worry
> about VMA permission changes.
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
>
> Signed-off-by: Brian Geffon <bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> include/uapi/linux/mman.h | 5 +++--
> mm/mremap.c | 37 ++++++++++++++++++++++++++++++-------
> 2 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> index fc1a64c3447b..923cc162609c 100644
> --- a/include/uapi/linux/mman.h
> +++ b/include/uapi/linux/mman.h
> @@ -5,8 +5,9 @@
> #include <asm/mman.h>
> #include <asm-generic/hugetlb_encode.h>
>
> -#define MREMAP_MAYMOVE 1
> -#define MREMAP_FIXED 2
> +#define MREMAP_MAYMOVE 1
> +#define MREMAP_FIXED 2
> +#define MREMAP_DONTUNMAP 4
>
> #define OVERCOMMIT_GUESS 0
> #define OVERCOMMIT_ALWAYS 1
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 122938dcec15..bf97c3eb538b 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> static unsigned long move_vma(struct vm_area_struct *vma,
> unsigned long old_addr, unsigned long old_len,
> unsigned long new_len, unsigned long new_addr,
> - bool *locked, struct vm_userfaultfd_ctx *uf,
> - struct list_head *uf_unmap)
> + bool *locked, unsigned long flags,
> + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *new_vma;
> @@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> if (unlikely(vma->vm_flags & VM_PFNMAP))
> untrack_pfn_moved(vma);
>
> + if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
> + if (vm_flags & VM_ACCOUNT)
> + vma->vm_flags |= VM_ACCOUNT;
> +
> + goto out;
> + }
> +
> if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> /* OOM: unable to split vma, just get accounts right */
> vm_unacct_memory(excess >> PAGE_SHIFT);
> @@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> vma->vm_next->vm_flags |= VM_ACCOUNT;
> }
>
> +out:
> if (vm_flags & VM_LOCKED) {
> mm->locked_vm += new_len >> PAGE_SHIFT;
> *locked = true;
> @@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>
> static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> unsigned long new_addr, unsigned long new_len, bool *locked,
> - struct vm_userfaultfd_ctx *uf,
> + unsigned long flags, struct vm_userfaultfd_ctx *uf,
> struct list_head *uf_unmap_early,
> struct list_head *uf_unmap)
> {
> @@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> old_len = new_len;
> }
>
> + /*
> + * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
> + * check that we can expand by old_len and vma_to_resize will handle
> + * the vma growing.
> + */
> + if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
> + vma->vm_flags, old_len >> PAGE_SHIFT))) {
We received a Clang build report that vma is used uninitialized here
(they aren't being publicly sent to LKML due to GCC vs Clang
warning/error overlap):
https://groups.google.com/d/msg/clang-built-linux/gE5wRaeHdSI/xVA0MBQVEgAJ
Sure enough, vma is initialized first in the next block. Not sure if
this section should be moved below that initialization or if something
else should be done to resolve it but that dereference will obviously be
fatal.
Cheers,
Nathan
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> vma = vma_to_resize(addr, old_len, new_len, &charged);
> if (IS_ERR(vma)) {
> ret = PTR_ERR(vma);
> @@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> if (IS_ERR_VALUE(ret))
> goto out1;
>
> - ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
> + ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
> uf_unmap);
> if (!(offset_in_page(ret)))
> goto out;
> @@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> addr = untagged_addr(addr);
> new_addr = untagged_addr(new_addr);
>
> - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
> + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
> return ret;
>
> if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
> return ret;
>
> + if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
> + return ret;
> +
> if (offset_in_page(addr))
> return ret;
>
> @@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>
> if (flags & MREMAP_FIXED) {
> ret = mremap_to(addr, old_len, new_addr, new_len,
> - &locked, &uf, &uf_unmap_early, &uf_unmap);
> + &locked, flags, &uf, &uf_unmap_early,
> + &uf_unmap);
> goto out;
> }
>
> @@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> }
>
> ret = move_vma(vma, addr, old_len, new_len, new_addr,
> - &locked, &uf, &uf_unmap);
> + &locked, flags, &uf, &uf_unmap);
> }
> out:
> if (offset_in_page(ret)) {
> --
> 2.25.0.341.g760bfbb309-goog
>
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-26 5:16 ` Nathan Chancellor
0 siblings, 0 replies; 41+ messages in thread
From: Nathan Chancellor @ 2020-01-26 5:16 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes,
clang-built-linux
Hi Brian,
On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> also used. The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
>
> We hope to use this in Chrome OS where with userfaultfd we could write
> an anonymous mapping to disk without having to STOP the process or worry
> about VMA permission changes.
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
>
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> ---
> include/uapi/linux/mman.h | 5 +++--
> mm/mremap.c | 37 ++++++++++++++++++++++++++++++-------
> 2 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> index fc1a64c3447b..923cc162609c 100644
> --- a/include/uapi/linux/mman.h
> +++ b/include/uapi/linux/mman.h
> @@ -5,8 +5,9 @@
> #include <asm/mman.h>
> #include <asm-generic/hugetlb_encode.h>
>
> -#define MREMAP_MAYMOVE 1
> -#define MREMAP_FIXED 2
> +#define MREMAP_MAYMOVE 1
> +#define MREMAP_FIXED 2
> +#define MREMAP_DONTUNMAP 4
>
> #define OVERCOMMIT_GUESS 0
> #define OVERCOMMIT_ALWAYS 1
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 122938dcec15..bf97c3eb538b 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> static unsigned long move_vma(struct vm_area_struct *vma,
> unsigned long old_addr, unsigned long old_len,
> unsigned long new_len, unsigned long new_addr,
> - bool *locked, struct vm_userfaultfd_ctx *uf,
> - struct list_head *uf_unmap)
> + bool *locked, unsigned long flags,
> + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *new_vma;
> @@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> if (unlikely(vma->vm_flags & VM_PFNMAP))
> untrack_pfn_moved(vma);
>
> + if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
> + if (vm_flags & VM_ACCOUNT)
> + vma->vm_flags |= VM_ACCOUNT;
> +
> + goto out;
> + }
> +
> if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> /* OOM: unable to split vma, just get accounts right */
> vm_unacct_memory(excess >> PAGE_SHIFT);
> @@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> vma->vm_next->vm_flags |= VM_ACCOUNT;
> }
>
> +out:
> if (vm_flags & VM_LOCKED) {
> mm->locked_vm += new_len >> PAGE_SHIFT;
> *locked = true;
> @@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>
> static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> unsigned long new_addr, unsigned long new_len, bool *locked,
> - struct vm_userfaultfd_ctx *uf,
> + unsigned long flags, struct vm_userfaultfd_ctx *uf,
> struct list_head *uf_unmap_early,
> struct list_head *uf_unmap)
> {
> @@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> old_len = new_len;
> }
>
> + /*
> + * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
> + * check that we can expand by old_len and vma_to_resize will handle
> + * the vma growing.
> + */
> + if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
> + vma->vm_flags, old_len >> PAGE_SHIFT))) {
We received a Clang build report that vma is used uninitialized here
(they aren't being publicly sent to LKML due to GCC vs Clang
warning/error overlap):
https://groups.google.com/d/msg/clang-built-linux/gE5wRaeHdSI/xVA0MBQVEgAJ
Sure enough, vma is initialized first in the next block. Not sure if
this section should be moved below that initialization or if something
else should be done to resolve it but that dereference will obviously be
fatal.
Cheers,
Nathan
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> vma = vma_to_resize(addr, old_len, new_len, &charged);
> if (IS_ERR(vma)) {
> ret = PTR_ERR(vma);
> @@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> if (IS_ERR_VALUE(ret))
> goto out1;
>
> - ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
> + ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
> uf_unmap);
> if (!(offset_in_page(ret)))
> goto out;
> @@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> addr = untagged_addr(addr);
> new_addr = untagged_addr(new_addr);
>
> - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
> + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
> return ret;
>
> if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
> return ret;
>
> + if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
> + return ret;
> +
> if (offset_in_page(addr))
> return ret;
>
> @@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>
> if (flags & MREMAP_FIXED) {
> ret = mremap_to(addr, old_len, new_addr, new_len,
> - &locked, &uf, &uf_unmap_early, &uf_unmap);
> + &locked, flags, &uf, &uf_unmap_early,
> + &uf_unmap);
> goto out;
> }
>
> @@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> }
>
> ret = move_vma(vma, addr, old_len, new_len, new_addr,
> - &locked, &uf, &uf_unmap);
> + &locked, flags, &uf, &uf_unmap);
> }
> out:
> if (offset_in_page(ret)) {
> --
> 2.25.0.341.g760bfbb309-goog
>
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-26 5:16 ` Nathan Chancellor
@ 2020-01-27 2:21 ` Brian Geffon
-1 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-27 2:21 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao,
Jesse Barnes, clang-built-linux-/JYPxA39Uh5TLH3MbocFFw
Hi Nathan,
Thank you! That was an oversight on my part. I'll address it in the next patch.
Brian
On Sat, Jan 25, 2020 at 9:16 PM Nathan Chancellor
<natechancellor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> Hi Brian,
>
> On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
> >
> > We hope to use this in Chrome OS where with userfaultfd we could write
> > an anonymous mapping to disk without having to STOP the process or worry
> > about VMA permission changes.
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
> >
> > Signed-off-by: Brian Geffon <bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > ---
> > include/uapi/linux/mman.h | 5 +++--
> > mm/mremap.c | 37 ++++++++++++++++++++++++++++++-------
> > 2 files changed, 33 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> > index fc1a64c3447b..923cc162609c 100644
> > --- a/include/uapi/linux/mman.h
> > +++ b/include/uapi/linux/mman.h
> > @@ -5,8 +5,9 @@
> > #include <asm/mman.h>
> > #include <asm-generic/hugetlb_encode.h>
> >
> > -#define MREMAP_MAYMOVE 1
> > -#define MREMAP_FIXED 2
> > +#define MREMAP_MAYMOVE 1
> > +#define MREMAP_FIXED 2
> > +#define MREMAP_DONTUNMAP 4
> >
> > #define OVERCOMMIT_GUESS 0
> > #define OVERCOMMIT_ALWAYS 1
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 122938dcec15..bf97c3eb538b 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> > static unsigned long move_vma(struct vm_area_struct *vma,
> > unsigned long old_addr, unsigned long old_len,
> > unsigned long new_len, unsigned long new_addr,
> > - bool *locked, struct vm_userfaultfd_ctx *uf,
> > - struct list_head *uf_unmap)
> > + bool *locked, unsigned long flags,
> > + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
> > {
> > struct mm_struct *mm = vma->vm_mm;
> > struct vm_area_struct *new_vma;
> > @@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> > if (unlikely(vma->vm_flags & VM_PFNMAP))
> > untrack_pfn_moved(vma);
> >
> > + if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
> > + if (vm_flags & VM_ACCOUNT)
> > + vma->vm_flags |= VM_ACCOUNT;
> > +
> > + goto out;
> > + }
> > +
> > if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> > /* OOM: unable to split vma, just get accounts right */
> > vm_unacct_memory(excess >> PAGE_SHIFT);
> > @@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> > vma->vm_next->vm_flags |= VM_ACCOUNT;
> > }
> >
> > +out:
> > if (vm_flags & VM_LOCKED) {
> > mm->locked_vm += new_len >> PAGE_SHIFT;
> > *locked = true;
> > @@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> >
> > static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> > unsigned long new_addr, unsigned long new_len, bool *locked,
> > - struct vm_userfaultfd_ctx *uf,
> > + unsigned long flags, struct vm_userfaultfd_ctx *uf,
> > struct list_head *uf_unmap_early,
> > struct list_head *uf_unmap)
> > {
> > @@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> > old_len = new_len;
> > }
> >
> > + /*
> > + * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
> > + * check that we can expand by old_len and vma_to_resize will handle
> > + * the vma growing.
> > + */
> > + if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
> > + vma->vm_flags, old_len >> PAGE_SHIFT))) {
>
> We received a Clang build report that vma is used uninitialized here
> (they aren't being publicly sent to LKML due to GCC vs Clang
> warning/error overlap):
>
> https://groups.google.com/d/msg/clang-built-linux/gE5wRaeHdSI/xVA0MBQVEgAJ
>
> Sure enough, vma is initialized first in the next block. Not sure if
> this section should be moved below that initialization or if something
> else should be done to resolve it but that dereference will obviously be
> fatal.
>
> Cheers,
> Nathan
>
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > +
> > vma = vma_to_resize(addr, old_len, new_len, &charged);
> > if (IS_ERR(vma)) {
> > ret = PTR_ERR(vma);
> > @@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> > if (IS_ERR_VALUE(ret))
> > goto out1;
> >
> > - ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
> > + ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
> > uf_unmap);
> > if (!(offset_in_page(ret)))
> > goto out;
> > @@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> > addr = untagged_addr(addr);
> > new_addr = untagged_addr(new_addr);
> >
> > - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
> > + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
> > return ret;
> >
> > if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
> > return ret;
> >
> > + if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
> > + return ret;
> > +
> > if (offset_in_page(addr))
> > return ret;
> >
> > @@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >
> > if (flags & MREMAP_FIXED) {
> > ret = mremap_to(addr, old_len, new_addr, new_len,
> > - &locked, &uf, &uf_unmap_early, &uf_unmap);
> > + &locked, flags, &uf, &uf_unmap_early,
> > + &uf_unmap);
> > goto out;
> > }
> >
> > @@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> > }
> >
> > ret = move_vma(vma, addr, old_len, new_len, new_addr,
> > - &locked, &uf, &uf_unmap);
> > + &locked, flags, &uf, &uf_unmap);
> > }
> > out:
> > if (offset_in_page(ret)) {
> > --
> > 2.25.0.341.g760bfbb309-goog
> >
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-27 2:21 ` Brian Geffon
0 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-27 2:21 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes,
clang-built-linux
Hi Nathan,
Thank you! That was an oversight on my part. I'll address it in the next patch.
Brian
On Sat, Jan 25, 2020 at 9:16 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> Hi Brian,
>
> On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
> >
> > We hope to use this in Chrome OS where with userfaultfd we could write
> > an anonymous mapping to disk without having to STOP the process or worry
> > about VMA permission changes.
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
> >
> > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > ---
> > include/uapi/linux/mman.h | 5 +++--
> > mm/mremap.c | 37 ++++++++++++++++++++++++++++++-------
> > 2 files changed, 33 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> > index fc1a64c3447b..923cc162609c 100644
> > --- a/include/uapi/linux/mman.h
> > +++ b/include/uapi/linux/mman.h
> > @@ -5,8 +5,9 @@
> > #include <asm/mman.h>
> > #include <asm-generic/hugetlb_encode.h>
> >
> > -#define MREMAP_MAYMOVE 1
> > -#define MREMAP_FIXED 2
> > +#define MREMAP_MAYMOVE 1
> > +#define MREMAP_FIXED 2
> > +#define MREMAP_DONTUNMAP 4
> >
> > #define OVERCOMMIT_GUESS 0
> > #define OVERCOMMIT_ALWAYS 1
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 122938dcec15..bf97c3eb538b 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> > static unsigned long move_vma(struct vm_area_struct *vma,
> > unsigned long old_addr, unsigned long old_len,
> > unsigned long new_len, unsigned long new_addr,
> > - bool *locked, struct vm_userfaultfd_ctx *uf,
> > - struct list_head *uf_unmap)
> > + bool *locked, unsigned long flags,
> > + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
> > {
> > struct mm_struct *mm = vma->vm_mm;
> > struct vm_area_struct *new_vma;
> > @@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> > if (unlikely(vma->vm_flags & VM_PFNMAP))
> > untrack_pfn_moved(vma);
> >
> > + if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
> > + if (vm_flags & VM_ACCOUNT)
> > + vma->vm_flags |= VM_ACCOUNT;
> > +
> > + goto out;
> > + }
> > +
> > if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> > /* OOM: unable to split vma, just get accounts right */
> > vm_unacct_memory(excess >> PAGE_SHIFT);
> > @@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> > vma->vm_next->vm_flags |= VM_ACCOUNT;
> > }
> >
> > +out:
> > if (vm_flags & VM_LOCKED) {
> > mm->locked_vm += new_len >> PAGE_SHIFT;
> > *locked = true;
> > @@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> >
> > static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> > unsigned long new_addr, unsigned long new_len, bool *locked,
> > - struct vm_userfaultfd_ctx *uf,
> > + unsigned long flags, struct vm_userfaultfd_ctx *uf,
> > struct list_head *uf_unmap_early,
> > struct list_head *uf_unmap)
> > {
> > @@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> > old_len = new_len;
> > }
> >
> > + /*
> > + * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
> > + * check that we can expand by old_len and vma_to_resize will handle
> > + * the vma growing.
> > + */
> > + if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
> > + vma->vm_flags, old_len >> PAGE_SHIFT))) {
>
> We received a Clang build report that vma is used uninitialized here
> (they aren't being publicly sent to LKML due to GCC vs Clang
> warning/error overlap):
>
> https://groups.google.com/d/msg/clang-built-linux/gE5wRaeHdSI/xVA0MBQVEgAJ
>
> Sure enough, vma is initialized first in the next block. Not sure if
> this section should be moved below that initialization or if something
> else should be done to resolve it but that dereference will obviously be
> fatal.
>
> Cheers,
> Nathan
>
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > +
> > vma = vma_to_resize(addr, old_len, new_len, &charged);
> > if (IS_ERR(vma)) {
> > ret = PTR_ERR(vma);
> > @@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> > if (IS_ERR_VALUE(ret))
> > goto out1;
> >
> > - ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
> > + ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
> > uf_unmap);
> > if (!(offset_in_page(ret)))
> > goto out;
> > @@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> > addr = untagged_addr(addr);
> > new_addr = untagged_addr(new_addr);
> >
> > - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
> > + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
> > return ret;
> >
> > if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
> > return ret;
> >
> > + if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
> > + return ret;
> > +
> > if (offset_in_page(addr))
> > return ret;
> >
> > @@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >
> > if (flags & MREMAP_FIXED) {
> > ret = mremap_to(addr, old_len, new_addr, new_len,
> > - &locked, &uf, &uf_unmap_early, &uf_unmap);
> > + &locked, flags, &uf, &uf_unmap_early,
> > + &uf_unmap);
> > goto out;
> > }
> >
> > @@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> > }
> >
> > ret = move_vma(vma, addr, old_len, new_len, new_addr,
> > - &locked, &uf, &uf_unmap);
> > + &locked, flags, &uf, &uf_unmap);
> > }
> > out:
> > if (offset_in_page(ret)) {
> > --
> > 2.25.0.341.g760bfbb309-goog
> >
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-24 19:06 ` Brian Geffon
@ 2020-01-26 22:06 ` Kirill A. Shutemov
-1 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2020-01-26 22:06 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes
On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> also used.
Implies? From code it looks like it requires MREMAP_FIXED. And
MREMAP_FIXED requires MREMAP_MAYMOVE. That's strange flag chaining.
I don't really see need in such dependencies. It maybe indeed implied, not
requied.
> The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
Could you clarify rmap situation here? How the rmap hierarchy will look
like before and after the operation. Can the new VMA merge with the old
one? Sounds fishy to me.
> We hope to use this in Chrome OS where with userfaultfd we could write
> an anonymous mapping to disk without having to STOP the process or worry
> about VMA permission changes.
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-26 22:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2020-01-26 22:06 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> also used.
Implies? From code it looks like it requires MREMAP_FIXED. And
MREMAP_FIXED requires MREMAP_MAYMOVE. That's strange flag chaining.
I don't really see need in such dependencies. It maybe indeed implied, not
requied.
> The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
Could you clarify rmap situation here? How the rmap hierarchy will look
like before and after the operation. Can the new VMA merge with the old
one? Sounds fishy to me.
> We hope to use this in Chrome OS where with userfaultfd we could write
> an anonymous mapping to disk without having to STOP the process or worry
> about VMA permission changes.
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-26 22:06 ` Kirill A. Shutemov
(?)
@ 2020-01-28 1:35 ` Brian Geffon
[not found] ` <CADyq12xCK_3MhGi88Am5P6DVZvrW8vqtyJMHO0zjNhvhYegm1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
-1 siblings, 1 reply; 41+ messages in thread
From: Brian Geffon @ 2020-01-28 1:35 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
Hi Kirill,
Thanks for taking the time to look at this. I'll update the wording to
make it clear that MREMAP_FIXED is required with MREMAP_DONTUNMAP.
Regarding rmap, you're completely right I'm going to roll a new patch
which will call unlink_anon_vmas() to make sure the rmap is correct,
I'll also explicitly check that the vma is anonymous and not shared
returning EINVAL if not, how does that sound?
Brian
On Sun, Jan 26, 2020 at 2:06 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> > also used.
>
> Implies? From code it looks like it requires MREMAP_FIXED. And
> MREMAP_FIXED requires MREMAP_MAYMOVE. That's strange flag chaining.
> I don't really see need in such dependencies. It maybe indeed implied, not
> requied.
>
> > The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
>
> Could you clarify rmap situation here? How the rmap hierarchy will look
> like before and after the operation. Can the new VMA merge with the old
> one? Sounds fishy to me.
>
> > We hope to use this in Chrome OS where with userfaultfd we could write
> > an anonymous mapping to disk without having to STOP the process or worry
> > about VMA permission changes.
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> --
> Kirill A. Shutemov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-24 19:06 ` Brian Geffon
@ 2020-01-27 10:13 ` Florian Weimer
-1 siblings, 0 replies; 41+ messages in thread
From: Florian Weimer @ 2020-01-27 10:13 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes
* Brian Geffon:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> also used. The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
What will be the protection flags of the source mapping? Will they
remain unchanged? Or PROT_NONE?
Thanks,
Florian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-27 10:13 ` Florian Weimer
0 siblings, 0 replies; 41+ messages in thread
From: Florian Weimer @ 2020-01-27 10:13 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
* Brian Geffon:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> also used. The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
What will be the protection flags of the source mapping? Will they
remain unchanged? Or PROT_NONE?
Thanks,
Florian
^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <87imkxxl5d.fsf-fjB847h8rq1N9UpBYOmNkhcY2uh10dtjAL8bYrjMMd8@public.gmane.org>]
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-27 10:13 ` Florian Weimer
@ 2020-01-27 22:33 ` Brian Geffon
-1 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-27 22:33 UTC (permalink / raw)
To: Florian Weimer
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao,
Jesse Barnes
Hi Florian,
copy_vma will make a copy of the existing VMA leaving the old VMA
unchanged, so the source keeps its existing protections, this is what
makes it very useful along with userfaultfd.
Thanks,
Brian
On Mon, Jan 27, 2020 at 2:13 AM Florian Weimer <fweimer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> * Brian Geffon:
>
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
>
> What will be the protection flags of the source mapping? Will they
> remain unchanged? Or PROT_NONE?
>
> Thanks,
> Florian
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-27 22:33 ` Brian Geffon
0 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-27 22:33 UTC (permalink / raw)
To: Florian Weimer
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
Hi Florian,
copy_vma will make a copy of the existing VMA leaving the old VMA
unchanged, so the source keeps its existing protections, this is what
makes it very useful along with userfaultfd.
Thanks,
Brian
On Mon, Jan 27, 2020 at 2:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Brian Geffon:
>
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
>
> What will be the protection flags of the source mapping? Will they
> remain unchanged? Or PROT_NONE?
>
> Thanks,
> Florian
>
^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <CADyq12xCpTzLpYC16FjnM60tHhCfnccNfg6JJuqcBd_6ACDGcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-27 22:33 ` Brian Geffon
@ 2020-01-30 12:19 ` Florian Weimer
-1 siblings, 0 replies; 41+ messages in thread
From: Florian Weimer @ 2020-01-30 12:19 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao,
Jesse Barnes
* Brian Geffon:
> Hi Florian,
> copy_vma will make a copy of the existing VMA leaving the old VMA
> unchanged, so the source keeps its existing protections, this is what
> makes it very useful along with userfaultfd.
I see. On the other hand, it's impossible to get the PROT_NONE behavior
by a subsequent mprotect call because the mremap has to fail in some
cases in vm.overcommit_memory=2 mode. But maybe that other behavior can
be provided with a different flag if it turns out to be useful in the
future.
Thanks,
Florian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-30 12:19 ` Florian Weimer
0 siblings, 0 replies; 41+ messages in thread
From: Florian Weimer @ 2020-01-30 12:19 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, LKML, linux-mm,
linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
* Brian Geffon:
> Hi Florian,
> copy_vma will make a copy of the existing VMA leaving the old VMA
> unchanged, so the source keeps its existing protections, this is what
> makes it very useful along with userfaultfd.
I see. On the other hand, it's impossible to get the PROT_NONE behavior
by a subsequent mprotect call because the mremap has to fail in some
cases in vm.overcommit_memory=2 mode. But maybe that other behavior can
be provided with a different flag if it turns out to be useful in the
future.
Thanks,
Florian
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v3] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-23 1:46 ` Brian Geffon
@ 2020-01-27 5:30 ` Brian Geffon
-1 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-27 5:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Michael S . Tsirkin, Brian Geffon, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes, Nathan Chancellor
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. Instead it will be
cleared as if a brand new anonymous, private mapping had been created
atomically as part of the mremap() call. If a userfaultfd was watching
the source, it will continue to watch the new mapping. For a mapping
that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
also used. The final result is two equally sized VMAs where the
destination contains the PTEs of the source.
We hope to use this in Chrome OS where with userfaultfd we could write
an anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.
This feature also has a use case in Android, Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time consuming
but also opens a time window where a native thread may call mmap and
reserve the java heap's address range for its own usage. This flag
solves the problem."
Signed-off-by: Brian Geffon <bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
include/uapi/linux/mman.h | 5 +++--
mm/mremap.c | 38 +++++++++++++++++++++++++++++++-------
2 files changed, 34 insertions(+), 9 deletions(-)
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
#include <asm/mman.h>
#include <asm-generic/hugetlb_encode.h>
-#define MREMAP_MAYMOVE 1
-#define MREMAP_FIXED 2
+#define MREMAP_MAYMOVE 1
+#define MREMAP_FIXED 2
+#define MREMAP_DONTUNMAP 4
#define OVERCOMMIT_GUESS 0
#define OVERCOMMIT_ALWAYS 1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..1d164e5fdff0 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long old_addr, unsigned long old_len,
unsigned long new_len, unsigned long new_addr,
- bool *locked, struct vm_userfaultfd_ctx *uf,
- struct list_head *uf_unmap)
+ bool *locked, unsigned long flags,
+ struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (unlikely(vma->vm_flags & VM_PFNMAP))
untrack_pfn_moved(vma);
+ if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+ if (vm_flags & VM_ACCOUNT)
+ vma->vm_flags |= VM_ACCOUNT;
+
+ goto out;
+ }
+
if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
/* OOM: unable to split vma, just get accounts right */
vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
vma->vm_next->vm_flags |= VM_ACCOUNT;
}
+out:
if (vm_flags & VM_LOCKED) {
mm->locked_vm += new_len >> PAGE_SHIFT;
*locked = true;
@@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
unsigned long new_addr, unsigned long new_len, bool *locked,
- struct vm_userfaultfd_ctx *uf,
+ unsigned long flags, struct vm_userfaultfd_ctx *uf,
struct list_head *uf_unmap_early,
struct list_head *uf_unmap)
{
@@ -551,6 +559,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
goto out;
}
+ /*
+ * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+ * check that we can expand by old_len and vma_to_resize will handle
+ * the vma growing.
+ */
+ if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+ vma->vm_flags, old_len >> PAGE_SHIFT))) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
map_flags = MAP_FIXED;
if (vma->vm_flags & VM_MAYSHARE)
map_flags |= MAP_SHARED;
@@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if (IS_ERR_VALUE(ret))
goto out1;
- ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+ ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
uf_unmap);
if (!(offset_in_page(ret)))
goto out;
@@ -609,12 +628,16 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
addr = untagged_addr(addr);
new_addr = untagged_addr(new_addr);
- if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) {
return ret;
+ }
if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
return ret;
+ if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+ return ret;
+
if (offset_in_page(addr))
return ret;
@@ -634,7 +657,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
if (flags & MREMAP_FIXED) {
ret = mremap_to(addr, old_len, new_addr, new_len,
- &locked, &uf, &uf_unmap_early, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap_early,
+ &uf_unmap);
goto out;
}
@@ -712,7 +736,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
}
ret = move_vma(vma, addr, old_len, new_len, new_addr,
- &locked, &uf, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap);
}
out:
if (offset_in_page(ret)) {
--
2.25.0.341.g760bfbb309-goog
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v3] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-27 5:30 ` Brian Geffon
0 siblings, 0 replies; 41+ messages in thread
From: Brian Geffon @ 2020-01-27 5:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Michael S . Tsirkin, Brian Geffon, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes,
Nathan Chancellor
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. Instead it will be
cleared as if a brand new anonymous, private mapping had been created
atomically as part of the mremap() call. If a userfaultfd was watching
the source, it will continue to watch the new mapping. For a mapping
that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
also used. The final result is two equally sized VMAs where the
destination contains the PTEs of the source.
We hope to use this in Chrome OS where with userfaultfd we could write
an anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.
This feature also has a use case in Android, Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time consuming
but also opens a time window where a native thread may call mmap and
reserve the java heap's address range for its own usage. This flag
solves the problem."
Signed-off-by: Brian Geffon <bgeffon@google.com>
---
include/uapi/linux/mman.h | 5 +++--
mm/mremap.c | 38 +++++++++++++++++++++++++++++++-------
2 files changed, 34 insertions(+), 9 deletions(-)
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
#include <asm/mman.h>
#include <asm-generic/hugetlb_encode.h>
-#define MREMAP_MAYMOVE 1
-#define MREMAP_FIXED 2
+#define MREMAP_MAYMOVE 1
+#define MREMAP_FIXED 2
+#define MREMAP_DONTUNMAP 4
#define OVERCOMMIT_GUESS 0
#define OVERCOMMIT_ALWAYS 1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..1d164e5fdff0 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long old_addr, unsigned long old_len,
unsigned long new_len, unsigned long new_addr,
- bool *locked, struct vm_userfaultfd_ctx *uf,
- struct list_head *uf_unmap)
+ bool *locked, unsigned long flags,
+ struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (unlikely(vma->vm_flags & VM_PFNMAP))
untrack_pfn_moved(vma);
+ if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+ if (vm_flags & VM_ACCOUNT)
+ vma->vm_flags |= VM_ACCOUNT;
+
+ goto out;
+ }
+
if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
/* OOM: unable to split vma, just get accounts right */
vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
vma->vm_next->vm_flags |= VM_ACCOUNT;
}
+out:
if (vm_flags & VM_LOCKED) {
mm->locked_vm += new_len >> PAGE_SHIFT;
*locked = true;
@@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
unsigned long new_addr, unsigned long new_len, bool *locked,
- struct vm_userfaultfd_ctx *uf,
+ unsigned long flags, struct vm_userfaultfd_ctx *uf,
struct list_head *uf_unmap_early,
struct list_head *uf_unmap)
{
@@ -551,6 +559,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
goto out;
}
+ /*
+ * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+ * check that we can expand by old_len and vma_to_resize will handle
+ * the vma growing.
+ */
+ if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+ vma->vm_flags, old_len >> PAGE_SHIFT))) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
map_flags = MAP_FIXED;
if (vma->vm_flags & VM_MAYSHARE)
map_flags |= MAP_SHARED;
@@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if (IS_ERR_VALUE(ret))
goto out1;
- ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+ ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
uf_unmap);
if (!(offset_in_page(ret)))
goto out;
@@ -609,12 +628,16 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
addr = untagged_addr(addr);
new_addr = untagged_addr(new_addr);
- if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) {
return ret;
+ }
if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
return ret;
+ if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+ return ret;
+
if (offset_in_page(addr))
return ret;
@@ -634,7 +657,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
if (flags & MREMAP_FIXED) {
ret = mremap_to(addr, old_len, new_addr, new_len,
- &locked, &uf, &uf_unmap_early, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap_early,
+ &uf_unmap);
goto out;
}
@@ -712,7 +736,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
}
ret = move_vma(vma, addr, old_len, new_len, new_addr,
- &locked, &uf, &uf_unmap);
+ &locked, flags, &uf, &uf_unmap);
}
out:
if (offset_in_page(ret)) {
--
2.25.0.341.g760bfbb309-goog
^ permalink raw reply related [flat|nested] 41+ messages in thread[parent not found: <20200127053056.213679-1-bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v3] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-27 5:30 ` Brian Geffon
@ 2020-01-28 15:26 ` Will Deacon
-1 siblings, 0 replies; 41+ messages in thread
From: Will Deacon @ 2020-01-28 15:26 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes, Nathan Chancellor
Hi Brian,
On Sun, Jan 26, 2020 at 09:30:56PM -0800, Brian Geffon wrote:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
> also used. The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
>
> We hope to use this in Chrome OS where with userfaultfd we could write
> an anonymous mapping to disk without having to STOP the process or worry
> about VMA permission changes.
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
Hmm, this sounds like you're dealing with a multi-threaded environment,
yet your change only supports private mappings. How does that work?
It's also worrying because, with two private mappings of the same anonymous
memory live simultaneously, you run the risk of hitting D-cache aliasing
issues on some architectures and losing coherency between them as a result
(even in a single-threaded scenario). Is userspace just supposed to deal
with this, or should we be enforcing SHMLBA alignment?
> Signed-off-by: Brian Geffon <bgeffon-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> include/uapi/linux/mman.h | 5 +++--
> mm/mremap.c | 38 +++++++++++++++++++++++++++++++-------
> 2 files changed, 34 insertions(+), 9 deletions(-)
Could you also a include a patch to update the mremap man page, please?
https://www.kernel.org/doc/man-pages/patches.html
Cheers,
Will
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-28 15:26 ` Will Deacon
0 siblings, 0 replies; 41+ messages in thread
From: Will Deacon @ 2020-01-28 15:26 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes,
Nathan Chancellor
Hi Brian,
On Sun, Jan 26, 2020 at 09:30:56PM -0800, Brian Geffon wrote:
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> set, the source mapping will not be removed. Instead it will be
> cleared as if a brand new anonymous, private mapping had been created
> atomically as part of the mremap() call. If a userfaultfd was watching
> the source, it will continue to watch the new mapping. For a mapping
> that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
> also used. The final result is two equally sized VMAs where the
> destination contains the PTEs of the source.
>
> We hope to use this in Chrome OS where with userfaultfd we could write
> an anonymous mapping to disk without having to STOP the process or worry
> about VMA permission changes.
>
> This feature also has a use case in Android, Lokesh Gidra has said
> that "As part of using userfaultfd for GC, We'll have to move the physical
> pages of the java heap to a separate location. For this purpose mremap
> will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> heap, its virtual mapping will be removed as well. Therefore, we'll
> require performing mmap immediately after. This is not only time consuming
> but also opens a time window where a native thread may call mmap and
> reserve the java heap's address range for its own usage. This flag
> solves the problem."
Hmm, this sounds like you're dealing with a multi-threaded environment,
yet your change only supports private mappings. How does that work?
It's also worrying because, with two private mappings of the same anonymous
memory live simultaneously, you run the risk of hitting D-cache aliasing
issues on some architectures and losing coherency between them as a result
(even in a single-threaded scenario). Is userspace just supposed to deal
with this, or should we be enforcing SHMLBA alignment?
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> ---
> include/uapi/linux/mman.h | 5 +++--
> mm/mremap.c | 38 +++++++++++++++++++++++++++++++-------
> 2 files changed, 34 insertions(+), 9 deletions(-)
Could you also a include a patch to update the mremap man page, please?
https://www.kernel.org/doc/man-pages/patches.html
Cheers,
Will
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3] mm: Add MREMAP_DONTUNMAP to mremap().
2020-01-28 15:26 ` Will Deacon
@ 2020-01-30 10:12 ` Will Deacon
-1 siblings, 0 replies; 41+ messages in thread
From: Will Deacon @ 2020-01-30 10:12 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim,
Joel Fernandes, Yu Zhao, Jesse Barnes, Nathan Chancellor
On Tue, Jan 28, 2020 at 03:26:41PM +0000, Will Deacon wrote:
> On Sun, Jan 26, 2020 at 09:30:56PM -0800, Brian Geffon wrote:
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
> >
> > We hope to use this in Chrome OS where with userfaultfd we could write
> > an anonymous mapping to disk without having to STOP the process or worry
> > about VMA permission changes.
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Hmm, this sounds like you're dealing with a multi-threaded environment,
> yet your change only supports private mappings. How does that work?
Sorry, this was badly worded. I was trying to understand how the GC is
implememented, and whether everything was part of the same process or if
things like memfds or something else were being used to share memory. Having
spoken to Brian off-list, it's all one process...
> It's also worrying because, with two private mappings of the same anonymous
> memory live simultaneously, you run the risk of hitting D-cache aliasing
> issues on some architectures and losing coherency between them as a result
> (even in a single-threaded scenario). Is userspace just supposed to deal
> with this, or should we be enforcing SHMLBA alignment?
... and this was me completely misreading the patch. The old mapping is
still torn down, but then replaced with a new private mapping rather than
being unmapped.
However, looks like there are some issues handling shared mappings with
this patch (and possibly mlock()?), so I'll wait for a new spin.
Will
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3] mm: Add MREMAP_DONTUNMAP to mremap().
@ 2020-01-30 10:12 ` Will Deacon
0 siblings, 0 replies; 41+ messages in thread
From: Will Deacon @ 2020-01-30 10:12 UTC (permalink / raw)
To: Brian Geffon
Cc: Andrew Morton, Michael S . Tsirkin, Arnd Bergmann, linux-kernel,
linux-mm, linux-api, Andy Lutomirski, Andrea Arcangeli, Sonny Rao,
Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes,
Nathan Chancellor
On Tue, Jan 28, 2020 at 03:26:41PM +0000, Will Deacon wrote:
> On Sun, Jan 26, 2020 at 09:30:56PM -0800, Brian Geffon wrote:
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
> >
> > We hope to use this in Chrome OS where with userfaultfd we could write
> > an anonymous mapping to disk without having to STOP the process or worry
> > about VMA permission changes.
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
>
> Hmm, this sounds like you're dealing with a multi-threaded environment,
> yet your change only supports private mappings. How does that work?
Sorry, this was badly worded. I was trying to understand how the GC is
implememented, and whether everything was part of the same process or if
things like memfds or something else were being used to share memory. Having
spoken to Brian off-list, it's all one process...
> It's also worrying because, with two private mappings of the same anonymous
> memory live simultaneously, you run the risk of hitting D-cache aliasing
> issues on some architectures and losing coherency between them as a result
> (even in a single-threaded scenario). Is userspace just supposed to deal
> with this, or should we be enforcing SHMLBA alignment?
... and this was me completely misreading the patch. The old mapping is
still torn down, but then replaced with a new private mapping rather than
being unmapped.
However, looks like there are some issues handling shared mappings with
this patch (and possibly mlock()?), so I'll wait for a new spin.
Will
^ permalink raw reply [flat|nested] 41+ messages in thread