linux-mm.kvack.org archive mirror
* [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
@ 2012-12-21  0:49 Michel Lespinasse
  2012-12-21  0:49 ` [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags Michel Lespinasse
                   ` (11 more replies)
  0 siblings, 12 replies; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

We have many vma manipulation functions that are fast in the typical case,
but can optionally be instructed to populate an unbounded number of ptes
within the region they work on:
- mmap with MAP_POPULATE or MAP_LOCKED flags;
- remap_file_pages() with MAP_NONBLOCK not set or when working on a
  VM_LOCKED vma;
- mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
- brk() when mlock(MCL_FUTURE) is in effect.

Current code handles these pte operations locally, while the surrounding
code has to hold the mmap_sem write side since it's manipulating vmas.
This means we're doing an unbounded amount of pte population work with
mmap_sem held, and this causes problems as Andy Lutomirski reported
(we've hit this at Google as well, though it's not entirely clear why
people keep trying to use mlock(MCL_FUTURE) in the first place).

I propose introducing a new mm_populate() function to do this pte
population work after the mmap_sem has been released. mm_populate()
does need to acquire the mmap_sem read side, but critically, it
doesn't need to hold it continuously for the entire duration of the
operation - it can drop it whenever things take too long (such as when
hitting disk for a file read) and re-acquire it later on.
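In pseudocode, the proposed pattern looks roughly like this (a sketch
only, with the vma lookup and fault-in helpers elided; the real
implementation is the mm/mlock.c change in patch 3):

```c
/* Sketch, not actual patch code: populate [start, start+len)
 * without holding mmap_sem continuously. */
int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
{
	unsigned long end = start + len, nstart, nend;
	int locked = 0;

	for (nstart = start; nstart < end; nstart = nend) {
		if (!locked) {
			locked = 1;
			down_read(&mm->mmap_sem);
		}
		/* look up the vma covering nstart, compute nend */
		/* fault in [nstart, nend); the fault path may release
		 * mmap_sem while blocking on disk reads, clearing
		 * 'locked' so we re-take it on the next iteration */
	}
	if (locked)
		up_read(&mm->mmap_sem);
	return 0;
}
```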

The following patches are against v3.7:

- Patches 1-2 fix some issues I noticed while working on the existing code.
  If needed, they could potentially go in before the rest of the patches.

- Patch 3 introduces the new mm_populate() function and changes
  mmap_region() call sites to use it after they drop mmap_sem. This is
  inspired by Andy Lutomirski's proposal and is built as an extension
  of the work I had previously done for mlock() and mlockall() around
  v2.6.38-rc1. I had tried doing something similar at the time but had
  given up as there were so many do_mmap() call sites; the recent cleanups
  by Linus and Viro are a tremendous help here.

- Patches 4-6 convert some of the less-obvious places doing unbounded
  pte populates to the new mm_populate() mechanism.

- Patches 7-8 are code cleanups that are made possible by the
  mm_populate() work. In particular, they remove more code than the
  entire patch series added, which should be a good thing :)

- Patch 9 is an optional addition to the series. It only helps to deal more
  nicely with racy userspace programs that might modify their mappings
  while we're trying to populate them. It adds a new VM_POPULATE flag
  on the mappings we do want to populate, so that if userspace replaces
  them with mappings it doesn't want populated, mm_populate() won't
  populate those replacement mappings.
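The race in question, sketched as a timeline (hypothetical, not actual
patch code):

```
thread A: mmap(addr, ..., MAP_POPULATE)  -> vma gets VM_POPULATE; mmap_sem dropped
thread B: mmap(addr, ..., MAP_FIXED)     -> replaces that vma; no VM_POPULATE
thread A: mm_populate(addr, len)         -> finds !(vma->vm_flags & VM_POPULATE)
                                            and leaves the replacement alone
```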

Michel Lespinasse (9):
  mm: make mlockall preserve flags other than VM_LOCKED in def_flags
  mm: remap_file_pages() fixes
  mm: introduce mm_populate() for populating new vmas
  mm: use mm_populate() for blocking remap_file_pages()
  mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
  mm: use mm_populate() for mremap() of VM_LOCKED vmas
  mm: remove flags argument to mmap_region
  mm: directly use __mlock_vma_pages_range() in find_extend_vma()
  mm: introduce VM_POPULATE flag to better deal with racy userspace programs

 arch/tile/mm/elf.c   |    1 -
 fs/aio.c             |    6 +++-
 include/linux/mm.h   |   23 +++++++++---
 include/linux/mman.h |    4 ++-
 ipc/shm.c            |   12 ++++---
 mm/fremap.c          |   51 ++++++++++++++-------------
 mm/internal.h        |    4 +-
 mm/memory.c          |   24 -------------
 mm/mlock.c           |   94 +++++++++++++------------------------------------
 mm/mmap.c            |   77 ++++++++++++++++++++++++----------------
 mm/mremap.c          |   25 +++++++------
 mm/nommu.c           |    5 ++-
 mm/util.c            |    6 +++-
 13 files changed, 154 insertions(+), 178 deletions(-)

-- 
1.7.7.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2012-12-22  4:25   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 2/9] mm: remap_file_pages() fixes Michel Lespinasse
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

On most architectures, def_flags is either 0 or VM_LOCKED depending on
whether mlockall(MCL_FUTURE) was called. However, this is not an absolute
rule as kvm support on s390 may set the VM_NOHUGEPAGE flag in def_flags.
We don't want mlockall to clear that.

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 mm/mlock.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index f0b9ce572fc7..a2ee45c030fa 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -517,10 +517,11 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 static int do_mlockall(int flags)
 {
 	struct vm_area_struct * vma, * prev = NULL;
-	unsigned int def_flags = 0;
+	unsigned int def_flags;
 
+	def_flags = current->mm->def_flags & ~VM_LOCKED;
 	if (flags & MCL_FUTURE)
-		def_flags = VM_LOCKED;
+		def_flags |= VM_LOCKED;
 	current->mm->def_flags = def_flags;
 	if (flags == MCL_FUTURE)
 		goto out;
-- 
1.7.7.3



* [PATCH 2/9] mm: remap_file_pages() fixes
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
  2012-12-21  0:49 ` [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  0:31   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 3/9] mm: introduce mm_populate() for populating new vmas Michel Lespinasse
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

Assorted fixes. The first two are quite small:

- Move check for vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR)
  within existing if (!(vma->vm_flags & VM_NONLINEAR)) block.
  Purely cosmetic.

- In the VM_LOCKED case, when dropping PG_Mlocked for the over-mapped
  range, make sure we own the mmap_sem write lock around the
  munlock_vma_pages_range call as this manipulates the vma's vm_flags.

The last fix requires a longer explanation. remap_file_pages() can do its work
either through VM_NONLINEAR manipulation or by creating extra vmas.
These two cases were inconsistent with each other (and ultimately, both wrong)
as to exactly when they faulted in the newly mapped file pages:

- In the VM_NONLINEAR case, new file pages would be populated if
  the MAP_NONBLOCK flag wasn't passed. If MAP_NONBLOCK was passed,
  new file pages wouldn't be populated even if the vma is already
  marked as VM_LOCKED.

- In the linear (emulated) case, the work is passed to the mmap_region()
  function which would populate the pages if the vma is marked as
  VM_LOCKED, and would not otherwise - regardless of the value of the
  MAP_NONBLOCK flag, because MAP_POPULATE wasn't being passed to
  mmap_region().

The desired behavior is that we want the pages to be populated and locked
if the vma is marked as VM_LOCKED, or to be populated if the MAP_NONBLOCK
flag is not passed to remap_file_pages().
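Summarized in sketch form (not actual patch code; the real change is the
mm/fremap.c hunk below), the post-patch behavior after a successful
remap_pages() reduces to:

```c
/* Sketch of the desired populate behavior after remap_pages() succeeds */
if (!err) {
	if (vma->vm_flags & VM_LOCKED)
		/* populate and mlock, even when MAP_NONBLOCK is set */
		mlock_vma_pages_range(vma, start, start + size);
	else if (!(flags & MAP_NONBLOCK))
		/* populate only */
		make_pages_present(start, start + size);
}
```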

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 mm/fremap.c |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/fremap.c b/mm/fremap.c
index a0aaf0e56800..2db886e31044 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -160,15 +160,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	/*
 	 * Make sure the vma is shared, that it supports prefaulting,
 	 * and that the remapped range is valid and fully within
-	 * the single existing vma.  vm_private_data is used as a
-	 * swapout cursor in a VM_NONLINEAR vma.
+	 * the single existing vma.
 	 */
 	if (!vma || !(vma->vm_flags & VM_SHARED))
 		goto out;
 
-	if (vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR))
-		goto out;
-
 	if (!vma->vm_ops || !vma->vm_ops->remap_pages)
 		goto out;
 
@@ -177,6 +173,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 
 	/* Must set VM_NONLINEAR before any pages are populated. */
 	if (!(vma->vm_flags & VM_NONLINEAR)) {
+		/*
+		 * vm_private_data is used as a swapout cursor
+		 * in a VM_NONLINEAR vma.
+		 */
+		if (vma->vm_private_data)
+			goto out;
+
 		/* Don't need a nonlinear mapping, exit success */
 		if (pgoff == linear_page_index(vma, start)) {
 			err = 0;
@@ -184,6 +187,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 		}
 
 		if (!has_write_lock) {
+get_write_lock:
 			up_read(&mm->mmap_sem);
 			down_write(&mm->mmap_sem);
 			has_write_lock = 1;
@@ -199,7 +203,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 			unsigned long addr;
 			struct file *file = get_file(vma->vm_file);
 
-			flags &= MAP_NONBLOCK;
+			flags = (flags & MAP_NONBLOCK) | MAP_POPULATE;
 			addr = mmap_region(file, start, size,
 					flags, vma->vm_flags, pgoff);
 			fput(file);
@@ -225,6 +229,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 		 * drop PG_Mlocked flag for over-mapped range
 		 */
 		vm_flags_t saved_flags = vma->vm_flags;
+		if (!has_write_lock)
+			goto get_write_lock;
 		munlock_vma_pages_range(vma, start, start + size);
 		vma->vm_flags = saved_flags;
 	}
@@ -232,13 +238,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	mmu_notifier_invalidate_range_start(mm, start, start + size);
 	err = vma->vm_ops->remap_pages(vma, start, size, pgoff);
 	mmu_notifier_invalidate_range_end(mm, start, start + size);
-	if (!err && !(flags & MAP_NONBLOCK)) {
+	if (!err) {
 		if (vma->vm_flags & VM_LOCKED) {
 			/*
 			 * might be mapping previously unmapped range of file
 			 */
 			mlock_vma_pages_range(vma, start, start + size);
-		} else {
+		} else if (!(flags & MAP_NONBLOCK)) {
 			if (unlikely(has_write_lock)) {
 				downgrade_write(&mm->mmap_sem);
 				has_write_lock = 0;
-- 
1.7.7.3



* [PATCH 3/9] mm: introduce mm_populate() for populating new vmas
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
  2012-12-21  0:49 ` [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags Michel Lespinasse
  2012-12-21  0:49 ` [PATCH 2/9] mm: remap_file_pages() fixes Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  2:14   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages() Michel Lespinasse
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags
(or with MCL_FUTURE in effect), we want to populate the pages within the
newly created vmas. This may take a while as we may have to read pages
from disk, so ideally we want to do this outside of the write-locked
mmap_sem region.

This change introduces mm_populate(), which is used to defer populating
such mappings until after the mmap_sem write lock has been released.
This is implemented as a generalization of the former do_mlock_pages(),
which accomplished the same task but was only used during mlock() / mlockall().

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michel Lespinasse <walken@google.com>

---
 fs/aio.c           |    6 +++++-
 include/linux/mm.h |   18 +++++++++++++++---
 ipc/shm.c          |   12 +++++++-----
 mm/mlock.c         |   17 +++++++++++------
 mm/mmap.c          |   20 +++++++++++++++-----
 mm/nommu.c         |    5 ++++-
 mm/util.c          |    6 +++++-
 7 files changed, 62 insertions(+), 22 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 71f613cf4a85..82eec7c7b4bb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -103,6 +103,7 @@ static int aio_setup_ring(struct kioctx *ctx)
 	unsigned nr_events = ctx->max_reqs;
 	unsigned long size;
 	int nr_pages;
+	bool populate;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
@@ -129,7 +130,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	down_write(&ctx->mm->mmap_sem);
 	info->mmap_base = do_mmap_pgoff(NULL, 0, info->mmap_size, 
 					PROT_READ|PROT_WRITE,
-					MAP_ANONYMOUS|MAP_PRIVATE, 0);
+					MAP_ANONYMOUS|MAP_PRIVATE, 0,
+					&populate);
 	if (IS_ERR((void *)info->mmap_base)) {
 		up_write(&ctx->mm->mmap_sem);
 		info->mmap_size = 0;
@@ -147,6 +149,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 		aio_free_ring(ctx);
 		return -EAGAIN;
 	}
+	if (populate)
+		mm_populate(info->mmap_base, info->mmap_size);
 
 	ctx->user_id = info->mmap_base;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcaab4e6fe91..fea461cd9027 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1444,11 +1444,23 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long flags,
 	vm_flags_t vm_flags, unsigned long pgoff);
-extern unsigned long do_mmap_pgoff(struct file *, unsigned long,
-        unsigned long, unsigned long,
-        unsigned long, unsigned long);
+extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
+	unsigned long len, unsigned long prot, unsigned long flags,
+	unsigned long pgoff, bool *populate);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
+#ifdef CONFIG_MMU
+extern int __mm_populate(unsigned long addr, unsigned long len,
+			 int ignore_errors);
+static inline void mm_populate(unsigned long addr, unsigned long len)
+{
+	/* Ignore errors */
+	(void) __mm_populate(addr, len, 1);
+}
+#else
+static inline void mm_populate(unsigned long addr, unsigned long len) {}
+#endif
+
 /* These take the mm semaphore themselves */
 extern unsigned long vm_brk(unsigned long, unsigned long);
 extern int vm_munmap(unsigned long, size_t);
diff --git a/ipc/shm.c b/ipc/shm.c
index dff40c9f73c9..ee2dde1f94d1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -966,11 +966,11 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 	unsigned long flags;
 	unsigned long prot;
 	int acc_mode;
-	unsigned long user_addr;
 	struct ipc_namespace *ns;
 	struct shm_file_data *sfd;
 	struct path path;
 	fmode_t f_mode;
+	bool populate = false;
 
 	err = -EINVAL;
 	if (shmid < 0)
@@ -1069,13 +1069,15 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 			goto invalid;
 	}
 		
-	user_addr = do_mmap_pgoff(file, addr, size, prot, flags, 0);
-	*raddr = user_addr;
+	addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate);
+	*raddr = addr;
 	err = 0;
-	if (IS_ERR_VALUE(user_addr))
-		err = (long)user_addr;
+	if (IS_ERR_VALUE(addr))
+		err = (long)addr;
 invalid:
 	up_write(&current->mm->mmap_sem);
+	if (populate)
+		mm_populate(addr, size);
 
 out_fput:
 	fput(file);
diff --git a/mm/mlock.c b/mm/mlock.c
index a2ee45c030fa..7f94bc3b46ef 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -416,7 +416,14 @@ static int do_mlock(unsigned long start, size_t len, int on)
 	return error;
 }
 
-static int do_mlock_pages(unsigned long start, size_t len, int ignore_errors)
+/*
+ * __mm_populate - populate and/or mlock pages within a range of address space.
+ *
+ * This is used to implement mlock() and the MAP_POPULATE / MAP_LOCKED mmap
+ * flags. VMAs must be already marked with the desired vm_flags, and
+ * mmap_sem must not be held.
+ */
+int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long end, nstart, nend;
@@ -498,7 +505,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
 		error = do_mlock(start, len, 1);
 	up_write(&current->mm->mmap_sem);
 	if (!error)
-		error = do_mlock_pages(start, len, 0);
+		error = __mm_populate(start, len, 0);
 	return error;
 }
 
@@ -565,10 +572,8 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	    capable(CAP_IPC_LOCK))
 		ret = do_mlockall(flags);
 	up_write(&current->mm->mmap_sem);
-	if (!ret && (flags & MCL_CURRENT)) {
-		/* Ignore errors */
-		do_mlock_pages(0, TASK_SIZE, 1);
-	}
+	if (!ret && (flags & MCL_CURRENT))
+		mm_populate(0, TASK_SIZE);
 out:
 	return ret;
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index 9a796c41e7d9..a16fc499dbd1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1001,12 +1001,15 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
 
 unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff)
+			unsigned long flags, unsigned long pgoff,
+			bool *populate)
 {
 	struct mm_struct * mm = current->mm;
 	struct inode *inode;
 	vm_flags_t vm_flags;
 
+	*populate = false;
+
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
 	 *
@@ -1127,7 +1130,12 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 		}
 	}
 
-	return mmap_region(file, addr, len, flags, vm_flags, pgoff);
+	addr = mmap_region(file, addr, len, flags, vm_flags, pgoff);
+	if (!IS_ERR_VALUE(addr) &&
+	    ((vm_flags & VM_LOCKED) ||
+	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
+		*populate = true;
+	return addr;
 }
 
 SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
@@ -1373,10 +1381,12 @@ out:
 
 	vm_stat_account(mm, vm_flags, file, len >> PAGE_SHIFT);
 	if (vm_flags & VM_LOCKED) {
-		if (!mlock_vma_pages_range(vma, addr, addr + len))
+		if (!((vm_flags & VM_SPECIAL) || is_vm_hugetlb_page(vma) ||
+					vma == get_gate_vma(current->mm)))
 			mm->locked_vm += (len >> PAGE_SHIFT);
-	} else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK))
-		make_pages_present(addr, addr + len);
+		else
+			vma->vm_flags &= ~VM_LOCKED;
+	}
 
 	if (file)
 		uprobe_mmap(vma);
diff --git a/mm/nommu.c b/mm/nommu.c
index 45131b41bcdb..d7690b97b81d 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1234,7 +1234,8 @@ unsigned long do_mmap_pgoff(struct file *file,
 			    unsigned long len,
 			    unsigned long prot,
 			    unsigned long flags,
-			    unsigned long pgoff)
+			    unsigned long pgoff,
+			    bool *populate)
 {
 	struct vm_area_struct *vma;
 	struct vm_region *region;
@@ -1244,6 +1245,8 @@ unsigned long do_mmap_pgoff(struct file *file,
 
 	kenter(",%lx,%lx,%lx,%lx,%lx", addr, len, prot, flags, pgoff);
 
+	*populate = false;
+
 	/* decide whether we should attempt the mapping, and if so what sort of
 	 * mapping */
 	ret = validate_mmap_request(file, addr, len, prot, flags, pgoff,
diff --git a/mm/util.c b/mm/util.c
index dc3036cdcc6a..44f006ac2ccd 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -355,12 +355,16 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 {
 	unsigned long ret;
 	struct mm_struct *mm = current->mm;
+	bool populate;
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
 		down_write(&mm->mmap_sem);
-		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff);
+		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
+				    &populate);
 		up_write(&mm->mmap_sem);
+		if (!IS_ERR_VALUE(ret) && populate)
+			mm_populate(ret, len);
 	}
 	return ret;
 }
-- 
1.7.7.3



* [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (2 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 3/9] mm: introduce mm_populate() for populating new vmas Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  2:25   ` Rik van Riel
  2013-03-10 18:55   ` Tommi Rantala
  2012-12-21  0:49 ` [PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect Michel Lespinasse
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 mm/fremap.c |   22 ++++++----------------
 1 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/mm/fremap.c b/mm/fremap.c
index 2db886e31044..b42e32171530 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -129,6 +129,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	struct vm_area_struct *vma;
 	int err = -EINVAL;
 	int has_write_lock = 0;
+	vm_flags_t vm_flags;
 
 	if (prot)
 		return err;
@@ -228,30 +229,16 @@ get_write_lock:
 		/*
 		 * drop PG_Mlocked flag for over-mapped range
 		 */
-		vm_flags_t saved_flags = vma->vm_flags;
 		if (!has_write_lock)
 			goto get_write_lock;
+		vm_flags = vma->vm_flags;
 		munlock_vma_pages_range(vma, start, start + size);
-		vma->vm_flags = saved_flags;
+		vma->vm_flags = vm_flags;
 	}
 
 	mmu_notifier_invalidate_range_start(mm, start, start + size);
 	err = vma->vm_ops->remap_pages(vma, start, size, pgoff);
 	mmu_notifier_invalidate_range_end(mm, start, start + size);
-	if (!err) {
-		if (vma->vm_flags & VM_LOCKED) {
-			/*
-			 * might be mapping previously unmapped range of file
-			 */
-			mlock_vma_pages_range(vma, start, start + size);
-		} else if (!(flags & MAP_NONBLOCK)) {
-			if (unlikely(has_write_lock)) {
-				downgrade_write(&mm->mmap_sem);
-				has_write_lock = 0;
-			}
-			make_pages_present(start, start+size);
-		}
-	}
 
 	/*
 	 * We can't clear VM_NONLINEAR because we'd have to do
@@ -260,10 +247,13 @@ get_write_lock:
 	 */
 
 out:
+	vm_flags = vma->vm_flags;
 	if (likely(!has_write_lock))
 		up_read(&mm->mmap_sem);
 	else
 		up_write(&mm->mmap_sem);
+	if (!err && ((vm_flags & VM_LOCKED) || !(flags & MAP_NONBLOCK)))
+		mm_populate(start, size);
 
 	return err;
 }
-- 
1.7.7.3



* [PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (3 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages() Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  2:56   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas Michel Lespinasse
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 mm/mmap.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index a16fc499dbd1..4c8d39e64e80 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -240,6 +240,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	unsigned long newbrk, oldbrk;
 	struct mm_struct *mm = current->mm;
 	unsigned long min_brk;
+	bool populate;
 
 	down_write(&mm->mmap_sem);
 
@@ -289,8 +290,15 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	/* Ok, looks good - let it rip. */
 	if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
 		goto out;
+
 set_brk:
 	mm->brk = brk;
+	populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0;
+	up_write(&mm->mmap_sem);
+	if (populate)
+		mm_populate(oldbrk, newbrk - oldbrk);
+	return brk;
+
 out:
 	retval = mm->brk;
 	up_write(&mm->mmap_sem);
@@ -2274,10 +2282,8 @@ static unsigned long do_brk(unsigned long addr, unsigned long len)
 out:
 	perf_event_mmap(vma);
 	mm->total_vm += len >> PAGE_SHIFT;
-	if (flags & VM_LOCKED) {
-		if (!mlock_vma_pages_range(vma, addr, addr + len))
-			mm->locked_vm += (len >> PAGE_SHIFT);
-	}
+	if (flags & VM_LOCKED)
+		mm->locked_vm += (len >> PAGE_SHIFT);
 	return addr;
 }
 
@@ -2285,10 +2291,14 @@ unsigned long vm_brk(unsigned long addr, unsigned long len)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long ret;
+	bool populate;
 
 	down_write(&mm->mmap_sem);
 	ret = do_brk(addr, len);
+	populate = ((mm->def_flags & VM_LOCKED) != 0);
 	up_write(&mm->mmap_sem);
+	if (populate)
+		mm_populate(addr, len);
 	return ret;
 }
 EXPORT_SYMBOL(vm_brk);
-- 
1.7.7.3



* [PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (4 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  5:47   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 7/9] mm: remove flags argument to mmap_region Michel Lespinasse
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 mm/mremap.c |   25 +++++++++++++------------
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 1b61c2d3307a..c5a8bf344b1f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -208,7 +208,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
-		unsigned long new_len, unsigned long new_addr)
+		unsigned long new_len, unsigned long new_addr, bool *locked)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
@@ -299,9 +299,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 
 	if (vm_flags & VM_LOCKED) {
 		mm->locked_vm += new_len >> PAGE_SHIFT;
-		if (new_len > old_len)
-			mlock_vma_pages_range(new_vma, new_addr + old_len,
-						       new_addr + new_len);
+		*locked = true;
 	}
 
 	return new_addr;
@@ -366,9 +364,8 @@ Eagain:
 	return ERR_PTR(-EAGAIN);
 }
 
-static unsigned long mremap_to(unsigned long addr,
-	unsigned long old_len, unsigned long new_addr,
-	unsigned long new_len)
+static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
+		unsigned long new_addr, unsigned long new_len, bool *locked)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
@@ -418,7 +415,7 @@ static unsigned long mremap_to(unsigned long addr,
 	if (ret & ~PAGE_MASK)
 		goto out1;
 
-	ret = move_vma(vma, addr, old_len, new_len, new_addr);
+	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked);
 	if (!(ret & ~PAGE_MASK))
 		goto out;
 out1:
@@ -456,6 +453,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	struct vm_area_struct *vma;
 	unsigned long ret = -EINVAL;
 	unsigned long charged = 0;
+	bool locked = false;
 
 	down_write(&current->mm->mmap_sem);
 
@@ -478,7 +476,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 
 	if (flags & MREMAP_FIXED) {
 		if (flags & MREMAP_MAYMOVE)
-			ret = mremap_to(addr, old_len, new_addr, new_len);
+			ret = mremap_to(addr, old_len, new_addr, new_len,
+					&locked);
 		goto out;
 	}
 
@@ -520,8 +519,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 			vm_stat_account(mm, vma->vm_flags, vma->vm_file, pages);
 			if (vma->vm_flags & VM_LOCKED) {
 				mm->locked_vm += pages;
-				mlock_vma_pages_range(vma, addr + old_len,
-						   addr + new_len);
+				locked = true;
+				new_addr = addr;
 			}
 			ret = addr;
 			goto out;
@@ -547,11 +546,13 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 			goto out;
 		}
 
-		ret = move_vma(vma, addr, old_len, new_len, new_addr);
+		ret = move_vma(vma, addr, old_len, new_len, new_addr, &locked);
 	}
 out:
 	if (ret & ~PAGE_MASK)
 		vm_unacct_memory(charged);
 	up_write(&current->mm->mmap_sem);
+	if (locked && new_len > old_len)
+		mm_populate(new_addr + old_len, new_len - old_len);
 	return ret;
 }
-- 
1.7.7.3



* [PATCH 7/9] mm: remove flags argument to mmap_region
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (5 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  5:49   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma() Michel Lespinasse
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

After the MAP_POPULATE handling has been moved to mmap_region() call sites,
the only remaining use of the flags argument is to pass the MAP_NORESERVE
flag. This can be just as easily handled by do_mmap_pgoff(), so do that
and remove the mmap_region() flags parameter.

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 arch/tile/mm/elf.c |    1 -
 include/linux/mm.h |    3 +--
 mm/fremap.c        |    3 +--
 mm/mmap.c          |   33 ++++++++++++++++-----------------
 4 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 3cfa98bf9125..743c951c61b0 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -130,7 +130,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	if (!retval) {
 		unsigned long addr = MEM_USER_INTRPT;
 		addr = mmap_region(NULL, addr, INTRPT_SIZE,
-				   MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
 				   VM_READ|VM_EXEC|
 				   VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
 		if (addr > (unsigned long) -PAGE_SIZE)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fea461cd9027..3b2912f6e91a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1442,8 +1442,7 @@ extern int install_special_mapping(struct mm_struct *mm,
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
-	unsigned long len, unsigned long flags,
-	vm_flags_t vm_flags, unsigned long pgoff);
+	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
 extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, bool *populate);
diff --git a/mm/fremap.c b/mm/fremap.c
index b42e32171530..503a72387087 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -204,9 +204,8 @@ get_write_lock:
 			unsigned long addr;
 			struct file *file = get_file(vma->vm_file);
 
-			flags = (flags & MAP_NONBLOCK) | MAP_POPULATE;
 			addr = mmap_region(file, start, size,
-					flags, vma->vm_flags, pgoff);
+					vma->vm_flags, pgoff);
 			fput(file);
 			if (IS_ERR_VALUE(addr)) {
 				err = addr;
diff --git a/mm/mmap.c b/mm/mmap.c
index 4c8d39e64e80..b0a341e5685f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1138,7 +1138,21 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 		}
 	}
 
-	addr = mmap_region(file, addr, len, flags, vm_flags, pgoff);
+	/*
+	 * Set 'VM_NORESERVE' if we should not account for the
+	 * memory use of this mapping.
+	 */
+	if ((flags & MAP_NORESERVE)) {
+		/* We honor MAP_NORESERVE if allowed to overcommit */
+		if (sysctl_overcommit_memory != OVERCOMMIT_NEVER)
+			vm_flags |= VM_NORESERVE;
+
+		/* hugetlb applies strict overcommit unless MAP_NORESERVE */
+		if (file && is_file_hugepages(file))
+			vm_flags |= VM_NORESERVE;
+	}
+
+	addr = mmap_region(file, addr, len, vm_flags, pgoff);
 	if (!IS_ERR_VALUE(addr) &&
 	    ((vm_flags & VM_LOCKED) ||
 	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1257,8 +1271,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 }
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
-			  unsigned long len, unsigned long flags,
-			  vm_flags_t vm_flags, unsigned long pgoff)
+		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
@@ -1282,20 +1295,6 @@ munmap_back:
 		return -ENOMEM;
 
 	/*
-	 * Set 'VM_NORESERVE' if we should not account for the
-	 * memory use of this mapping.
-	 */
-	if ((flags & MAP_NORESERVE)) {
-		/* We honor MAP_NORESERVE if allowed to overcommit */
-		if (sysctl_overcommit_memory != OVERCOMMIT_NEVER)
-			vm_flags |= VM_NORESERVE;
-
-		/* hugetlb applies strict overcommit unless MAP_NORESERVE */
-		if (file && is_file_hugepages(file))
-			vm_flags |= VM_NORESERVE;
-	}
-
-	/*
 	 * Private writable mapping: check memory availability
 	 */
 	if (accountable_mapping(file, vm_flags)) {
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma()
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (6 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 7/9] mm: remove flags argument to mmap_region Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  5:50   ` Rik van Riel
  2012-12-21  0:49 ` [PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs Michel Lespinasse
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

In find_extend_vma(), we don't need mlock_vma_pages_range() to verify the
vma type - we know we're working with a stack. So, we can call directly
into __mlock_vma_pages_range(), and remove the last make_pages_present()
call site.

Note that we don't use mm_populate() here, so we can't release the mmap_sem
while allocating new stack pages. This is deemed acceptable, because the
stack vmas grow by a bounded number of pages at a time, and these are
anon pages so we don't have to read from disk to populate them.

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 include/linux/mm.h |    1 -
 mm/internal.h      |    4 +-
 mm/memory.c        |   24 ---------------------
 mm/mlock.c         |   57 ++-------------------------------------------------
 mm/mmap.c          |   10 +++-----
 5 files changed, 9 insertions(+), 87 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3b2912f6e91a..d32ace5fba93 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1007,7 +1007,6 @@ static inline int fixup_user_fault(struct task_struct *tsk,
 }
 #endif
 
-extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
 extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 		void *buf, int len, int write);
diff --git a/mm/internal.h b/mm/internal.h
index a4fa284f6bc2..e646c46c0d63 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -158,8 +158,8 @@ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct rb_node *rb_parent);
 
 #ifdef CONFIG_MMU
-extern long mlock_vma_pages_range(struct vm_area_struct *vma,
-			unsigned long start, unsigned long end);
+extern long __mlock_vma_pages_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long end, int *nonblocking);
 extern void munlock_vma_pages_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
 static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index 221fc9ffcab1..e4ab66b94bb8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3629,30 +3629,6 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-int make_pages_present(unsigned long addr, unsigned long end)
-{
-	int ret, len, write;
-	struct vm_area_struct * vma;
-
-	vma = find_vma(current->mm, addr);
-	if (!vma)
-		return -ENOMEM;
-	/*
-	 * We want to touch writable mappings with a write fault in order
-	 * to break COW, except for shared mappings because these don't COW
-	 * and we would not want to dirty them for nothing.
-	 */
-	write = (vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE;
-	BUG_ON(addr >= end);
-	BUG_ON(end > vma->vm_end);
-	len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;
-	ret = get_user_pages(current, current->mm, addr,
-			len, write, 0, NULL, NULL);
-	if (ret < 0)
-		return ret;
-	return ret == len ? 0 : -EFAULT;
-}
-
 #if !defined(__HAVE_ARCH_GATE_AREA)
 
 #if defined(AT_SYSINFO_EHDR)
diff --git a/mm/mlock.c b/mm/mlock.c
index 7f94bc3b46ef..ab0cfe21f538 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -155,9 +155,8 @@ void munlock_vma_page(struct page *page)
  *
  * vma->vm_mm->mmap_sem must be held for at least read.
  */
-static long __mlock_vma_pages_range(struct vm_area_struct *vma,
-				    unsigned long start, unsigned long end,
-				    int *nonblocking)
+long __mlock_vma_pages_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long end, int *nonblocking)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr = start;
@@ -202,56 +201,6 @@ static int __mlock_posix_error_return(long retval)
 	return retval;
 }
 
-/**
- * mlock_vma_pages_range() - mlock pages in specified vma range.
- * @vma - the vma containing the specfied address range
- * @start - starting address in @vma to mlock
- * @end   - end address [+1] in @vma to mlock
- *
- * For mmap()/mremap()/expansion of mlocked vma.
- *
- * return 0 on success for "normal" vmas.
- *
- * return number of pages [> 0] to be removed from locked_vm on success
- * of "special" vmas.
- */
-long mlock_vma_pages_range(struct vm_area_struct *vma,
-			unsigned long start, unsigned long end)
-{
-	int nr_pages = (end - start) / PAGE_SIZE;
-	BUG_ON(!(vma->vm_flags & VM_LOCKED));
-
-	/*
-	 * filter unlockable vmas
-	 */
-	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
-		goto no_mlock;
-
-	if (!((vma->vm_flags & VM_DONTEXPAND) ||
-			is_vm_hugetlb_page(vma) ||
-			vma == get_gate_vma(current->mm))) {
-
-		__mlock_vma_pages_range(vma, start, end, NULL);
-
-		/* Hide errors from mmap() and other callers */
-		return 0;
-	}
-
-	/*
-	 * User mapped kernel pages or huge pages:
-	 * make these pages present to populate the ptes, but
-	 * fall thru' to reset VM_LOCKED--no need to unlock, and
-	 * return nr_pages so these don't get counted against task's
-	 * locked limit.  huge pages are already counted against
-	 * locked vm limit.
-	 */
-	make_pages_present(start, end);
-
-no_mlock:
-	vma->vm_flags &= ~VM_LOCKED;	/* and don't come back! */
-	return nr_pages;		/* error or pages NOT mlocked */
-}
-
 /*
  * munlock_vma_pages_range() - munlock all pages in the vma range.'
  * @vma - vma containing range to be munlock()ed.
@@ -303,7 +252,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
  *
  * Filters out "special" vmas -- VM_LOCKED never gets set for these, and
  * munlock is a no-op.  However, for some special vmas, we go ahead and
- * populate the ptes via make_pages_present().
+ * populate the ptes.
  *
  * For vmas that pass the filters, merge/split as appropriate.
  */
diff --git a/mm/mmap.c b/mm/mmap.c
index b0a341e5685f..290d023632e6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1881,9 +1881,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
 		return vma;
 	if (!prev || expand_stack(prev, addr))
 		return NULL;
-	if (prev->vm_flags & VM_LOCKED) {
-		mlock_vma_pages_range(prev, addr, prev->vm_end);
-	}
+	if (prev->vm_flags & VM_LOCKED)
+		__mlock_vma_pages_range(prev, addr, prev->vm_end, NULL);
 	return prev;
 }
 #else
@@ -1909,9 +1908,8 @@ find_extend_vma(struct mm_struct * mm, unsigned long addr)
 	start = vma->vm_start;
 	if (expand_stack(vma, addr))
 		return NULL;
-	if (vma->vm_flags & VM_LOCKED) {
-		mlock_vma_pages_range(vma, addr, start);
-	}
+	if (vma->vm_flags & VM_LOCKED)
+		__mlock_vma_pages_range(vma, addr, start, NULL);
 	return vma;
 }
 #endif
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (7 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma() Michel Lespinasse
@ 2012-12-21  0:49 ` Michel Lespinasse
  2013-01-03  6:20   ` Rik van Riel
  2012-12-21 10:46 ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21  0:49 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel

The mm_populate() code populates user mappings without constantly
holding the mmap_sem. This makes it susceptible to racy userspace
programs: the user mappings may change while mm_populate() is running,
and in this case mm_populate() may end up populating the new mapping
instead of the old one.

In order to reduce the possibility of userspace getting surprised by
this behavior, this change introduces the VM_POPULATE vma flag, which
gets set on vmas we want mm_populate() to work on. This way
mm_populate() may still end up populating the new mapping after such a
race, but only if the new mapping is also one that the user has
requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 include/linux/mm.h   |    1 +
 include/linux/mman.h |    4 +++-
 mm/fremap.c          |   12 ++++++++++--
 mm/mlock.c           |   19 ++++++++++---------
 mm/mmap.c            |    4 +---
 5 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d32ace5fba93..77311274f0b5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -87,6 +87,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 
+#define VM_POPULATE     0x00001000
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index d09dde1e57fb..e4ad758962e5 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -77,6 +77,8 @@ calc_vm_flag_bits(unsigned long flags)
 {
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
-	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
+	       ((flags & MAP_LOCKED) ? (VM_LOCKED | VM_POPULATE) : 0) |
+	       (((flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE) ?
+							VM_POPULATE : 0);
 }
 #endif /* _LINUX_MMAN_H */
diff --git a/mm/fremap.c b/mm/fremap.c
index 503a72387087..0cd4c11488ed 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -204,8 +204,10 @@ get_write_lock:
 			unsigned long addr;
 			struct file *file = get_file(vma->vm_file);
 
-			addr = mmap_region(file, start, size,
-					vma->vm_flags, pgoff);
+			vm_flags = vma->vm_flags;
+			if (!(flags & MAP_NONBLOCK))
+				vm_flags |= VM_POPULATE;
+			addr = mmap_region(file, start, size, vm_flags, pgoff);
 			fput(file);
 			if (IS_ERR_VALUE(addr)) {
 				err = addr;
@@ -224,6 +226,12 @@ get_write_lock:
 		mutex_unlock(&mapping->i_mmap_mutex);
 	}
 
+	if (!(flags & MAP_NONBLOCK) && !(vma->vm_flags & VM_POPULATE)) {
+		if (!has_write_lock)
+			goto get_write_lock;
+		vma->vm_flags |= VM_POPULATE;
+	}
+
 	if (vma->vm_flags & VM_LOCKED) {
 		/*
 		 * drop PG_Mlocked flag for over-mapped range
diff --git a/mm/mlock.c b/mm/mlock.c
index ab0cfe21f538..b1647fbd6bce 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -340,9 +340,9 @@ static int do_mlock(unsigned long start, size_t len, int on)
 
 		/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
 
-		newflags = vma->vm_flags | VM_LOCKED;
-		if (!on)
-			newflags &= ~VM_LOCKED;
+		newflags = vma->vm_flags & ~VM_LOCKED;
+		if (on)
+			newflags |= VM_LOCKED | VM_POPULATE;
 
 		tmp = vma->vm_end;
 		if (tmp > end)
@@ -402,7 +402,8 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		 * range with the first VMA. Also, skip undesirable VMA types.
 		 */
 		nend = min(end, vma->vm_end);
-		if (vma->vm_flags & (VM_IO | VM_PFNMAP))
+		if ((vma->vm_flags & (VM_IO | VM_PFNMAP | VM_POPULATE)) !=
+		    VM_POPULATE)
 			continue;
 		if (nstart < vma->vm_start)
 			nstart = vma->vm_start;
@@ -475,9 +476,9 @@ static int do_mlockall(int flags)
 	struct vm_area_struct * vma, * prev = NULL;
 	unsigned int def_flags;
 
-	def_flags = current->mm->def_flags & ~VM_LOCKED;
+	def_flags = current->mm->def_flags & ~(VM_LOCKED | VM_POPULATE);
 	if (flags & MCL_FUTURE)
-		def_flags |= VM_LOCKED;
+		def_flags |= (VM_LOCKED | VM_POPULATE);
 	current->mm->def_flags = def_flags;
 	if (flags == MCL_FUTURE)
 		goto out;
@@ -485,9 +486,9 @@ static int do_mlockall(int flags)
 	for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
 		vm_flags_t newflags;
 
-		newflags = vma->vm_flags | VM_LOCKED;
-		if (!(flags & MCL_CURRENT))
-			newflags &= ~VM_LOCKED;
+		newflags = vma->vm_flags & ~VM_LOCKED;
+		if (flags & MCL_CURRENT)
+			newflags |= VM_LOCKED | VM_POPULATE;
 
 		/* Ignore errors */
 		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
diff --git a/mm/mmap.c b/mm/mmap.c
index 290d023632e6..27f98850fa8c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1153,9 +1153,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	}
 
 	addr = mmap_region(file, addr, len, vm_flags, pgoff);
-	if (!IS_ERR_VALUE(addr) &&
-	    ((vm_flags & VM_LOCKED) ||
-	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
+	if (!IS_ERR_VALUE(addr) && (vm_flags & VM_POPULATE))
 		*populate = true;
 	return addr;
 }
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (8 preceding siblings ...)
  2012-12-21  0:49 ` [PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs Michel Lespinasse
@ 2012-12-21 10:46 ` Michel Lespinasse
  2012-12-22 15:02   ` Greg Ungerer
  2013-01-23 13:37   ` Greg Ungerer
  2012-12-22  0:36 ` Andy Lutomirski
  2013-01-04 18:16 ` Andy Lutomirski
  11 siblings, 2 replies; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-21 10:46 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel, Andrew Morton, linux-mm, linux-kernel

Hi Greg,

I wanted to ask if you could check the sanity of the following patches
in nommu configurations. My understanding is that these always
populate mappings when they are created, so that MAP_POPULATE and
MAP_LOCKED are actually no-ops. Is this an accurate description?

Thanks,

On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
> We have many vma manipulation functions that are fast in the typical case,
> but can optionally be instructed to populate an unbounded number of ptes
> within the region they work on:
> - mmap with MAP_POPULATE or MAP_LOCKED flags;
> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>   VM_LOCKED vma;
> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
> - brk() when mlock(MCL_FUTURE) is in effect.
>
> Current code handles these pte operations locally, while the surrounding
> code has to hold the mmap_sem write side since it's manipulating vmas.
> This means we're doing an unbounded amount of pte population work with
> mmap_sem held, and this causes problems as Andy Lutomirski reported
> (we've hit this at Google as well, though it's not entirely clear why
> people keep trying to use mlock(MCL_FUTURE) in the first place).
>
> I propose introducing a new mm_populate() function to do this pte
> population work after the mmap_sem has been released. mm_populate()
> does need to acquire the mmap_sem read side, but critically, it
> doesn't need to hold it continuously for the entire duration of the
> operation - it can drop it whenever things take too long (such as when
> hitting disk for a file read) and re-acquire it later on.
>
> The following patches are against v3.7:
>
> - Patches 1-2 fix some issues I noticed while working on the existing code.
>   If needed, they could potentially go in before the rest of the patches.
>
> - Patch 3 introduces the new mm_populate() function and changes
>   mmap_region() call sites to use it after they drop mmap_sem. This is
>   inspired from Andy Lutomirski's proposal and is built as an extension
>   of the work I had previously done for mlock() and mlockall() around
>   v2.6.38-rc1. I had tried doing something similar at the time but had
>   given up as there were so many do_mmap() call sites; the recent cleanups
>   by Linus and Viro are a tremendous help here.
>
> - Patches 4-6 convert some of the less-obvious places doing unbounded
>   pte populates to the new mm_populate() mechanism.
>
> - Patches 7-8 are code cleanups that are made possible by the
>   mm_populate() work. In particular, they remove more code than the
>   entire patch series added, which should be a good thing :)
>
> - Patch 9 is optional to this entire series. It only helps to deal more
>   nicely with racy userspace programs that might modify their mappings
>   while we're trying to populate them. It adds a new VM_POPULATE flag
>   on the mappings we do want to populate, so that if userspace replaces
>   them with mappings it doesn't want populated, mm_populate() won't
>   populate those replacement mappings.
>
> Michel Lespinasse (9):
>   mm: make mlockall preserve flags other than VM_LOCKED in def_flags
>   mm: remap_file_pages() fixes
>   mm: introduce mm_populate() for populating new vmas
>   mm: use mm_populate() for blocking remap_file_pages()
>   mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
>   mm: use mm_populate() for mremap() of VM_LOCKED vmas
>   mm: remove flags argument to mmap_region
>   mm: directly use __mlock_vma_pages_range() in find_extend_vma()
>   mm: introduce VM_POPULATE flag to better deal with racy userspace programs
>
>  arch/tile/mm/elf.c   |    1 -
>  fs/aio.c             |    6 +++-
>  include/linux/mm.h   |   23 +++++++++---
>  include/linux/mman.h |    4 ++-
>  ipc/shm.c            |   12 ++++---
>  mm/fremap.c          |   51 ++++++++++++++-------------
>  mm/internal.h        |    4 +-
>  mm/memory.c          |   24 -------------
>  mm/mlock.c           |   94 +++++++++++++------------------------------------
>  mm/mmap.c            |   77 ++++++++++++++++++++++++----------------
>  mm/mremap.c          |   25 +++++++------
>  mm/nommu.c           |    5 ++-
>  mm/util.c            |    6 +++-
>  13 files changed, 154 insertions(+), 178 deletions(-)
>
> --
> 1.7.7.3



-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (9 preceding siblings ...)
  2012-12-21 10:46 ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
@ 2012-12-22  0:36 ` Andy Lutomirski
  2012-12-22  0:59   ` Michel Lespinasse
  2013-01-04 18:16 ` Andy Lutomirski
  11 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2012-12-22  0:36 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
> We have many vma manipulation functions that are fast in the typical case,
> but can optionally be instructed to populate an unbounded number of ptes
> within the region they work on:
> - mmap with MAP_POPULATE or MAP_LOCKED flags;
> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>   VM_LOCKED vma;
> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
> - brk() when mlock(MCL_FUTURE) is in effect.
>

Something's buggy here.  My evil test case is stuck with lots of
threads spinning at 100% system time.  Stack traces look like:

[<0000000000000000>] __mlock_vma_pages_range+0x66/0x70
[<0000000000000000>] __mm_populate+0xf9/0x150
[<0000000000000000>] vm_mmap_pgoff+0x9f/0xc0
[<0000000000000000>] sys_mmap_pgoff+0x7e/0x150
[<0000000000000000>] sys_mmap+0x22/0x30
[<0000000000000000>] system_call_fastpath+0x16/0x1b
[<0000000000000000>] 0xffffffffffffffff

perf top says:

 38.45%  [kernel]            [k] __mlock_vma_pages_range
 33.04%  [kernel]            [k] __get_user_pages
 28.18%  [kernel]            [k] __mm_populate

The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
tasks are immune to SIGKILL.

--Andy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-22  0:36 ` Andy Lutomirski
@ 2012-12-22  0:59   ` Michel Lespinasse
  2012-12-22  1:09     ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-22  0:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
>> We have many vma manipulation functions that are fast in the typical case,
>> but can optionally be instructed to populate an unbounded number of ptes
>> within the region they work on:
>> - mmap with MAP_POPULATE or MAP_LOCKED flags;
>> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>>   VM_LOCKED vma;
>> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
>> - brk() when mlock(MCL_FUTURE) is in effect.
>>
>
> Something's buggy here.  My evil test case is stuck with lots of
> threads spinning at 100% system time.  Stack traces look like:
>
> [<0000000000000000>] __mlock_vma_pages_range+0x66/0x70
> [<0000000000000000>] __mm_populate+0xf9/0x150
> [<0000000000000000>] vm_mmap_pgoff+0x9f/0xc0
> [<0000000000000000>] sys_mmap_pgoff+0x7e/0x150
> [<0000000000000000>] sys_mmap+0x22/0x30
> [<0000000000000000>] system_call_fastpath+0x16/0x1b
> [<0000000000000000>] 0xffffffffffffffff
>
> perf top says:
>
>  38.45%  [kernel]            [k] __mlock_vma_pages_range
>  33.04%  [kernel]            [k] __get_user_pages
>  28.18%  [kernel]            [k] __mm_populate
>
> The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
> tasks are immune to SIGKILL.

Looking into it.

There seems to be a problem with mlockall - the following program
fails in an unkillable way even before my changes:

#include <sys/mman.h>
#include <stdio.h>
#include <stdint.h>

int main(void) {
  void *p = mmap(NULL, 0x100000000000,
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
                 -1, 0);
  printf("p: %p\n", p);
  mlockall(MCL_CURRENT);
  return 0;
}

I think my changes propagate this existing problem so it now shows up
in more places :/

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-22  0:59   ` Michel Lespinasse
@ 2012-12-22  1:09     ` Andy Lutomirski
  2012-12-22  1:59       ` Michel Lespinasse
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2012-12-22  1:09 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Fri, Dec 21, 2012 at 4:59 PM, Michel Lespinasse <walken@google.com> wrote:
> On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
>>> We have many vma manipulation functions that are fast in the typical case,
>>> but can optionally be instructed to populate an unbounded number of ptes
>>> within the region they work on:
>>> - mmap with MAP_POPULATE or MAP_LOCKED flags;
>>> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>>>   VM_LOCKED vma;
>>> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
>>> - brk() when mlock(MCL_FUTURE) is in effect.
>>>
>>
>> Something's buggy here.  My evil test case is stuck with lots of
>> threads spinning at 100% system time.  Stack traces look like:
>>
>> [<0000000000000000>] __mlock_vma_pages_range+0x66/0x70
>> [<0000000000000000>] __mm_populate+0xf9/0x150
>> [<0000000000000000>] vm_mmap_pgoff+0x9f/0xc0
>> [<0000000000000000>] sys_mmap_pgoff+0x7e/0x150
>> [<0000000000000000>] sys_mmap+0x22/0x30
>> [<0000000000000000>] system_call_fastpath+0x16/0x1b
>> [<0000000000000000>] 0xffffffffffffffff
>>
>> perf top says:
>>
>>  38.45%  [kernel]            [k] __mlock_vma_pages_range
>>  33.04%  [kernel]            [k] __get_user_pages
>>  28.18%  [kernel]            [k] __mm_populate
>>
>> The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
>> tasks are immune to SIGKILL.
>
> Looking into it.
>
> There seems to be a problem with mlockall - the following program
> fails in an unkillable way even before my changes:
>
> #include <sys/mman.h>
> #include <stdio.h>
> #include <stdint.h>
>
> int main(void) {
>   void *p = mmap(NULL, 0x100000000000,
>                  PROT_READ | PROT_WRITE,
>                  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
>                  -1, 0);
>   printf("p: %p\n", p);
>   mlockall(MCL_CURRENT);
>   return 0;
> }
>
> I think my changes propagate this existing problem so it now shows up
> in more places :/

Hmm.  I'm using MCL_FUTURE with MAP_NORESERVE, but those mappings are
not insanely large.  Should MAP_NORESERVE negate MCL_FUTURE?
I'm doing MAP_NORESERVE, PROT_NONE to prevent pages from being
allocated in the future -- I have no intention of ever using them.

The other odd thing I do is use MAP_FIXED to replace MAP_NORESERVE pages.

--Andy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-22  1:09     ` Andy Lutomirski
@ 2012-12-22  1:59       ` Michel Lespinasse
  2012-12-22  2:16         ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-22  1:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Fri, Dec 21, 2012 at 5:09 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Dec 21, 2012 at 4:59 PM, Michel Lespinasse <walken@google.com> wrote:
>> On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> Something's buggy here.  My evil test case is stuck with lots of
>>> threads spinning at 100% system time.
>>>
>>> The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
>>> tasks are immune to SIGKILL.
>>
>> Looking into it.
>>
>> There seems to be a problem with mlockall - the following program
>> fails in an unkillable way even before my changes:
>>
>> #include <sys/mman.h>
>> #include <stdio.h>
>> #include <stdint.h>
>>
>> int main(void) {
>>   void *p = mmap(NULL, 0x100000000000,
>>                  PROT_READ | PROT_WRITE,
>>                  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
>>                  -1, 0);
>>   printf("p: %p\n", p);
>>   mlockall(MCL_CURRENT);
>>   return 0;
>> }
>>
>> I think my changes propagate this existing problem so it now shows up
>> in more places :/

So in my test case, the issue was caused by the mapping being 2^32
pages, which overflowed the integer 'nr_pages' argument to
__get_user_pages, which caused an infinite loop as __get_user_pages()
would return 0 so __mm_populate() would make no progress.

When dropping one zero from that humongous size in the test case, the
test case becomes at least killable.

> Hmm.  I'm using MCL_FUTURE with MAP_NORESERVE, but those mappings are
> not insanely large.  Should MAP_NORESERVE negate MCL_FUTURE?
> I'm doing MAP_NORESERVE, PROT_NONE to prevent pages from being
> allocated in the future -- I have no intention of ever using them.

MAP_NORESERVE doesn't prevent page allocation, but PROT_NONE does
(precisely because people use it the same way as you do :)

> The other odd thing I do is use MAP_FIXED to replace MAP_NORESERVE pages.
Yes, I've seen people do that here too.

Could you share your test case so I can try reproducing the issue
you're seeing ?

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-22  1:59       ` Michel Lespinasse
@ 2012-12-22  2:16         ` Andy Lutomirski
  2012-12-22  9:37           ` Michel Lespinasse
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2012-12-22  2:16 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse <walken@google.com> wrote:
> On Fri, Dec 21, 2012 at 5:09 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Dec 21, 2012 at 4:59 PM, Michel Lespinasse <walken@google.com> wrote:
>>> On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> Something's buggy here.  My evil test case is stuck with lots of
>>>> threads spinning at 100% system time.
>>>>
>>>> The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
>>>> tasks are immune to SIGKILL.
>>>
>>> Looking into it.
>>>
>>> There seems to be a problem with mlockall - the following program
>>> fails in an unkillable way even before my changes:
>>>
>>> #include <sys/mman.h>
>>> #include <stdio.h>
>>> #include <stdint.h>
>>>
>>> int main(void) {
>>>   void *p = mmap(NULL, 0x100000000000,
>>>                  PROT_READ | PROT_WRITE,
>>>                  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
>>>                  -1, 0);
>>>   printf("p: %p\n", p);
>>>   mlockall(MCL_CURRENT);
>>>   return 0;
>>> }
>>>
>>> I think my changes propagate this existing problem so it now shows up
>>> in more places :/
>
> So in my test case, the issue was caused by the mapping being 2^32
> pages, which overflowed the integer 'nr_pages' argument to
> __get_user_pages, which caused an infinite loop as __get_user_pages()
> would return 0 so __mm_populate() would make no progress.
>
> When dropping one zero from that humongous size in the test case, the
> test case becomes at least killable.
>
>> Hmm.  I'm using MCL_FUTURE with MAP_NORESERVE, but those mappings are
>> not insanely large.  Should MAP_NORESERVE negate MCL_FUTURE?
>> I'm doing MAP_NORESERVE, PROT_NONE to prevent pages from being
>> allocated in the future -- I have no intention of ever using them.
>
> MAP_NORESERVE doesn't prevent page allocation, but PROT_NONE does
> (precisely because people use it the same way as you do :)
>
>> The other odd thing I do is use MAP_FIXED to replace MAP_NORESERVE pages.
> Yes, I've seen people do that here too.
>
> Could you share your test case so I can try reproducing the issue
> you're seeing ?

Not so easy.  My test case is a large chunk of a high-frequency
trading system :)

I just tried it again.  Now I have a task stuck in
mlockall(MCL_CURRENT|MCL_FUTURE).  The stack is:

[<0000000000000000>] flush_work+0x1c2/0x280
[<0000000000000000>] schedule_on_each_cpu+0xe3/0x130
[<0000000000000000>] lru_add_drain_all+0x15/0x20
[<0000000000000000>] sys_mlockall+0x125/0x1a0
[<0000000000000000>] tracesys+0xd0/0xd5
[<0000000000000000>] 0xffffffffffffffff

The sequence of mmap and munmap calls, according to strace, is:

6084  mmap(NULL, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e4000
6084  mmap(NULL, 8388744, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55096c3000
6084  mmap(0x7f5509ec2000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3ff000) = 0x7f5509ec2000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e3000
6084  mmap(NULL, 2413688, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5509475000
6084  mmap(0x7f550969c000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x27000) = 0x7f550969c000
6084  mmap(0x7f550969e000, 148600, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550969e000
6084  mmap(NULL, 12636304, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5508867000
6084  mmap(0x7f5509420000, 327680, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7b9000) = 0x7f5509420000
6084  mmap(0x7f5509470000, 16528, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5509470000
6084  mmap(NULL, 8409224, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5508061000
6084  mmap(0x7f550885b000, 36864, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3fa000) = 0x7f550885b000
6084  mmap(0x7f5508864000, 8328, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5508864000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5508060000
6084  mmap(NULL, 8404144, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550785c000
6084  mmap(0x7f5508054000, 45056, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3f8000) = 0x7f5508054000
6084  mmap(0x7f550805f000, 3248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550805f000
6084  mmap(NULL, 8390584, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550705b000
6084  mmap(0x7f5507859000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3fe000) = 0x7f5507859000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550705a000
6084  mmap(NULL, 8393296, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5506858000
6084  mmap(0x7f5507055000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3fd000) = 0x7f5507055000
6084  mmap(0x7f5507059000, 592, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5507059000
6084  mmap(NULL, 8393224, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5506056000
6084  mmap(0x7f5506855000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3ff000) = 0x7f5506855000
6084  mmap(0x7f5506857000, 520, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5506857000
6084  mmap(NULL, 46777, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f550604a000
6084  mmap(NULL, 2109672, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5505e46000
6084  mmap(0x7f5506048000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f5506048000
6084  mmap(NULL, 3150176, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5505b44000
6084  mmap(0x7f5505e3e000, 28672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xfa000) = 0x7f5505e3e000
6084  mmap(0x7f5505e45000, 352, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5505e45000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e2000
6084  mmap(NULL, 2217600, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5505926000
6084  mmap(0x7f5505b42000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c000) = 0x7f5505b42000
6084  mmap(NULL, 2509488, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55056c1000
6084  mmap(0x7f5505922000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x61000) = 0x7f5505922000
6084  mmap(NULL, 3073904, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55053d2000
6084  mmap(0x7f55056bc000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xea000) = 0x7f55056bc000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e1000
6084  mmap(NULL, 3517816, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5505077000
6084  mmap(0x7f55053c7000, 40960, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x150000) = 0x7f55053c7000
6084  mmap(0x7f55053d1000, 3448, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f55053d1000
6084  mmap(NULL, 3142544, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5504d77000
6084  mmap(0x7f5505058000, 40960, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe1000) = 0x7f5505058000
6084  mmap(0x7f5505062000, 82832, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5505062000
6084  mmap(NULL, 3125544, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5504a7b000
6084  mmap(0x7f5504d75000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xfa000) = 0x7f5504d75000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e0000
6084  mmap(NULL, 2184216, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5504865000
6084  mmap(0x7f5504a79000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f5504a79000
6084  mmap(NULL, 2212904, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5504648000
6084  mmap(0x7f550485f000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f550485f000
6084  mmap(0x7f5504861000, 13352, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5504861000
6084  mmap(NULL, 3925208, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5504289000
6084  mmap(0x7f550463d000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b4000) = 0x7f550463d000
6084  mmap(0x7f5504643000, 17624, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5504643000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0df000
6084  mmap(NULL, 8392040, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5503a88000
6084  mmap(0x7f5504287000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3ff000) = 0x7f5504287000
6084  mmap(NULL, 2196528, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550386f000
6084  mmap(0x7f5503a85000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f5503a85000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0de000
6084  mmap(NULL, 2708032, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55035d9000
6084  mmap(0x7f5503867000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8e000) = 0x7f5503867000
6084  mmap(NULL, 2408864, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550338c000
6084  mmap(0x7f55035d7000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4b000) = 0x7f55035d7000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0dd000
6084  mmap(NULL, 8388656, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5502b8b000
6084  mmap(0x7f550338a000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3ff000) = 0x7f550338a000
6084  mmap(NULL, 2518464, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5502924000
6084  mmap(0x7f5502b82000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5e000) = 0x7f5502b82000
6084  mmap(0x7f5502b8a000, 3520, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5502b8a000
6084  mmap(NULL, 2162064, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5502714000
6084  mmap(0x7f5502922000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0x7f5502922000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5502713000
6084  mmap(NULL, 4454088, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55022d3000
6084  mmap(0x7f5502701000, 69632, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22e000) = 0x7f5502701000
6084  mmap(0x7f5502712000, 1736, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5502712000
6084  mmap(NULL, 2249496, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55020ad000
6084  mmap(0x7f55022be000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0x7f55022be000
6084  mmap(0x7f55022c0000, 74520, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f55022c0000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55020ac000
6084  mmap(NULL, 2232416, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5501e8a000
6084  mmap(0x7f55020aa000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7f55020aa000
6084  mmap(NULL, 2113944, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5501c85000
6084  mmap(0x7f5501e88000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f5501e88000
6084  mmap(NULL, 2128984, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5501a7d000
6084  mmap(0x7f5501c83000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f5501c83000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0dc000
6084  mmap(NULL, 2109704, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5501879000
6084  mmap(0x7f5501a7b000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f5501a7b000
6084  mmap(NULL, 2366184, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5501637000
6084  mmap(0x7f5501870000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x39000) = 0x7f5501870000
6084  mmap(0x7f5501878000, 2792, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5501878000
6084  mmap(NULL, 3578368, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55012cd000
6084  mmap(0x7f5501622000, 69632, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x155000) = 0x7f5501622000
6084  mmap(0x7f5501633000, 14848, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5501633000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0db000
6084  mmap(NULL, 3964496, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5500f05000
6084  mmap(0x7f55012bf000, 57344, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f55012bf000
6084  mmap(NULL, 2187824, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5500cee000
6084  mmap(0x7f5500f03000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f5500f03000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0da000
6084  mmap(NULL, 2469936, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5500a92000
6084  mmap(0x7f5500ce4000, 36864, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x52000) = 0x7f5500ce4000
6084  mmap(0x7f5500ced000, 48, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5500ced000
6084  mmap(NULL, 2134448, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5500888000
6084  mmap(0x7f5500a90000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7f5500a90000
6084  mmap(NULL, 2205832, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550066d000
6084  mmap(0x7f5500886000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f5500886000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d9000
6084  mmap(NULL, 2864104, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55003b1000
6084  mmap(0x7f5500665000, 28672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb4000) = 0x7f5500665000
6084  mmap(0x7f550066c000, 1000, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550066c000
6084  mmap(NULL, 2255936, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550018a000
6084  mmap(0x7f55003ac000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f55003ac000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5500189000
6084  mmap(NULL, 20377632, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fee19000
6084  mmap(0x7f5500187000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x116e000) = 0x7f5500187000
6084  mmap(NULL, 3963320, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fea51000
6084  mmap(0x7f54fedef000, 155648, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19e000) = 0x7f54fedef000
6084  mmap(0x7f54fee15000, 14776, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fee15000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54fea50000
6084  mmap(NULL, 2612480, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fe7d2000
6084  mmap(0x7f54fea4c000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7a000) = 0x7f54fea4c000
6084  mmap(NULL, 2302408, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fe59f000
6084  mmap(0x7f54fe7d0000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x31000) = 0x7f54fe7d0000
6084  mmap(NULL, 2125992, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fe397000
6084  mmap(0x7f54fe59d000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f54fe59d000
6084  mmap(NULL, 2348544, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fe159000
6084  mmap(0x7f54fe394000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x7f54fe394000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d8000
6084  mmap(NULL, 2163496, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fdf48000
6084  mmap(0x7f54fe157000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf000) = 0x7f54fe157000
6084  mmap(NULL, 2168048, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fdd36000
6084  mmap(0x7f54fdf46000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0x7f54fdf46000
6084  mmap(NULL, 2109528, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fdb32000
6084  mmap(0x7f54fdd34000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f54fdd34000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d7000
6084  mmap(NULL, 2940512, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fd864000
6084  mmap(0x7f54fdb27000, 45056, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc3000) = 0x7f54fdb27000
6084  mmap(NULL, 2256944, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fd63c000
6084  mmap(0x7f54fd861000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f54fd861000
6084  mmap(0x7f54fd863000, 48, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fd863000
6084  mmap(NULL, 2109864, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fd438000
6084  mmap(0x7f54fd63a000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f54fd63a000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d6000
6084  mmap(NULL, 2126576, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fd230000
6084  mmap(0x7f54fd436000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f54fd436000
6084  mmap(NULL, 2109536, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fd02c000
6084  mmap(0x7f54fd22e000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f54fd22e000
6084  mmap(NULL, 2210424, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f54fce10000
6084  mmap(0x7f54fd028000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7f54fd028000
6084  mmap(0x7f54fd02a000, 6776, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fd02a000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d5000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d4000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d3000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d2000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0d1000
6084  mmap(NULL, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0cf000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0ce000
6084  mmap(NULL, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0cc000
6084  munmap(0x7f550604a000, 46777)     = 0
6084  mmap(NULL, 139264, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f550a0aa000
6084  mmap(0x7f550a0ab000, 131072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550a0ab000
6084  mmap(NULL, 139264, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f550a088000
6084  mmap(0x7f550a089000, 131072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550a089000
6084  mmap(NULL, 139264, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f550a066000
6084  mmap(0x7f550a067000, 131072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550a067000
6084  mmap(NULL, 548864, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5509fe0000
6084  mmap(0x7f5509fe1000, 540672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5509fe1000
6084  mmap(NULL, 40960, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5509fd6000
6084  mmap(0x7f5509fd7000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5509fd7000
6084  mmap(NULL, 139264, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5509fb4000
6084  mmap(0x7f5509fb5000, 131072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5509fb5000
6084  mmap(NULL, 4198400, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f54fca0f000
6084  mmap(0x7f54fcc00000, 2097152, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|0x40000, -1, 0) = 0x7f54fcc00000
6084  munmap(0x7f54fca0f000, 2031616)   = 0
6084  munmap(0x7f54fce01000, 61440)     = 0
6084  mmap(NULL, 40960, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5509faa000
6084  mmap(0x7f5509fab000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5509fab000
6084  mmap(NULL, 24576, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5509fa4000
6084  mmap(0x7f5509fa5000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5509fa5000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5509fa3000
6084  munmap(0x7f5509fa3000, 4096)      = 0
6084  mmap(NULL, 2919792, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f54fc936000
6084  mmap(NULL, 26258, PROT_READ, MAP_SHARED, 4, 0) = 0x7f5509f9d000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5509f9c000
6084  munmap(0x7f5509f9c000, 4096)      = 0
6084  mlockall(MCL_CURRENT|MCL_FUTURE

This task is unkillable.  Two other tasks are stuck spinning.

Note the line with 0x40000, which is MAP_HUGETLB.  Could that be the problem?

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags
  2012-12-21  0:49 ` [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags Michel Lespinasse
@ 2012-12-22  4:25   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2012-12-22  4:25 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> On most architectures, def_flags is either 0 or VM_LOCKED depending on
> whether mlockall(MCL_FUTURE) was called. However, this is not an absolute
> rule as kvm support on s390 may set the VM_NOHUGEPAGE flag in def_flags.
> We don't want mlockall to clear that.
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-22  2:16         ` Andy Lutomirski
@ 2012-12-22  9:37           ` Michel Lespinasse
  2012-12-22  9:45             ` [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool Michel Lespinasse
  2013-01-03 18:56             ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Andy Lutomirski
  0 siblings, 2 replies; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-22  9:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Fri, Dec 21, 2012 at 6:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse <walken@google.com> wrote:
>> Could you share your test case so I can try reproducing the issue
>> you're seeing ?
>
> Not so easy.  My test case is a large chunk of a high-frequency
> trading system :)

Huh, it's probably better if I don't see it then :)

> I just tried it again.  Now I have a task stuck in
> mlockall(MCL_CURRENT|MCL_FUTURE).  The stack is:
>
> [<0000000000000000>] flush_work+0x1c2/0x280
> [<0000000000000000>] schedule_on_each_cpu+0xe3/0x130
> [<0000000000000000>] lru_add_drain_all+0x15/0x20
> [<0000000000000000>] sys_mlockall+0x125/0x1a0
> [<0000000000000000>] tracesys+0xd0/0xd5
> [<0000000000000000>] 0xffffffffffffffff
>
> The sequence of mmap and munmap calls, according to strace, is:
>
[...]
> 6084  mmap(0x7f54fd02a000, 6776, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fd02a000

So I noticed you use mmap with a size that is not a multiple of
PAGE_SIZE. This is perfectly legal, but I hadn't tested that case, and
lo and behold, it's something I got wrong. Patch to be sent as a reply
to this. Without this patch, vm_populate() will show a debug message
if you have CONFIG_DEBUG_VM set, and likely spin in an infinite loop
if you don't.

> 6084  mmap(NULL, 26258, PROT_READ, MAP_SHARED, 4, 0) = 0x7f5509f9d000
> 6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5509f9c000
> 6084  munmap(0x7f5509f9c000, 4096)      = 0
> 6084  mlockall(MCL_CURRENT|MCL_FUTURE
>
> This task is unkillable.  Two other tasks are stuck spinning.

Now I'm confused, because:

1- your trace shows the hang occurs during mlockall(), and this code
really wasn't touched much in my series (besides renaming
do_mlock_pages into __mm_populate())

2- the backtrace above showed sys_mlockall() -> lru_add_drain_all(),
which is the very beginning of mlockall(), before anything of
importance happens (and in particular, before the MCL_FUTURE flag
takes action). So, I'm going to assume that it's one of the other
spinning threads that is breaking things. If one of the spinning
threads got stuck within vm_populate(), this could even be explained
by the bug I mentioned above.

Could you check if the fix I'm going to send as a reply to this works
for you, and if not, where the two spinning threads are being stuck ?

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool
  2012-12-22  9:37           ` Michel Lespinasse
@ 2012-12-22  9:45             ` Michel Lespinasse
  2013-01-03  6:21               ` Rik van Riel
  2013-01-03 18:54               ` Andy Lutomirski
  2013-01-03 18:56             ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Andy Lutomirski
  1 sibling, 2 replies; 40+ messages in thread
From: Michel Lespinasse @ 2012-12-22  9:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE multiple;
however, there was no equivalent rounding in mm_populate(), which caused issues.

This could be fixed by introducing the same rounding in mm_populate();
however, I think it's preferable to make do_mmap_pgoff() return populate
as a size rather than as a boolean, so we don't have to duplicate the
size rounding logic in mm_populate().

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 fs/aio.c           |    5 ++---
 include/linux/mm.h |    2 +-
 ipc/shm.c          |    4 ++--
 mm/mmap.c          |    6 +++---
 mm/nommu.c         |    4 ++--
 mm/util.c          |    6 +++---
 6 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 82eec7c7b4bb..064bfbe37566 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -101,9 +101,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	struct aio_ring *ring;
 	struct aio_ring_info *info = &ctx->ring_info;
 	unsigned nr_events = ctx->max_reqs;
-	unsigned long size;
+	unsigned long size, populate;
 	int nr_pages;
-	bool populate;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
@@ -150,7 +149,7 @@ static int aio_setup_ring(struct kioctx *ctx)
 		return -EAGAIN;
 	}
 	if (populate)
-		mm_populate(info->mmap_base, info->mmap_size);
+		mm_populate(info->mmap_base, populate);
 
 	ctx->user_id = info->mmap_base;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77311274f0b5..5fe5b86908ab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1445,7 +1445,7 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
 extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, bool *populate);
+	unsigned long pgoff, unsigned long *populate);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 #ifdef CONFIG_MMU
diff --git a/ipc/shm.c b/ipc/shm.c
index ee2dde1f94d1..2171b3d43943 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -970,7 +970,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 	struct shm_file_data *sfd;
 	struct path path;
 	fmode_t f_mode;
-	bool populate = false;
+	unsigned long populate = 0;
 
 	err = -EINVAL;
 	if (shmid < 0)
@@ -1077,7 +1077,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 invalid:
 	up_write(&current->mm->mmap_sem);
 	if (populate)
-		mm_populate(addr, size);
+		mm_populate(addr, populate);
 
 out_fput:
 	fput(file);
diff --git a/mm/mmap.c b/mm/mmap.c
index 27f98850fa8c..0baf8a42b822 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1010,13 +1010,13 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
 unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
 			unsigned long flags, unsigned long pgoff,
-			bool *populate)
+			unsigned long *populate)
 {
 	struct mm_struct * mm = current->mm;
 	struct inode *inode;
 	vm_flags_t vm_flags;
 
-	*populate = false;
+	*populate = 0;
 
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
@@ -1154,7 +1154,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 
 	addr = mmap_region(file, addr, len, vm_flags, pgoff);
 	if (!IS_ERR_VALUE(addr) && (vm_flags & VM_POPULATE))
-		*populate = true;
+		*populate = len;
 	return addr;
 }
 
diff --git a/mm/nommu.c b/mm/nommu.c
index d7690b97b81d..0622fb9d083a 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1235,7 +1235,7 @@ unsigned long do_mmap_pgoff(struct file *file,
 			    unsigned long prot,
 			    unsigned long flags,
 			    unsigned long pgoff,
-			    bool *populate)
+			    unsigned long *populate)
 {
 	struct vm_area_struct *vma;
 	struct vm_region *region;
@@ -1245,7 +1245,7 @@ unsigned long do_mmap_pgoff(struct file *file,
 
 	kenter(",%lx,%lx,%lx,%lx,%lx", addr, len, prot, flags, pgoff);
 
-	*populate = false;
+	*populate = 0;
 
 	/* decide whether we should attempt the mapping, and if so what sort of
 	 * mapping */
diff --git a/mm/util.c b/mm/util.c
index 44f006ac2ccd..463ef33b32fa 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -355,7 +355,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 {
 	unsigned long ret;
 	struct mm_struct *mm = current->mm;
-	bool populate;
+	unsigned long populate;
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
@@ -363,8 +363,8 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
 				    &populate);
 		up_write(&mm->mmap_sem);
-		if (!IS_ERR_VALUE(ret) && populate)
-			mm_populate(ret, len);
+		if (populate)
+			mm_populate(ret, populate);
 	}
 	return ret;
 }
-- 
1.7.7.3


* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-21 10:46 ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
@ 2012-12-22 15:02   ` Greg Ungerer
  2013-01-23 13:37   ` Greg Ungerer
  1 sibling, 0 replies; 40+ messages in thread
From: Greg Ungerer @ 2012-12-22 15:02 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Greg Ungerer, Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins,
	Jorn_Engel, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	David Howells

Hi Michel,

On 12/21/2012 08:46 PM, Michel Lespinasse wrote:
> I wanted to ask if you could check the sanity of the following patches
> in nommu configurations. My understanding is that these always

Sure. I think it is worth CC'ing David Howells on these as well,
he has spent a fair bit of time in the mmap code for nommu.

Regards
Greg


> populate mappings when they are created, so that MAP_POPULATE and
> MAP_LOCKED are actually no-ops. Is this an accurate description ?
>
> Thanks,
>
> On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
>> We have many vma manipulation functions that are fast in the typical case,
>> but can optionally be instructed to populate an unbounded number of ptes
>> within the region they work on:
>> - mmap with MAP_POPULATE or MAP_LOCKED flags;
>> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>>    VM_LOCKED vma;
>> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
>> - brk() when mlock(MCL_FUTURE) is in effect.
>>
>> Current code handles these pte operations locally, while the surrounding
>> code has to hold the mmap_sem write side since it's manipulating vmas.
>> This means we're doing an unbounded amount of pte population work with
>> mmap_sem held, and this causes problems as Andy Lutomirski reported
>> (we've hit this at Google as well, though it's not entirely clear why
>> people keep trying to use mlock(MCL_FUTURE) in the first place).
>>
>> I propose introducing a new mm_populate() function to do this pte
>> population work after the mmap_sem has been released. mm_populate()
>> does need to acquire the mmap_sem read side, but critically, it
>> doesn't need to hold it continuously for the entire duration of the
>> operation - it can drop it whenever things take too long (such as when
>> hitting disk for a file read) and re-acquire it later on.
>>
>> The following patches are against v3.7:
>>
>> - Patches 1-2 fix some issues I noticed while working on the existing code.
>>    If needed, they could potentially go in before the rest of the patches.
>>
>> - Patch 3 introduces the new mm_populate() function and changes
>>    mmap_region() call sites to use it after they drop mmap_sem. This is
>>    inspired from Andy Lutomirski's proposal and is built as an extension
>>    of the work I had previously done for mlock() and mlockall() around
>>    v2.6.38-rc1. I had tried doing something similar at the time but had
>>    given up as there were so many do_mmap() call sites; the recent cleanups
>>    by Linus and Viro are a tremendous help here.
>>
>> - Patches 4-6 convert some of the less-obvious places doing unbounded
>>    pte populates to the new mm_populate() mechanism.
>>
>> - Patches 7-8 are code cleanups that are made possible by the
>>    mm_populate() work. In particular, they remove more code than the
>>    entire patch series added, which should be a good thing :)
>>
>> - Patch 9 is optional to this entire series. It only helps to deal more
>>    nicely with racy userspace programs that might modify their mappings
>>    while we're trying to populate them. It adds a new VM_POPULATE flag
>>    on the mappings we do want to populate, so that if userspace replaces
>>    them with mappings it doesn't want populated, mm_populate() won't
>>    populate those replacement mappings.
>>
>> Michel Lespinasse (9):
>>    mm: make mlockall preserve flags other than VM_LOCKED in def_flags
>>    mm: remap_file_pages() fixes
>>    mm: introduce mm_populate() for populating new vmas
>>    mm: use mm_populate() for blocking remap_file_pages()
>>    mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
>>    mm: use mm_populate() for mremap() of VM_LOCKED vmas
>>    mm: remove flags argument to mmap_region
>>    mm: directly use __mlock_vma_pages_range() in find_extend_vma()
>>    mm: introduce VM_POPULATE flag to better deal with racy userspace programs
>>
>>   arch/tile/mm/elf.c   |    1 -
>>   fs/aio.c             |    6 +++-
>>   include/linux/mm.h   |   23 +++++++++---
>>   include/linux/mman.h |    4 ++-
>>   ipc/shm.c            |   12 ++++---
>>   mm/fremap.c          |   51 ++++++++++++++-------------
>>   mm/internal.h        |    4 +-
>>   mm/memory.c          |   24 -------------
>>   mm/mlock.c           |   94 +++++++++++++------------------------------------
>>   mm/mmap.c            |   77 ++++++++++++++++++++++++----------------
>>   mm/mremap.c          |   25 +++++++------
>>   mm/nommu.c           |    5 ++-
>>   mm/util.c            |    6 +++-
>>   13 files changed, 154 insertions(+), 178 deletions(-)
>>
>> --
>> 1.7.7.3
>
>
>


* Re: [PATCH 2/9] mm: remap_file_pages() fixes
  2012-12-21  0:49 ` [PATCH 2/9] mm: remap_file_pages() fixes Michel Lespinasse
@ 2013-01-03  0:31   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  0:31 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> Assorted small fixes. The first two are quite small:
>
> - Move check for vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR)
>    within existing if (!(vma->vm_flags & VM_NONLINEAR)) block.
>    Purely cosmetic.
>
> - In the VM_LOCKED case, when dropping PG_Mlocked for the over-mapped
>    range, make sure we own the mmap_sem write lock around the
>    munlock_vma_pages_range call as this manipulates the vma's vm_flags.
>
> Last fix requires a longer explanation. remap_file_pages() can do its work
> either through VM_NONLINEAR manipulation or by creating extra vmas.
> These two cases were inconsistent with each other (and ultimately, both wrong)
> as to exactly when they fault in the newly mapped file pages:
>
> - In the VM_NONLINEAR case, new file pages would be populated if
>    the MAP_NONBLOCK flag wasn't passed. If MAP_NONBLOCK was passed,
>    new file pages wouldn't be populated even if the vma is already
>    marked as VM_LOCKED.
>
> - In the linear (emulated) case, the work is passed to the mmap_region()
>    function which would populate the pages if the vma is marked as
>    VM_LOCKED, and would not otherwise - regardless of the value of the
>    MAP_NONBLOCK flag, because MAP_POPULATE wasn't being passed to
>    mmap_region().
>
> The desired behavior is that we want the pages to be populated and locked
> if the vma is marked as VM_LOCKED, or to be populated if the MAP_NONBLOCK
> flag is not passed to remap_file_pages().
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 3/9] mm: introduce mm_populate() for populating new vmas
  2012-12-21  0:49 ` [PATCH 3/9] mm: introduce mm_populate() for populating new vmas Michel Lespinasse
@ 2013-01-03  2:14   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  2:14 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags
> (or with MCL_FUTURE in effect), we want to populate the pages within the
> newly created vmas. This may take a while as we may have to read pages
> from disk, so ideally we want to do this outside of the write-locked
> mmap_sem region.
>
> This change introduces mm_populate(), which is used to defer populating
> such mappings until after the mmap_sem write lock has been released.
> This is implemented as a generalization of the former do_mlock_pages(),
> which accomplished the same task but was used during mlock() / mlockall().
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2012-12-21  0:49 ` [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages() Michel Lespinasse
@ 2013-01-03  2:25   ` Rik van Riel
  2013-03-10 18:55   ` Tommi Rantala
  1 sibling, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  2:25 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> Signed-off-by: Michel Lespinasse <walken@google.com>

Changelog could use some help :)

Other than that:

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
  2012-12-21  0:49 ` [PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect Michel Lespinasse
@ 2013-01-03  2:56   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  2:56 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas
  2012-12-21  0:49 ` [PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas Michel Lespinasse
@ 2013-01-03  5:47   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  5:47 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> Signed-off-by: Michel Lespinasse <walken@google.com>

Changelog?

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 7/9] mm: remove flags argument to mmap_region
  2012-12-21  0:49 ` [PATCH 7/9] mm: remove flags argument to mmap_region Michel Lespinasse
@ 2013-01-03  5:49   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  5:49 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> After the MAP_POPULATE handling has been moved to mmap_region() call sites,
> the only remaining use of the flags argument is to pass the MAP_NORESERVE
> flag. This can be just as easily handled by do_mmap_pgoff(), so do that
> and remove the mmap_region() flags parameter.
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma()
  2012-12-21  0:49 ` [PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma() Michel Lespinasse
@ 2013-01-03  5:50   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  5:50 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> In find_extend_vma(), we don't need mlock_vma_pages_range() to verify the
> vma type - we know we're working with a stack. So, we can call directly
> into __mlock_vma_pages_range(), and remove the last make_pages_present()
> call site.
>
> Note that we don't use mm_populate() here, so we can't release the mmap_sem
> while allocating new stack pages. This is deemed acceptable, because the
> stack vmas grow by a bounded number of pages at a time, and these are
> anon pages so we don't have to read from disk to populate them.
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed


* Re: [PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs
  2012-12-21  0:49 ` [PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs Michel Lespinasse
@ 2013-01-03  6:20   ` Rik van Riel
  0 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  6:20 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:
> The vm_populate() code populates user mappings without constantly
> holding the mmap_sem. This makes it susceptible to racy userspace
> programs: the user mappings may change while vm_populate() is running,
> and in this case vm_populate() may end up populating the new mapping
> instead of the old one.
>
> In order to reduce the possibility of userspace getting surprised by
> this behavior, this change introduces the VM_POPULATE vma flag which
> gets set on vmas we want vm_populate() to work on. This way
> vm_populate() may still end up populating the new mapping after such a
> race, but only if the new mapping is also one that the user has
> requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool
  2012-12-22  9:45             ` [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool Michel Lespinasse
@ 2013-01-03  6:21               ` Rik van Riel
  2013-01-03 18:54               ` Andy Lutomirski
  1 sibling, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2013-01-03  6:21 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Andrew Morton, linux-mm, linux-kernel

On 12/22/2012 04:45 AM, Michel Lespinasse wrote:
> do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE multiple,
> however there was no equivalent code in mm_populate(), which caused issues.
>
> This could be fixed by introducing the same rounding in mm_populate();
> however I think it's preferable to make do_mmap_pgoff() return populate
> as a size rather than as a boolean, so we don't have to duplicate the
> size rounding logic in mm_populate().
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool
  2012-12-22  9:45             ` [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool Michel Lespinasse
  2013-01-03  6:21               ` Rik van Riel
@ 2013-01-03 18:54               ` Andy Lutomirski
  1 sibling, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2013-01-03 18:54 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Sat, Dec 22, 2012 at 1:45 AM, Michel Lespinasse <walken@google.com> wrote:
> do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE multiple,
> however there was no equivalent code in mm_populate(), which caused issues.
>
> This could be fixed by introducing the same rounding in mm_populate();
> however I think it's preferable to make do_mmap_pgoff() return populate
> as a size rather than as a boolean, so we don't have to duplicate the
> size rounding logic in mm_populate().
>
> Signed-off-by: Michel Lespinasse <walken@google.com>

This appears to fix my hangs.

--Andy


* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-22  9:37           ` Michel Lespinasse
  2012-12-22  9:45             ` [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool Michel Lespinasse
@ 2013-01-03 18:56             ` Andy Lutomirski
  1 sibling, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2013-01-03 18:56 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Sat, Dec 22, 2012 at 1:37 AM, Michel Lespinasse <walken@google.com> wrote:
> On Fri, Dec 21, 2012 at 6:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse <walken@google.com> wrote:
>>> Could you share your test case so I can try reproducing the issue
>>> you're seeing ?
>>
>> Not so easy.  My test case is a large chunk of a high-frequency
>> trading system :)
>
> Huh, its probably better if I don't see it then :)
>
>> I just tried it again.  Not I have a task stuck in
>> mlockall(MCL_CURRENT|MCL_FUTURE).  The stack is:
>>
>> [<0000000000000000>] flush_work+0x1c2/0x280
>> [<0000000000000000>] schedule_on_each_cpu+0xe3/0x130
>> [<0000000000000000>] lru_add_drain_all+0x15/0x20
>> [<0000000000000000>] sys_mlockall+0x125/0x1a0
>> [<0000000000000000>] tracesys+0xd0/0xd5
>> [<0000000000000000>] 0xffffffffffffffff
>>
>> The sequence of mmap and munmap calls, according to strace, is:
>>
> [...]
>> 6084  mmap(0x7f54fd02a000, 6776, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fd02a000
>
> So I noticed you use mmap with a size that is not a multiple of
> PAGE_SIZE. This is perfectly legal, but I hadn't tested that case, and
> lo and behold, it's something I got wrong. Patch to be sent as a reply
> to this. Without this patch, vm_populate() will show a debug message
> if you have CONFIG_DEBUG_VM set, and likely spin in an infinite loop
> if you don't.
>
>> 6084  mmap(NULL, 26258, PROT_READ, MAP_SHARED, 4, 0) = 0x7f5509f9d000
>> 6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5509f9c000
>> 6084  munmap(0x7f5509f9c000, 4096)      = 0
>> 6084  mlockall(MCL_CURRENT|MCL_FUTURE
>>
>> This task is unkillable.  Two other tasks are stuck spinning.
>
> Now I'm confused, because:
>
> 1- your trace shows the hang occurs during mlockall(), and this code
> really wasn't touched much in my series (besides renaming
> do_mlock_pages into __mm_populate())
>
> 2- the backtrace above showed sys_mlockall() -> lru_add_drain_all(),
> which is the very beginning of mlockall(), before anything of
> importance happens (and in particular, before the MCL_FUTURE flag
> takes action). So, I'm going to assume that it's one of the other
> spinning threads that is breaking things. If one of the spinning
> threads got stuck within vm_populate(), this could even be explained
> by the bug I mentioned above.
>
> Could you check if the fix I'm going to send as a reply to this works
> for you, and if not, where the two spinning threads are being stuck ?
>

It works.  In case anyone cares, the whole series is

Tested-by: Andy Lutomirski <luto@amacapital.net>

I'll let you know if anything else breaks.  I'll be pounding on a
kernel with this patched in for the next couple of days, I expect.

--Andy


* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
                   ` (10 preceding siblings ...)
  2012-12-22  0:36 ` Andy Lutomirski
@ 2013-01-04 18:16 ` Andy Lutomirski
  2013-01-04 22:58   ` Michel Lespinasse
  11 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2013-01-04 18:16 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
> We have many vma manipulation functions that are fast in the typical case,
> but can optionally be instructed to populate an unbounded number of ptes
> within the region they work on:
> - mmap with MAP_POPULATE or MAP_LOCKED flags;
> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>   VM_LOCKED vma;
> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
> - brk() when mlock(MCL_FUTURE) is in effect.
>
> Current code handles these pte operations locally, while the surrounding
> code has to hold the mmap_sem write side since it's manipulating vmas.
> This means we're doing an unbounded amount of pte population work with
> mmap_sem held, and this causes problems as Andy Lutomirski reported
> (we've hit this at Google as well, though it's not entirely clear why
> people keep trying to use mlock(MCL_FUTURE) in the first place).
>
> I propose introducing a new mm_populate() function to do this pte
> population work after the mmap_sem has been released. mm_populate()
> does need to acquire the mmap_sem read side, but critically, it
> doesn't need to hold it continuously for the entire duration of the
> operation - it can drop it whenever things take too long (such as when
> hitting disk for a file read) and re-acquire it later on.
>

I still have quite a few instances of 2-6 ms of latency due to
"call_rwsem_down_read_failed __do_page_fault do_page_fault
page_fault".  Any idea why?  I don't know any great way to figure out
who is holding mmap_sem at the time.  Given what my code is doing, I
suspect the contention is due to mmap or munmap on a file.  MCL_FUTURE
is set, and MAP_POPULATE is not set.

It could be the other thread calling mmap and getting preempted (or
otherwise calling schedule()).  Grr.

--Andy


* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2013-01-04 18:16 ` Andy Lutomirski
@ 2013-01-04 22:58   ` Michel Lespinasse
  0 siblings, 0 replies; 40+ messages in thread
From: Michel Lespinasse @ 2013-01-04 22:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

On Fri, Jan 4, 2013 at 10:16 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> I still have quite a few instances of 2-6 ms of latency due to
> "call_rwsem_down_read_failed __do_page_fault do_page_fault
> page_fault".  Any idea why?  I don't know any great way to figure out
> who is holding mmap_sem at the time.  Given what my code is doing, I
> suspect the contention is due to mmap or munmap on a file.  MCL_FUTURE
> is set, and MAP_POPULATE is not set.
>
> It could be the other thread calling mmap and getting preempted (or
> otherwise calling schedule()).  Grr.

The simplest way to find out who's holding the lock too long might be
to enable CONFIG_LOCK_STATS. This will slow things down a little, but
give you lots of useful information including which threads hold
mmap_sem the longest and the call stack for where they grab it from.
See Documentation/lockstat.txt

I think munmap is a likely culprit, as it still runs with mmap_sem
held for write (I do plan to work on this next). But it's hard to
be sure without lockstats :)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
  2012-12-21 10:46 ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
  2012-12-22 15:02   ` Greg Ungerer
@ 2013-01-23 13:37   ` Greg Ungerer
  1 sibling, 0 replies; 40+ messages in thread
From: Greg Ungerer @ 2013-01-23 13:37 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel, Andrew Morton, linux-mm, linux-kernel

Hi Michel,

On 12/21/2012 08:46 PM, Michel Lespinasse wrote:
> I wanted to ask if you could check the sanity of the following patches
> in nommu configurations. My understanding is that these always
> populate mappings when they are created, so that MAP_POPULATE and
> MAP_LOCKED are actually no-ops. Is this an accurate description?

Sorry, I have been very slow at getting around to this.

MAP_LOCKED and MAP_POPULATE are effectively no-ops; we don't deal
with them on the no-mmu mmap path.

I had a look over the patches and they look ok to me with regard
to no-mmu. So if you still want it:

Acked-by: Greg Ungerer <gerg@uclinux.org>

Regards
Greg



> On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@google.com> wrote:
>> We have many vma manipulation functions that are fast in the typical case,
>> but can optionally be instructed to populate an unbounded number of ptes
>> within the region they work on:
>> - mmap with MAP_POPULATE or MAP_LOCKED flags;
>> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
>>   VM_LOCKED vma;
>> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
>> - brk() when mlock(MCL_FUTURE) is in effect.
>>
>> Current code handles these pte operations locally, while the surrounding
>> code has to hold the mmap_sem write side since it's manipulating vmas.
>> This means we're doing an unbounded amount of pte population work with
>> mmap_sem held, and this causes problems as Andy Lutomirski reported
>> (we've hit this at Google as well, though it's not entirely clear why
>> people keep trying to use mlock(MCL_FUTURE) in the first place).
>>
>> I propose introducing a new mm_populate() function to do this pte
>> population work after the mmap_sem has been released. mm_populate()
>> does need to acquire the mmap_sem read side, but critically, it
>> doesn't need to hold it continuously for the entire duration of the
>> operation - it can drop it whenever things take too long (such as when
>> hitting disk for a file read) and re-acquire it later on.
>>
>> The following patches are against v3.7:
>>
>> - Patches 1-2 fix some issues I noticed while working on the existing code.
>>   If needed, they could potentially go in before the rest of the patches.
>>
>> - Patch 3 introduces the new mm_populate() function and changes
>>   mmap_region() call sites to use it after they drop mmap_sem. This is
>>   inspired from Andy Lutomirski's proposal and is built as an extension
>>   of the work I had previously done for mlock() and mlockall() around
>>   v2.6.38-rc1. I had tried doing something similar at the time but had
>>   given up as there were so many do_mmap() call sites; the recent cleanups
>>   by Linus and Viro are a tremendous help here.
>>
>> - Patches 4-6 convert some of the less-obvious places doing unbounded
>>   pte populates to the new mm_populate() mechanism.
>>
>> - Patches 7-8 are code cleanups that are made possible by the
>>   mm_populate() work. In particular, they remove more code than the
>>   entire patch series added, which should be a good thing :)
>>
>> - Patch 9 is optional to this entire series. It only helps to deal more
>>   nicely with racy userspace programs that might modify their mappings
>>   while we're trying to populate them. It adds a new VM_POPULATE flag
>>   on the mappings we do want to populate, so that if userspace replaces
>>   them with mappings it doesn't want populated, mm_populate() won't
>>   populate those replacement mappings.
>>
>> Michel Lespinasse (9):
>>   mm: make mlockall preserve flags other than VM_LOCKED in def_flags
>>   mm: remap_file_pages() fixes
>>   mm: introduce mm_populate() for populating new vmas
>>   mm: use mm_populate() for blocking remap_file_pages()
>>   mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
>>   mm: use mm_populate() for mremap() of VM_LOCKED vmas
>>   mm: remove flags argument to mmap_region
>>   mm: directly use __mlock_vma_pages_range() in find_extend_vma()
>>   mm: introduce VM_POPULATE flag to better deal with racy userspace programs
>>
>>  arch/tile/mm/elf.c   |    1 -
>>  fs/aio.c             |    6 +++-
>>  include/linux/mm.h   |   23 +++++++++---
>>  include/linux/mman.h |    4 ++-
>>  ipc/shm.c            |   12 ++++---
>>  mm/fremap.c          |   51 ++++++++++++++-------------
>>  mm/internal.h        |    4 +-
>>  mm/memory.c          |   24 -------------
>>  mm/mlock.c           |   94 +++++++++++++------------------------------------
>>  mm/mmap.c            |   77 ++++++++++++++++++++++++----------------
>>  mm/mremap.c          |   25 +++++++------
>>  mm/nommu.c           |    5 ++-
>>  mm/util.c            |    6 +++-
>>  13 files changed, 154 insertions(+), 178 deletions(-)
>>
>> --
>> 1.7.7.3
> 
> 
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2012-12-21  0:49 ` [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages() Michel Lespinasse
  2013-01-03  2:25   ` Rik van Riel
@ 2013-03-10 18:55   ` Tommi Rantala
  2013-03-11 23:03     ` Andrew Morton
  1 sibling, 1 reply; 40+ messages in thread
From: Tommi Rantala @ 2013-03-10 18:55 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andy Lutomirski, Ingo Molnar, Al Viro, Hugh Dickins, Jorn_Engel,
	Rik van Riel, Andrew Morton, linux-mm@kvack.org, LKML, Dave Jones

2012/12/21 Michel Lespinasse <walken@google.com>:
> Signed-off-by: Michel Lespinasse <walken@google.com>

Hello, this patch introduced the following bug, seen while fuzzing with trinity:

[  396.825414] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000050
[  396.826013] IP: [<ffffffff81176efb>] sys_remap_file_pages+0xbb/0x3e0
[  396.826013] PGD 61e65067 PUD 3fb4067 PMD 0
[  396.826013] Oops: 0000 [#8] SMP
[  396.826013] CPU 0
[  396.826013] Pid: 27553, comm: trinity-child53 Tainted: G      D W
 3.9.0-rc1+ #108 Bochs Bochs
[  396.826013] RIP: 0010:[<ffffffff81176efb>]  [<ffffffff81176efb>]
sys_remap_file_pages+0xbb/0x3e0
[  396.826013] RSP: 0018:ffff880071a23f08  EFLAGS: 00010246
[  396.826013] RAX: 0000000000000000 RBX: ffffffff00000000 RCX: 0000000000000001
[  396.826013] RDX: 0000000000000000 RSI: ffffffff00000000 RDI: ffff8800679657c0
[  396.826013] RBP: ffff880071a23f78 R08: 0000000000000002 R09: 0000000000000000
[  396.826013] R10: 0000000026dad294 R11: 0000000000000000 R12: 0000000000000000
[  396.826013] R13: ffff880067965870 R14: ffffffffffffffea R15: 0000000000000000
[  396.826013] FS:  00007f6691a57700(0000) GS:ffff88007f800000(0000)
knlGS:0000000000000000
[  396.826013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  396.826013] CR2: 0000000000000050 CR3: 0000000068ab3000 CR4: 00000000000006f0
[  396.826013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  396.826013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  396.826013] Process trinity-child53 (pid: 27553, threadinfo
ffff880071a22000, task ffff88006a360000)
[  396.826013] Stack:
[  396.826013]  0000000000000000 ffffffff810f33b6 0000000000000035
0000000000000000
[  396.826013]  000000000000f000 0000000000000000 0000000026dad294
ffff8800679657c0
[  396.826013]  a80006367e000000 ffffffff00000000 00000000000006c0
00000000000000d8
[  396.826013] Call Trace:
[  396.826013]  [<ffffffff810f33b6>] ? trace_hardirqs_on_caller+0x16/0x1f0
[  396.826013]  [<ffffffff81faf169>] system_call_fastpath+0x16/0x1b
[  396.826013] Code: 43 e3 00 48 8b 45 a8 25 00 00 01 00 48 89 45 b8
48 8b 7d c8 48 89 de e8 74 9b 00 00 48 85 c0 49 89 c7 75 1c 49 c7 c6
ea ff ff ff <48> 8b 14 25 50 00 00 00 44 89 f0 e9 7f 02 00 00 0f 1f 44
00 00
[  396.826013] RIP  [<ffffffff81176efb>] sys_remap_file_pages+0xbb/0x3e0
[  396.826013]  RSP <ffff880071a23f08>
[  396.826013] CR2: 0000000000000050
[  396.876275] ---[ end trace 0444599b5c1ba02b ]---

> ---
>  mm/fremap.c |   22 ++++++----------------
>  1 files changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/mm/fremap.c b/mm/fremap.c
> index 2db886e31044..b42e32171530 100644
> --- a/mm/fremap.c
> +++ b/mm/fremap.c
> @@ -129,6 +129,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>         struct vm_area_struct *vma;
>         int err = -EINVAL;
>         int has_write_lock = 0;
> +       vm_flags_t vm_flags;
>
>         if (prot)
>                 return err;
> @@ -228,30 +229,16 @@ get_write_lock:
>                 /*
>                  * drop PG_Mlocked flag for over-mapped range
>                  */
> -               vm_flags_t saved_flags = vma->vm_flags;
>                 if (!has_write_lock)
>                         goto get_write_lock;
> +               vm_flags = vma->vm_flags;
>                 munlock_vma_pages_range(vma, start, start + size);
> -               vma->vm_flags = saved_flags;
> +               vma->vm_flags = vm_flags;
>         }
>
>         mmu_notifier_invalidate_range_start(mm, start, start + size);
>         err = vma->vm_ops->remap_pages(vma, start, size, pgoff);
>         mmu_notifier_invalidate_range_end(mm, start, start + size);
> -       if (!err) {
> -               if (vma->vm_flags & VM_LOCKED) {
> -                       /*
> -                        * might be mapping previously unmapped range of file
> -                        */
> -                       mlock_vma_pages_range(vma, start, start + size);
> -               } else if (!(flags & MAP_NONBLOCK)) {
> -                       if (unlikely(has_write_lock)) {
> -                               downgrade_write(&mm->mmap_sem);
> -                               has_write_lock = 0;
> -                       }
> -                       make_pages_present(start, start+size);
> -               }
> -       }
>
>         /*
>          * We can't clear VM_NONLINEAR because we'd have to do
> @@ -260,10 +247,13 @@ get_write_lock:
>          */
>
>  out:
> +       vm_flags = vma->vm_flags;

When find_vma() fails, vma is NULL here.

>         if (likely(!has_write_lock))
>                 up_read(&mm->mmap_sem);
>         else
>                 up_write(&mm->mmap_sem);
> +       if (!err && ((vm_flags & VM_LOCKED) || !(flags & MAP_NONBLOCK)))
> +               mm_populate(start, size);
>
>         return err;
>  }
> --
> 1.7.7.3
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2013-03-10 18:55   ` Tommi Rantala
@ 2013-03-11 23:03     ` Andrew Morton
  2013-03-12  0:24       ` Michel Lespinasse
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Morton @ 2013-03-11 23:03 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Michel Lespinasse, Andy Lutomirski, Ingo Molnar, Al Viro,
	Hugh Dickins, Jorn_Engel, Rik van Riel, linux-mm@kvack.org, LKML,
	Dave Jones

On Sun, 10 Mar 2013 20:55:21 +0200 Tommi Rantala <tt.rantala@gmail.com> wrote:

> 2012/12/21 Michel Lespinasse <walken@google.com>:
> > Signed-off-by: Michel Lespinasse <walken@google.com>
> 
> Hello, this patch introduced the following bug, seen while fuzzing with trinity:
> 
> [  396.825414] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000050
>
> ...
>
> > +       vm_flags = vma->vm_flags;
> 
> When find_vma() fails, vma is NULL here.

Yup, thanks.

This, methinks:


From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/fremap.c: fix oops on error path

If find_vma() fails, sys_remap_file_pages() will dereference `vma', which
contains NULL.  Fix it by checking the pointer.

(We could alternatively check for err==0, but this seems more direct)

(The vm_flags change is to squish a bogus used-uninitialised warning
without adding extra code).

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/fremap.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff -puN mm/fremap.c~mm-fremapc-fix-oops-on-error-path mm/fremap.c
--- a/mm/fremap.c~mm-fremapc-fix-oops-on-error-path
+++ a/mm/fremap.c
@@ -163,7 +163,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
 	 * and that the remapped range is valid and fully within
 	 * the single existing vma.
 	 */
-	if (!vma || !(vma->vm_flags & VM_SHARED))
+	vm_flags = vma->vm_flags;
+	if (!vma || !(vm_flags & VM_SHARED))
 		goto out;
 
 	if (!vma->vm_ops || !vma->vm_ops->remap_pages)
@@ -254,7 +255,8 @@ get_write_lock:
 	 */
 
 out:
-	vm_flags = vma->vm_flags;
+	if (vma)
+		vm_flags = vma->vm_flags;
 	if (likely(!has_write_lock))
 		up_read(&mm->mmap_sem);
 	else
_


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2013-03-11 23:03     ` Andrew Morton
@ 2013-03-12  0:24       ` Michel Lespinasse
  2013-03-12  4:23         ` Hillf Danton
  2013-03-12 20:47         ` Andrew Morton
  0 siblings, 2 replies; 40+ messages in thread
From: Michel Lespinasse @ 2013-03-12  0:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tommi Rantala, Andy Lutomirski, Ingo Molnar, Al Viro,
	Hugh Dickins, Jorn_Engel, Rik van Riel, linux-mm@kvack.org, LKML,
	Dave Jones

(Sorry for the late reply)

On Mon, Mar 11, 2013 at 4:03 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 10 Mar 2013 20:55:21 +0200 Tommi Rantala <tt.rantala@gmail.com> wrote:
>
>> 2012/12/21 Michel Lespinasse <walken@google.com>:
>> > Signed-off-by: Michel Lespinasse <walken@google.com>
>>
>> Hello, this patch introduced the following bug, seen while fuzzing with trinity:
>>
>> [  396.825414] BUG: unable to handle kernel NULL pointer dereference
>> at 0000000000000050

Good catch...

> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/fremap.c: fix oops on error path
>
> If find_vma() fails, sys_remap_file_pages() will dereference `vma', which
> contains NULL.  Fix it by checking the pointer.
>
> (We could alternatively check for err==0, but this seems more direct)
>
> (The vm_flags change is to squish a bogus used-uninitialised warning
> without adding extra code).
>
> Reported-by: Tommi Rantala <tt.rantala@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/fremap.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff -puN mm/fremap.c~mm-fremapc-fix-oops-on-error-path mm/fremap.c
> --- a/mm/fremap.c~mm-fremapc-fix-oops-on-error-path
> +++ a/mm/fremap.c
> @@ -163,7 +163,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
>          * and that the remapped range is valid and fully within
>          * the single existing vma.
>          */
> -       if (!vma || !(vma->vm_flags & VM_SHARED))
> +       vm_flags = vma->vm_flags;
> +       if (!vma || !(vm_flags & VM_SHARED))
>                 goto out;

Your commit message indicates the vm_flags load here doesn't generate
any code, but this seems very brittle and compiler-dependent. If the
compiler were to generate an actual load here, the issue with
vma == NULL would reappear.

>         if (!vma->vm_ops || !vma->vm_ops->remap_pages)
> @@ -254,7 +255,8 @@ get_write_lock:
>          */
>
>  out:
> -       vm_flags = vma->vm_flags;
> +       if (vma)
> +               vm_flags = vma->vm_flags;
>         if (likely(!has_write_lock))
>                 up_read(&mm->mmap_sem);
>         else



Would the following work? I think it's simpler, and with the compiler
I'm using here it doesn't emit warnings:

diff --git a/mm/fremap.c b/mm/fremap.c
index 0cd4c11488ed..329507e832fb 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -254,7 +254,8 @@ get_write_lock:
 	 */
 
 out:
-	vm_flags = vma->vm_flags;
+	if (!err)
+		vm_flags = vma->vm_flags;
 	if (likely(!has_write_lock))
 		up_read(&mm->mmap_sem);
 	else

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2013-03-12  0:24       ` Michel Lespinasse
@ 2013-03-12  4:23         ` Hillf Danton
  2013-03-12  5:01           ` Michel Lespinasse
  2013-03-12 20:47         ` Andrew Morton
  1 sibling, 1 reply; 40+ messages in thread
From: Hillf Danton @ 2013-03-12  4:23 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Andrew Morton, Tommi Rantala, Andy Lutomirski, Ingo Molnar,
	Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel, Linux-MM, LKML

On Tue, Mar 12, 2013 at 8:24 AM, Michel Lespinasse <walken@google.com> wrote:
> (Sorry for the late reply)
>
> On Mon, Mar 11, 2013 at 4:03 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Sun, 10 Mar 2013 20:55:21 +0200 Tommi Rantala <tt.rantala@gmail.com> wrote:
>>
>>> 2012/12/21 Michel Lespinasse <walken@google.com>:
>>> > Signed-off-by: Michel Lespinasse <walken@google.com>
>>>
>>> Hello, this patch introduced the following bug, seen while fuzzing with trinity:
>>>
>>> [  396.825414] BUG: unable to handle kernel NULL pointer dereference
>>> at 0000000000000050
>
> Good catch...
>
>> From: Andrew Morton <akpm@linux-foundation.org>
>> Subject: mm/fremap.c: fix oops on error path
>>
>> If find_vma() fails, sys_remap_file_pages() will dereference `vma', which
>> contains NULL.  Fix it by checking the pointer.
>>
>> (We could alternatively check for err==0, but this seems more direct)
>>
>> (The vm_flags change is to squish a bogus used-uninitialised warning
>> without adding extra code).
>>
>> Reported-by: Tommi Rantala <tt.rantala@gmail.com>
>> Cc: Michel Lespinasse <walken@google.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>  mm/fremap.c |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff -puN mm/fremap.c~mm-fremapc-fix-oops-on-error-path mm/fremap.c
>> --- a/mm/fremap.c~mm-fremapc-fix-oops-on-error-path
>> +++ a/mm/fremap.c
>> @@ -163,7 +163,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
>>          * and that the remapped range is valid and fully within
>>          * the single existing vma.
>>          */
>> -       if (!vma || !(vma->vm_flags & VM_SHARED))
>> +       vm_flags = vma->vm_flags;
>> +       if (!vma || !(vm_flags & VM_SHARED))
>>                 goto out;
>
> Your commit message indicates the vm_flags load here doesn't generate
> any code, but this seems very brittle and compiler-dependent. If the
> compiler were to generate an actual load here, the issue with
> vma == NULL would reappear.
>
>>         if (!vma->vm_ops || !vma->vm_ops->remap_pages)
>> @@ -254,7 +255,8 @@ get_write_lock:
>>          */
>>
>>  out:
>> -       vm_flags = vma->vm_flags;
>> +       if (vma)
>> +               vm_flags = vma->vm_flags;
>>         if (likely(!has_write_lock))
>>                 up_read(&mm->mmap_sem);
>>         else
>
>
>
> Would the following work? I think it's simpler, and with the compiler
> I'm using here it doesn't emit warnings:
>
> diff --git a/mm/fremap.c b/mm/fremap.c
> index 0cd4c11488ed..329507e832fb 100644
> --- a/mm/fremap.c
> +++ b/mm/fremap.c
> @@ -254,7 +254,8 @@ get_write_lock:
>          */
>
>  out:
> -       vm_flags = vma->vm_flags;
> +       if (!err)
> +               vm_flags = vma->vm_flags;
>         if (likely(!has_write_lock))
>                 up_read(&mm->mmap_sem);
>         else
>
Is it still necessary to populate the mm if we bail out because a
linear mapping was encountered?

Hillf


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2013-03-12  4:23         ` Hillf Danton
@ 2013-03-12  5:01           ` Michel Lespinasse
  0 siblings, 0 replies; 40+ messages in thread
From: Michel Lespinasse @ 2013-03-12  5:01 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Andrew Morton, Tommi Rantala, Andy Lutomirski, Ingo Molnar,
	Al Viro, Hugh Dickins, Jorn_Engel, Rik van Riel, Linux-MM, LKML

On Mon, Mar 11, 2013 at 9:23 PM, Hillf Danton <dhillf@gmail.com> wrote:
> Is it still necessary to populate the mm if we bail out because a
> linear mapping was encountered?

Yes. mmap_region() doesn't do it on its own, and we want the emulated
linear case to behave similarly to the true nonlinear case.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()
  2013-03-12  0:24       ` Michel Lespinasse
  2013-03-12  4:23         ` Hillf Danton
@ 2013-03-12 20:47         ` Andrew Morton
  1 sibling, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2013-03-12 20:47 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Tommi Rantala, Andy Lutomirski, Ingo Molnar, Al Viro,
	Hugh Dickins, Jorn_Engel, Rik van Riel, linux-mm@kvack.org, LKML,
	Dave Jones

On Mon, 11 Mar 2013 17:24:29 -0700 Michel Lespinasse <walken@google.com> wrote:
> > --- a/mm/fremap.c~mm-fremapc-fix-oops-on-error-path
> > +++ a/mm/fremap.c
> > @@ -163,7 +163,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
> >          * and that the remapped range is valid and fully within
> >          * the single existing vma.
> >          */
> > -       if (!vma || !(vma->vm_flags & VM_SHARED))
> > +       vm_flags = vma->vm_flags;
> > +       if (!vma || !(vm_flags & VM_SHARED))
> >                 goto out;
> 
> Your commit message indicates the vm_flags load here doesn't generate
> any code, but this seems very brittle and compiler-dependent. If the
> compiler were to generate an actual load here, the issue with
> vma == NULL would reappear.

I didn't try very hard.  I have a surprisingly strong dislike of adding
"= 0" everywhere just to squish warnings.

There are actually quite a lot of places where this function could use
s/vma->vm_flags/vm_flags/ and might save a bit of code as a result. 
But the function's pretty straggly and I stopped doing it.

> 
> >         if (!vma->vm_ops || !vma->vm_ops->remap_pages)
> > @@ -254,7 +255,8 @@ get_write_lock:
> >          */
> >
> >  out:
> > -       vm_flags = vma->vm_flags;
> > +       if (vma)
> > +               vm_flags = vma->vm_flags;
> >         if (likely(!has_write_lock))
> >                 up_read(&mm->mmap_sem);
> >         else
> 
> 
> 
> Would the following work? I think it's simpler, and with the compiler
> I'm using here it doesn't emit warnings:
> 
> diff --git a/mm/fremap.c b/mm/fremap.c
> index 0cd4c11488ed..329507e832fb 100644
> --- a/mm/fremap.c
> +++ b/mm/fremap.c
> @@ -254,7 +254,8 @@ get_write_lock:
>  	 */
>  
>  out:
> -	vm_flags = vma->vm_flags;
> +	if (!err)
> +		vm_flags = vma->vm_flags;
>  	if (likely(!has_write_lock))
>  		up_read(&mm->mmap_sem);
>  	else

Yes, this will work.

gcc-4.4.4 does generate the warning with this.

Testing `err' was my v1, but it is not obvious that err == 0 always
correlates with vma != NULL.  This is true (I checked), and it had
better stay true in the future, but it just feels safer and simpler to
test `vma' directly.


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2013-03-12 20:47 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-21  0:49 [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
2012-12-21  0:49 ` [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags Michel Lespinasse
2012-12-22  4:25   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 2/9] mm: remap_file_pages() fixes Michel Lespinasse
2013-01-03  0:31   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 3/9] mm: introduce mm_populate() for populating new vmas Michel Lespinasse
2013-01-03  2:14   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages() Michel Lespinasse
2013-01-03  2:25   ` Rik van Riel
2013-03-10 18:55   ` Tommi Rantala
2013-03-11 23:03     ` Andrew Morton
2013-03-12  0:24       ` Michel Lespinasse
2013-03-12  4:23         ` Hillf Danton
2013-03-12  5:01           ` Michel Lespinasse
2013-03-12 20:47         ` Andrew Morton
2012-12-21  0:49 ` [PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect Michel Lespinasse
2013-01-03  2:56   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas Michel Lespinasse
2013-01-03  5:47   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 7/9] mm: remove flags argument to mmap_region Michel Lespinasse
2013-01-03  5:49   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma() Michel Lespinasse
2013-01-03  5:50   ` Rik van Riel
2012-12-21  0:49 ` [PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs Michel Lespinasse
2013-01-03  6:20   ` Rik van Riel
2012-12-21 10:46 ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Michel Lespinasse
2012-12-22 15:02   ` Greg Ungerer
2013-01-23 13:37   ` Greg Ungerer
2012-12-22  0:36 ` Andy Lutomirski
2012-12-22  0:59   ` Michel Lespinasse
2012-12-22  1:09     ` Andy Lutomirski
2012-12-22  1:59       ` Michel Lespinasse
2012-12-22  2:16         ` Andy Lutomirski
2012-12-22  9:37           ` Michel Lespinasse
2012-12-22  9:45             ` [PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool Michel Lespinasse
2013-01-03  6:21               ` Rik van Riel
2013-01-03 18:54               ` Andy Lutomirski
2013-01-03 18:56             ` [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held Andy Lutomirski
2013-01-04 18:16 ` Andy Lutomirski
2013-01-04 22:58   ` Michel Lespinasse
