linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
       [not found]                 ` <0000013ed28b638a-066d7dc7-b590-49f8-9423-badb9537b8b6-000000@email.amazonses.com>
@ 2013-05-24 14:01                   ` Peter Zijlstra
  2013-05-24 15:40                     ` Christoph Lameter
  2013-05-30 21:00                     ` KOSAKI Motohiro
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2013-05-24 14:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm

Subject: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK

Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
broke RLIMIT_MEMLOCK.

The primary purpose of mm_struct::locked_vm was to act as resource
counter against RLIMIT_MEMLOCK. By splitting it the semantics got
greatly fuddled.

The 'bug' fixed was when you would mlock() IB pinned memory, in that
case the pages would be counted twice in mm_struct::locked_vm. The only
reason someone would actually do this is with mlockall(), which
basically requires you've disabled RLIMIT_MEMLOCK.

However by splitting the counter into two counters, RLIMIT_MEMLOCK has
no single resource counter; instead the patch makes it so that both
locked and pinned pages are below RLIMIT_MEMLOCK, effectively doubling
RLIMIT_MEMLOCK.

I claim that the patch made a bad situation worse by shifting the bug
from an unlikely corner case to a far more likely scenario.

This patch proposes to properly fix the problem by introducing
VM_PINNED. This also provides the groundwork for a possible mpin()
syscall or MADV_PIN -- although these are not included.

It recognises that pinned page semantics are a strict super-set of
locked page semantics -- a pinned page will not generate major faults
(and thus satisfies mlock() requirements).

The patch has some rough edges, primarily IB doesn't compile for I got a
little lost trying to find the process address for a number of
*_release_user_pages() calls which are now required.

Furthermore, vma expansion (stack) vs VM_PINNED is 'broken' in that I
don't think it will behave properly if you PIN the bottom (or top if
your arch is so inclined) stack page.

If people find this approach unworkable, I request we revert the above
mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
again.

Not-signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
vers/infiniband/core/umem.c                 | 41 ++++++------
 drivers/infiniband/hw/ipath/ipath_file_ops.c   |  2 +-
 drivers/infiniband/hw/ipath/ipath_kernel.h     |  4 +-
 drivers/infiniband/hw/ipath/ipath_user_pages.c | 29 ++++++---
 drivers/infiniband/hw/qib/qib.h                |  2 +-
 drivers/infiniband/hw/qib/qib_file_ops.c       |  2 +-
 drivers/infiniband/hw/qib/qib_user_pages.c     | 18 ++++--
 include/linux/mm.h                             |  3 +
 include/linux/mm_types.h                       |  5 ++
 include/linux/perf_event.h                     |  1 -
 include/rdma/ib_umem.h                         |  3 +-
 kernel/events/core.c                           | 17 +++--
 kernel/fork.c                                  |  2 +
 mm/mlock.c                                     | 87 +++++++++++++++++++++-----
 mm/mmap.c                                      | 16 +++--
 15 files changed, 167 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a841123..f8f47dc 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -137,17 +137,22 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	down_write(&current->mm->mmap_sem);
 
-	locked     = npages + current->mm->pinned_vm;
+	locked     = npages + mm_locked_pages(current->mm);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
 	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
 		ret = -ENOMEM;
-		goto out;
+		goto err;
 	}
 
 	cur_base = addr & PAGE_MASK;
+	umem->start_addr = cur_base;
+	umem->end_addr   = cur_base + npages;
+
+	ret = mm_mpin(umem->start_addr, umem->end_addr);
+	if (ret)
+		goto err;
 
-	ret = 0;
 	while (npages) {
 		ret = get_user_pages(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
@@ -155,7 +160,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				     1, !umem->writable, page_list, vma_list);
 
 		if (ret < 0)
-			goto out;
+			goto err_unpin;
 
 		cur_base += ret * PAGE_SIZE;
 		npages   -= ret;
@@ -168,7 +173,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 					GFP_KERNEL);
 			if (!chunk) {
 				ret = -ENOMEM;
-				goto out;
+				goto err_unpin;
 			}
 
 			chunk->nents = min_t(int, ret, IB_UMEM_MAX_PAGE_CHUNK);
@@ -191,7 +196,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				kfree(chunk);
 
 				ret = -ENOMEM;
-				goto out;
+				goto err_unpin;
 			}
 
 			ret -= chunk->nents;
@@ -202,19 +207,23 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		ret = 0;
 	}
 
-out:
-	if (ret < 0) {
-		__ib_umem_release(context->device, umem, 0);
-		kfree(umem);
-	} else
-		current->mm->pinned_vm = locked;
+	if (ret < 0)
+		goto err_unpin;
 
+unlock:
 	up_write(&current->mm->mmap_sem);
 	if (vma_list)
 		free_page((unsigned long) vma_list);
 	free_page((unsigned long) page_list);
 
 	return ret < 0 ? ERR_PTR(ret) : umem;
+
+err_unpin:
+	mm_munpin(umem->start_addr, umem->end_addr);
+err:
+	__ib_umem_release(context->device, umem, 0);
+	kfree(umem);
+	goto unlock;
 }
 EXPORT_SYMBOL(ib_umem_get);
 
@@ -223,7 +232,7 @@ static void ib_umem_account(struct work_struct *work)
 	struct ib_umem *umem = container_of(work, struct ib_umem, work);
 
 	down_write(&umem->mm->mmap_sem);
-	umem->mm->pinned_vm -= umem->diff;
+	mm_munpin(umem->start_addr, umem->end_addr);
 	up_write(&umem->mm->mmap_sem);
 	mmput(umem->mm);
 	kfree(umem);
@@ -237,7 +246,6 @@ void ib_umem_release(struct ib_umem *umem)
 {
 	struct ib_ucontext *context = umem->context;
 	struct mm_struct *mm;
-	unsigned long diff;
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
@@ -247,8 +255,6 @@ void ib_umem_release(struct ib_umem *umem)
 		return;
 	}
 
-	diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
-
 	/*
 	 * We may be called with the mm's mmap_sem already held.  This
 	 * can happen when a userspace munmap() is the call that drops
@@ -261,7 +267,6 @@ void ib_umem_release(struct ib_umem *umem)
 		if (!down_write_trylock(&mm->mmap_sem)) {
 			INIT_WORK(&umem->work, ib_umem_account);
 			umem->mm   = mm;
-			umem->diff = diff;
 
 			queue_work(ib_wq, &umem->work);
 			return;
@@ -269,7 +274,7 @@ void ib_umem_release(struct ib_umem *umem)
 	} else
 		down_write(&mm->mmap_sem);
 
-	current->mm->pinned_vm -= diff;
+	mm_munpin(umem->start_addr, umem->end_addr);
 	up_write(&mm->mmap_sem);
 	mmput(mm);
 	kfree(umem);
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index 6d7f453..16a0ad2 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -456,7 +456,7 @@ static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp,
 				ipath_stats.sps_pageunlocks++;
 			}
 		}
-		ipath_release_user_pages(pagep, cnt);
+		ipath_release_user_pages(pagep, ti->tidvaddr, cnt);
 	} else {
 		/*
 		 * Copy the updated array, with ipath_tid's filled in, back
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 6559af6..448bc97 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -1082,8 +1082,8 @@ static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt)
 #define IPATH_DFLT_RCVHDRSIZE 9
 
 int ipath_get_user_pages(unsigned long, size_t, struct page **);
-void ipath_release_user_pages(struct page **, size_t);
-void ipath_release_user_pages_on_close(struct page **, size_t);
+void ipath_release_user_pages(struct page **, unsigned long, size_t);
+void ipath_release_user_pages_on_close(struct page **, unsigned long, size_t);
 int ipath_eeprom_read(struct ipath_devdata *, u8, void *, int);
 int ipath_eeprom_write(struct ipath_devdata *, u8, const void *, int);
 int ipath_tempsense_read(struct ipath_devdata *, u8 regnum);
diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
index dc66c45..e368614 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -39,7 +39,7 @@
 #include "ipath_kernel.h"
 
 static void __ipath_release_user_pages(struct page **p, size_t num_pages,
-				   int dirty)
+				       int dirty)
 {
 	size_t i;
 
@@ -57,12 +57,15 @@ static int __ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 				  struct page **p, struct vm_area_struct **vma)
 {
 	unsigned long lock_limit;
+	unsigned long locked;
 	size_t got;
 	int ret;
 
+	locked = mm_locked_pages(current->mm);
+	locked += num_pages;
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit) {
+	if (locked > lock_limit) {
 		ret = -ENOMEM;
 		goto bail;
 	}
@@ -70,6 +73,10 @@ static int __ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
 		   (unsigned long) num_pages, start_page);
 
+	ret = mm_mpin(start_page, start_page + num_pages);
+	if (ret)
+		goto bail;
+
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages(current, current->mm,
 				     start_page + got * PAGE_SIZE,
@@ -78,14 +85,12 @@ static int __ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 		if (ret < 0)
 			goto bail_release;
 	}
-
-	current->mm->pinned_vm += num_pages;
-
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__ipath_release_user_pages(p, got, 0);
+	mm_munpin(start_page, start_page + num_pages);
 bail:
 	return ret;
 }
@@ -172,13 +177,13 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 	return ret;
 }
 
-void ipath_release_user_pages(struct page **p, size_t num_pages)
+void ipath_release_user_pages(struct page **p, unsigned long start_page,
+			      size_t num_pages)
 {
 	down_write(&current->mm->mmap_sem);
 
 	__ipath_release_user_pages(p, num_pages, 1);
-
-	current->mm->pinned_vm -= num_pages;
+	mm_munpin(start_page, start_page + num_pages);
 
 	up_write(&current->mm->mmap_sem);
 }
@@ -186,6 +191,7 @@ void ipath_release_user_pages(struct page **p, size_t num_pages)
 struct ipath_user_pages_work {
 	struct work_struct work;
 	struct mm_struct *mm;
+	unsigned long start_page;
 	unsigned long num_pages;
 };
 
@@ -195,13 +201,15 @@ static void user_pages_account(struct work_struct *_work)
 		container_of(_work, struct ipath_user_pages_work, work);
 
 	down_write(&work->mm->mmap_sem);
-	work->mm->pinned_vm -= work->num_pages;
+	mm_munpin(work->start_page, work->start_page + work->num_pages);
 	up_write(&work->mm->mmap_sem);
 	mmput(work->mm);
 	kfree(work);
 }
 
-void ipath_release_user_pages_on_close(struct page **p, size_t num_pages)
+void ipath_release_user_pages_on_close(struct page **p, 
+				       unsigned long start_page,
+				       size_t num_pages)
 {
 	struct ipath_user_pages_work *work;
 	struct mm_struct *mm;
@@ -218,6 +226,7 @@ void ipath_release_user_pages_on_close(struct page **p, size_t num_pages)
 
 	INIT_WORK(&work->work, user_pages_account);
 	work->mm = mm;
+	work->start_page = start_page;
 	work->num_pages = num_pages;
 
 	queue_work(ib_wq, &work->work);
diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 4d11575..a6213e2 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -1343,7 +1343,7 @@ void qib_sdma_process_event(struct qib_pportdata *, enum qib_sdma_events);
 #define QIB_RCVHDR_ENTSIZE 32
 
 int qib_get_user_pages(unsigned long, size_t, struct page **);
-void qib_release_user_pages(struct page **, size_t);
+void qib_release_user_pages(struct page **, unsigned long, size_t);
 int qib_eeprom_read(struct qib_devdata *, u8, void *, int);
 int qib_eeprom_write(struct qib_devdata *, u8, const void *, int);
 u32 __iomem *qib_getsendbuf_range(struct qib_devdata *, u32 *, u32, u32);
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index b56c942..ceff69f 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -423,7 +423,7 @@ static int qib_tid_update(struct qib_ctxtdata *rcd, struct file *fp,
 				dd->pageshadow[ctxttid + tid] = NULL;
 			}
 		}
-		qib_release_user_pages(pagep, cnt);
+		qib_release_user_pages(pagep, ti->tidvaddr, cnt);
 	} else {
 		/*
 		 * Copy the updated array, with qib_tid's filled in, back
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 2bc1d2b..e7b0e0d 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -55,16 +55,24 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
 				struct page **p, struct vm_area_struct **vma)
 {
 	unsigned long lock_limit;
+	unsigned long locked;
 	size_t got;
 	int ret;
 
+	locked = mm_locked_pages(current->mm);
+	locked += num_pages;
+
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
 		ret = -ENOMEM;
 		goto bail;
 	}
 
+	ret = mm_mpin(start_page, start_page + num_pages);
+	if (ret)
+		goto bail;
+
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages(current, current->mm,
 				     start_page + got * PAGE_SIZE,
@@ -74,13 +82,12 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
 			goto bail_release;
 	}
 
-	current->mm->pinned_vm += num_pages;
-
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__qib_release_user_pages(p, got, 0);
+	mm_munpin(start_page, star_page + num_pages);
 bail:
 	return ret;
 }
@@ -143,15 +150,16 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 	return ret;
 }
 
-void qib_release_user_pages(struct page **p, size_t num_pages)
+void qib_release_user_pages(struct page **p, unsigned long start_page, size_t num_pages)
 {
 	if (current->mm) /* during close after signal, mm can be NULL */
 		down_write(&current->mm->mmap_sem);
 
 	__qib_release_user_pages(p, num_pages, 1);
 
+
 	if (current->mm) {
-		current->mm->pinned_vm -= num_pages;
+		mm_munpin(start_page, start_page + num_pages);
 		up_write(&current->mm->mmap_sem);
 	}
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..de10e5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -90,6 +90,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 
+#define VM_PINNED	0x00001000
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
 
@@ -1526,6 +1527,8 @@ static inline void mm_populate(unsigned long addr, unsigned long len)
 	/* Ignore errors */
 	(void) __mm_populate(addr, len, 1);
 }
+extern int mm_mpin(unsigned long start, unsigned long end);
+extern int mm_unpin(unsigned long start, unsigned long end);
 #else
 static inline void mm_populate(unsigned long addr, unsigned long len) {}
 #endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ace9a5f..3e3475b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -456,4 +456,9 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 	return mm->cpu_vm_mask_var;
 }
 
+static inline unsigned long mm_locked_pages(struct mm_struct *mm)
+{
+	return mm->pinned_vm + mm->locked_vm;
+}
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f463a46..1003c71 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -389,7 +389,6 @@ struct perf_event {
 	/* mmap bits */
 	struct mutex			mmap_mutex;
 	atomic_t			mmap_count;
-	int				mmap_locked;
 	struct user_struct		*mmap_user;
 	struct ring_buffer		*rb;
 	struct list_head		rb_entry;
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9ee0d2e..4f3c802 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -41,6 +41,8 @@ struct ib_ucontext;
 
 struct ib_umem {
 	struct ib_ucontext     *context;
+	unsigned long		start_addr;
+	unsigned long		nr_pages;
 	size_t			length;
 	int			offset;
 	int			page_size;
@@ -49,7 +51,6 @@ struct ib_umem {
 	struct list_head	chunk_list;
 	struct work_struct	work;
 	struct mm_struct       *mm;
-	unsigned long		diff;
 };
 
 struct ib_umem_chunk {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9dc297f..ceb8905 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3617,7 +3617,6 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 		struct ring_buffer *rb = event->rb;
 
 		atomic_long_sub((size >> PAGE_SHIFT) + 1, &user->locked_vm);
-		vma->vm_mm->pinned_vm -= event->mmap_locked;
 		rcu_assign_pointer(event->rb, NULL);
 		ring_buffer_detach(event, rb);
 		mutex_unlock(&event->mmap_mutex);
@@ -3699,7 +3698,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
-	locked = vma->vm_mm->pinned_vm + extra;
+	locked = mm_locked_pages(vma->vm_mm) + extra;
 
 	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
 		!capable(CAP_IPC_LOCK)) {
@@ -3723,9 +3722,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	rcu_assign_pointer(event->rb, rb);
 
 	atomic_long_add(user_extra, &user->locked_vm);
-	event->mmap_locked = extra;
 	event->mmap_user = get_current_user();
-	vma->vm_mm->pinned_vm += event->mmap_locked;
+	vma->vm_mm->pinned_vm += extra;
 
 	perf_event_update_userpage(event);
 
@@ -3734,7 +3732,16 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 		atomic_inc(&event->mmap_count);
 	mutex_unlock(&event->mmap_mutex);
 
-	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+	/*
+	 * We're a vma of pinned pages; add special to avoid MADV_DODUMP and
+	 * mlock(). mlock()+munlock() could accidentally put these pages onto
+	 * the LRU, we wouldn't want that; also it implies VM_DONTEXPAND
+	 * which we need as well.
+	 *
+	 * VM_PINNED will make munmap automagically subtract
+	 * mm_struct::pinned_vm.
+	 */
+	vma->vm_flags |= VM_PINNED | VM_SPECIAL | VM_DONTDUMP;
 	vma->vm_ops = &perf_mmap_vmops;
 
 	return ret;
diff --git a/kernel/fork.c b/kernel/fork.c
index 987b28a..815f196 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -411,6 +411,8 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 		if (anon_vma_fork(tmp, mpnt))
 			goto fail_nomem_anon_vma_fork;
 		tmp->vm_flags &= ~VM_LOCKED;
+		if (tmp->vm_flags & VM_PINNED)
+			mm->pinned_vm += vma_pages(tmp);
 		tmp->vm_next = tmp->vm_prev = NULL;
 		file = tmp->vm_file;
 		if (file) {
diff --git a/mm/mlock.c b/mm/mlock.c
index 79b7cf7..335c705 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -277,9 +277,8 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pgoff_t pgoff;
-	int nr_pages;
+	int nr_pages, nr_locked, nr_pinned;
 	int ret = 0;
-	int lock = !!(newflags & VM_LOCKED);
 
 	if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
 	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm))
@@ -310,9 +309,49 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	 * Keep track of amount of locked VM.
 	 */
 	nr_pages = (end - start) >> PAGE_SHIFT;
-	if (!lock)
-		nr_pages = -nr_pages;
-	mm->locked_vm += nr_pages;
+
+	/*
+	 * We should only account pages once, if VM_PINNED is set pages are
+	 * accounted in mm_struct::pinned_vm, otherwise if VM_LOCKED is set,
+	 * we'll account them in mm_struct::locked_vm.
+	 *
+	 * PL  := vma->vm_flags
+	 * PL' := newflags
+	 * PLd := {pinned,locked}_vm delta
+	 *
+	 * PL->PL' PLd
+	 * -----------
+	 * 00  01  0+
+	 * 00  10  +0
+	 * 01  11  +-
+	 * 01  00  0-
+	 * 10  00  -0
+	 * 10  11  00
+	 * 11  01  -+
+	 * 11  10  00
+	 */
+
+	nr_pinned = nr_locked = 0;
+
+	if ((vma->vm_flags ^ newflags) & VM_PINNED) {
+		if (vma->vm_flags & VM_PINNED)
+			nr_pinned = -nr_pages;
+		else
+			nr_pinned = nr_pages;
+	}
+
+	if (vma->vm_flags & VM_PINNED) {
+		if ((newflags & (VM_PINNED|VM_LOCKED)) == VM_LOCKED)
+			nr_locked = nr_pages;
+	} else {
+		if (vma->vm_flags & VM_LOCKED)
+			nr_locked = -nr_pages;
+		else if (newflags & VM_LOCKED)
+			nr_locked = nr_pages;
+	}
+
+	mm->pinned_vm += nr_pinned;
+	mm->locked_vm += nr_locked;
 
 	/*
 	 * vm_flags is protected by the mmap_sem held in write mode.
@@ -320,7 +359,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	 * set VM_LOCKED, __mlock_vma_pages_range will bring it back.
 	 */
 
-	if (lock)
+	if (((vma->vm_flags ^ newflags) & VM_PINNED) || (newflags & VM_LOCKED))
 		vma->vm_flags = newflags;
 	else
 		munlock_vma_pages_range(vma, start, end);
@@ -330,7 +369,10 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	return ret;
 }
 
-static int do_mlock(unsigned long start, size_t len, int on)
+#define MLOCK_F_ON	0x01
+#define MLOCK_F_PIN	0x02
+
+static int do_mlock(unsigned long start, size_t len, unsigned int flags)
 {
 	unsigned long nstart, end, tmp;
 	struct vm_area_struct * vma, * prev;
@@ -352,13 +394,18 @@ static int do_mlock(unsigned long start, size_t len, int on)
 		prev = vma;
 
 	for (nstart = start ; ; ) {
-		vm_flags_t newflags;
+		vm_flags_t newflags = vma->vm_flags;
+		vm_flags_t flag = VM_LOCKED;
 
-		/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
+		if (flags & MLOCK_F_PIN)
+			flag = VM_PINNED;
 
-		newflags = vma->vm_flags & ~VM_LOCKED;
-		if (on)
-			newflags |= VM_LOCKED;
+		if (flags & MLOCK_F_ON)
+			newflags |= flag;
+		else
+			newflags &= ~flag;
+
+		/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
 
 		tmp = vma->vm_end;
 		if (tmp > end)
@@ -381,6 +428,18 @@ static int do_mlock(unsigned long start, size_t len, int on)
 	return error;
 }
 
+int mm_mpin(unsigned long start, unsigned long end)
+{
+	return do_mlock(start, end, MLOCK_F_ON | MLOCK_F_PIN);
+}
+EXPORT_SYMBOL_GPL(mm_mpin);
+
+int mm_munpin(unsigned long start, unsigned long end)
+{
+	return do_mlock(start, end, MLOCK_F_PIN);
+}
+EXPORT_SYMBOL_GPL(mm_munpin);
+
 /*
  * __mm_populate - populate and/or mlock pages within a range of address space.
  *
@@ -460,14 +519,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
 	start &= PAGE_MASK;
 
 	locked = len >> PAGE_SHIFT;
-	locked += current->mm->locked_vm;
+	locked += mm_locked_pages(current->mm);
 
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
 
 	/* check against resource limits */
 	if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
-		error = do_mlock(start, len, 1);
+		error = do_mlock(start, len, MLOCK_F_ON);
 	up_write(&current->mm->mmap_sem);
 	if (!error)
 		error = __mm_populate(start, len, 0);
diff --git a/mm/mmap.c b/mm/mmap.c
index f681e18..5cd10c4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1258,7 +1258,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	if (vm_flags & VM_LOCKED) {
 		unsigned long locked, lock_limit;
 		locked = len >> PAGE_SHIFT;
-		locked += mm->locked_vm;
+		locked += mm_locked_pages(mm);
 		lock_limit = rlimit(RLIMIT_MEMLOCK);
 		lock_limit >>= PAGE_SHIFT;
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
@@ -2070,7 +2070,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
 	if (vma->vm_flags & VM_LOCKED) {
 		unsigned long locked;
 		unsigned long limit;
-		locked = mm->locked_vm + grow;
+		locked = mm_locked_pages(mm) + grow;
 		limit = ACCESS_ONCE(rlim[RLIMIT_MEMLOCK].rlim_cur);
 		limit >>= PAGE_SHIFT;
 		if (locked > limit && !capable(CAP_IPC_LOCK))
@@ -2547,13 +2547,17 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	/*
 	 * unlock any mlock()ed ranges before detaching vmas
 	 */
-	if (mm->locked_vm) {
+	if (mm->locked_vm || mm->pinned_vm) {
 		struct vm_area_struct *tmp = vma;
 		while (tmp && tmp->vm_start < end) {
-			if (tmp->vm_flags & VM_LOCKED) {
+			if (tmp->vm_flags & VM_PINNED)
+				mm->pinned_vm -= vma_pages(tmp);
+			else if (tmp->vm_flags & VM_LOCKED)
 				mm->locked_vm -= vma_pages(tmp);
+
+			if (tmp->vm_flags & VM_LOCKED)
 				munlock_vma_pages_all(tmp);
-			}
+
 			tmp = tmp->vm_next;
 		}
 	}
@@ -2628,7 +2632,7 @@ static unsigned long do_brk(unsigned long addr, unsigned long len)
 	if (mm->def_flags & VM_LOCKED) {
 		unsigned long locked, lock_limit;
 		locked = len >> PAGE_SHIFT;
-		locked += mm->locked_vm;
+		locked += mm_locked_pages(mm);
 		lock_limit = rlimit(RLIMIT_MEMLOCK);
 		lock_limit >>= PAGE_SHIFT;
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-24 14:01                   ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Peter Zijlstra
@ 2013-05-24 15:40                     ` Christoph Lameter
  2013-05-26  1:11                       ` KOSAKI Motohiro
  2013-05-27  6:48                       ` Peter Zijlstra
  2013-05-30 21:00                     ` KOSAKI Motohiro
  1 sibling, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2013-05-24 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz

On Fri, 24 May 2013, Peter Zijlstra wrote:

> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
> broke RLIMIT_MEMLOCK.

Nope the patch fixed a problem with double accounting.

The problem that we seem to have is to define what mlocked and pinned mean
and how this relates to RLIMIT_MEMLOCK.

mlocked pages are pages that are movable (not pinned!!!) and that are
marked in some way by user space actions as mlocked (POSIX semantics).
They are marked with a special page flag (PG_mlocked).

Pinned pages are pages that have an elevated refcount because the hardware
needs to use these pages for I/O. The elevated refcount may be temporary
(then we dont care about this) or for a longer time (such as the memory
registration of the IB subsystem). That is when we account the memory as
pinned. The elevated refcount stops page migration and other things from
trying to move that memory.

Pages can be both pinned and mlocked. Before my patch some pages those two
issues were conflated since the same counter was used and therefore these
pages were counted twice. If an RDMA application was running using
mlockall() and was performing large scale I/O then the counters could show
extraordinary large numbers and the VM would start to behave erratically.

It is important for the VM to know which pages cannot be evicted but that
involves many more pages due to dirty pages etc etc.

So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
that userspace has mlocked.

You want the counter to mean something different it seems. What is it?

I think we need to be first clear on what we want to accomplish and what
these counters actually should count before changing things.

Certainly would appreciate improvements in this area but resurrecting the
conflation between mlocked and pinned pages is not the way to go.

> This patch proposes to properly fix the problem by introducing
> VM_PINNED. This also provides the groundwork for a possible mpin()
> syscall or MADV_PIN -- although these are not included.

Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
seems to assume.

> It recognises that pinned page semantics are a strict super-set of
> locked page semantics -- a pinned page will not generate major faults
> (and thus satisfies mlock() requirements).

Not exactly true. Pinned pages may not have the mlocked flag set and they
are not managed on the unevictable LRU lists of the MM.

> If people find this approach unworkable, I request we revert the above
> mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
> again.

Cannot do that. This will cause the breakage that the patch was fixing to
resurface.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-24 15:40                     ` Christoph Lameter
@ 2013-05-26  1:11                       ` KOSAKI Motohiro
  2013-05-28 16:19                         ` Christoph Lameter
  2013-05-27  6:48                       ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2013-05-26  1:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org,
	linux-rdma, Or Gerlitz

On Fri, May 24, 2013 at 11:40 AM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 24 May 2013, Peter Zijlstra wrote:
>
>> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
>> broke RLIMIT_MEMLOCK.
>
> Nope the patch fixed a problem with double accounting.
>
> The problem that we seem to have is to define what mlocked and pinned mean
> and how this relates to RLIMIT_MEMLOCK.
>
> mlocked pages are pages that are movable (not pinned!!!) and that are
> marked in some way by user space actions as mlocked (POSIX semantics).
> They are marked with a special page flag (PG_mlocked).
>
> Pinned pages are pages that have an elevated refcount because the hardware
> needs to use these pages for I/O. The elevated refcount may be temporary
> (then we dont care about this) or for a longer time (such as the memory
> registration of the IB subsystem). That is when we account the memory as
> pinned. The elevated refcount stops page migration and other things from
> trying to move that memory.
>
> Pages can be both pinned and mlocked. Before my patch some pages those two
> issues were conflated since the same counter was used and therefore these
>
pages were counted twice. If an RDMA application was running using
> mlockall() and was performing large scale I/O then the counters could show
> extraordinary large numbers and the VM would start to behave erratically.
>
> It is important for the VM to know which pages cannot be evicted but that
> involves many more pages due to dirty pages etc etc.
>
> So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
> that userspace has mlocked.
>
> You want the counter to mean something different it seems. What is it?
>
> I think we need to be first clear on what we want to accomplish and what
> these counters actually should count before changing things.

Hm.
If pinned and mlocked are totally difference intentionally, why IB uses
RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up
number of pinned pages and other gup users don't.
I can't guess IB folk's intent.

And now ever IB code has duplicated RLIMIT_MEMLOCK
check and at least __ipath_get_user_pages() forget to check
capable(CAP_IPC_LOCK).
That's bad.


> Certainly would appreciate improvements in this area but resurrecting the
> conflation between mlocked and pinned pages is not the way to go.
>
>> This patch proposes to properly fix the problem by introducing
>> VM_PINNED. This also provides the groundwork for a possible mpin()
>> syscall or MADV_PIN -- although these are not included.
>
> Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
> seems to assume.

Generically, you are right. But if VM_PINNED is really only for IB,
this is acceptable
limitation. They can split vma for their own purpose.

Anyway, I agree we should clearly understand the semantics of IB pinning and
the userland usage and assumption.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-24 15:40                     ` Christoph Lameter
  2013-05-26  1:11                       ` KOSAKI Motohiro
@ 2013-05-27  6:48                       ` Peter Zijlstra
  2013-05-28 16:37                         ` Christoph Lameter
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2013-05-27  6:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz

On Fri, May 24, 2013 at 03:40:26PM +0000, Christoph Lameter wrote:
> On Fri, 24 May 2013, Peter Zijlstra wrote:
> 
> > Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
> > broke RLIMIT_MEMLOCK.
> 
> Nope the patch fixed a problem with double accounting.

And introduces another problem by now under-counting in a much more
likely scenario.

Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
had a single resource counter. After your patch RLIMIT_MEMLOCK is
applied separately to both -- more or less.

Your patch changes RLIMIT_MEMLOCK semantics, doesn't mention this in the
Changelog which makes one think this was unintentional. And you're
refusing to acknowledge that this might be a problem.

The fact that RLIMIT_MEMLOCK is the only consumer of the counter tells
me you simply didn't seem to care and did a sloppy patch 'solving' your
immediate issue.

The below discussion of what mlock vs pinned should mean for
RLIMIT_MEMLOCK should have been part of that patch, since it changed the
previously established behaviour.

> The problem that we seem to have is to define what mlocked and pinned mean
> and how this relates to RLIMIT_MEMLOCK.
> 
> mlocked pages are pages that are movable (not pinned!!!) and that are
> marked in some way by user space actions as mlocked (POSIX semantics).
> They are marked with a special page flag (PG_mlocked).

NO, mlocked pages are pages that do not leave core memory; IOW do not
cause major faults. Pinning pages is a perfectly spec compliant mlock()
implementation.

See: http://pubs.opengroup.org/onlinepubs/7908799/xsh/mlock.html

The fact that we _want_ a weaker implementation because we like to move
them despite them being mlocked is us playing weasel words with the spec
:-) So much so that we have to introduce a means of actually pinning
them _before_ we start to actually migrate mlocked pages (or did
somebody actually change that too?).

RT people will be unhappy if they hit minor faults or have to wait
behind page-migration (esp. since the migrate code only removes the
migration PTE after all pages in the batch are migrated).

BTW, notice that mlock() is part of the REALTIME spec, and while its
wording is such that is allows us to make it unsuitable for actual
real-time usage this goes against the intent of the spec.

Now in an earlier discussion on the issue 'we' (I can't remember if you
participated there, I remember Mel and Kosaki-San) agreed that for
'normal' (read not whacky real-time people) mlock can still be useful
and we should introduce a pinned user API for the RT people.

> Pinned pages are pages that have an elevated refcount because the hardware
> needs to use these pages for I/O. The elevated refcount may be temporary
> (then we dont care about this) or for a longer time (such as the memory
> registration of the IB subsystem). That is when we account the memory as
> pinned. The elevated refcount stops page migration and other things from
> trying to move that memory.

Again I _know_ that!!!

> Pages can be both pinned and mlocked. 

Right, but apart for mlockall() this is a highly unlikely situation to
actually occur. And if you're using mlockall() you've effectively
disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter
goes funny.

> Before my patch some pages those two
> issues were conflated since the same counter was used and therefore these
> pages were counted twice. If an RDMA application was running using
> mlockall() and was performing large scale I/O then the counters could show
> extraordinary large numbers and the VM would start to behave erratically.
> 
> It is important for the VM to know which pages cannot be evicted but that
> involves many more pages due to dirty pages etc etc.

But the VM as such doesn't care, the VM doesn't use either locked_vm nor
pinned_vm for anything! The only user is RLIMIT_MEMLOCK and you silently
changed semantics.

> So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
> that userspace has mlocked.
> 
> You want the counter to mean something different it seems. What is it?

It would've been so much better if you'd asked me that before you mucked
it up.

The intent of RLIMIT_MEMLOCK is clearly to limit the amount of memory
the user can lock into core memory as per the direct definition of
mlock. By this definition pinned should be included. This leaves us to
solve the issue of pinned mlocked memory.

> I think we need to be first clear on what we want to accomplish and what
> these counters actually should count before changing things.

Backward isn't it... _you_ changed it without consideration.

> Certainly would appreciate improvements in this area but resurrecting the
> conflation between mlocked and pinned pages is not the way to go.
> 
> > This patch proposes to properly fix the problem by introducing
> > VM_PINNED. This also provides the groundwork for a possible mpin()
> > syscall or MADV_PIN -- although these are not included.
> 
> Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
> seems to assume.

The IB code does a big get_user_pages(), which last time I checked
pins a sequential range of pages. Therefore the VMA approach.

The perf thing is guaranteed to be virtually sequential and already is a
VMA -- its the result of a user mmap() call.

> > It recognises that pinned page semantics are a strict super-set of
> > locked page semantics -- a pinned page will not generate major faults
> > (and thus satisfies mlock() requirements).
> 
> Not exactly true. Pinned pages may not have the mlocked flag set and they
> are not managed on the unevictable LRU lists of the MM.

Re-read the spec; pinned pages stay in memory; therefore they qualify.
The fact that we've gone all 'smart' about dealing with such pages is
immaterial.

> > If people find this approach unworkable, I request we revert the above
> > mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
> > again.
> 
> Cannot do that. This will cause the breakage that the patch was fixing to
> resurface.

But fixes my borkage, which I argue is much more user visible since it
doesn't require CAP_IPC_LOCK :-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-26  1:11                       ` KOSAKI Motohiro
@ 2013-05-28 16:19                         ` Christoph Lameter
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2013-05-28 16:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org,
	linux-rdma, Or Gerlitz

On Sat, 25 May 2013, KOSAKI Motohiro wrote:

> If pinned and mlocked are totally difference intentionally, why IB uses
> RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up
> number of pinned pages and other gup users don't.
> I can't guess IB folk's intent.

True another limit would be better. The reason that IB raises the
pinned pages is because IB permanently pins those pages. Other users of
gup do that temporarily.

If there are other users that pin pages permanently should also account
for it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-27  6:48                       ` Peter Zijlstra
@ 2013-05-28 16:37                         ` Christoph Lameter
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
                                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Christoph Lameter @ 2013-05-28 16:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins

On Mon, 27 May 2013, Peter Zijlstra wrote:

> Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
> had a single resource counter. After your patch RLIMIT_MEMLOCK is
> applied separately to both -- more or less.

Before the patch the count was doubled since a single page was counted
twice: Once because it was mlocked (marked with PG_mlock) and then again
because it was also pinned (the refcount was increased). Two different things.

We have agreed for a long time that mlocked pages are movable. That is not
true for pinned pages and therefore pinning pages therefore do not fall
into that category (Hugh? AFAICR you came up with that rule?)

> NO, mlocked pages are pages that do not leave core memory; IOW do not
> cause major faults. Pinning pages is a perfectly spec compliant mlock()
> implementation.

That is not the definition that we have used so far.

> Now in an earlier discussion on the issue 'we' (I can't remember if you
> participated there, I remember Mel and Kosaki-San) agreed that for
> 'normal' (read not whacky real-time people) mlock can still be useful
> and we should introduce a pinned user API for the RT people.

Right. I remember that.

> > Pinned pages are pages that have an elevated refcount because the hardware
> > needs to use these pages for I/O. The elevated refcount may be temporary
> > (then we dont care about this) or for a longer time (such as the memory
> > registration of the IB subsystem). That is when we account the memory as
> > pinned. The elevated refcount stops page migration and other things from
> > trying to move that memory.
>
> Again I _know_ that!!!

But then you refuse to acknowledge the difference and want to conflate
both.

> > Pages can be both pinned and mlocked.
>
> Right, but apart for mlockall() this is a highly unlikely situation to
> actually occur. And if you're using mlockall() you've effectively
> disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter
> goes funny.

mlockall() would never be used on all processes. You still need the
RLIMIT_MLOCK to ensure that the box does not lock up.

> > I think we need to be first clear on what we want to accomplish and what
> > these counters actually should count before changing things.
>
> Backward isn't it... _you_ changed it without consideration.

I applied the categorization that we had agreed on before during the
development of page migratiob. Pinning is not compatible.

> The IB code does a big get_user_pages(), which last time I checked
> pins a sequential range of pages. Therefore the VMA approach.

The IB code (and other code) can require the pinning of pages in various
ways.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [regression] Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-28 16:37                         ` Christoph Lameter
@ 2013-05-29  7:58                           ` Ingo Molnar
  2013-05-29 19:53                             ` KOSAKI Motohiro
  2013-05-30 18:30                           ` Peter Zijlstra
  2013-05-30 19:59                           ` Pekka Enberg
  2 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2013-05-29  7:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, linux-kernel,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity,
	akpm, torvalds, roland, infinipath, linux-mm, linux-rdma,
	Or Gerlitz, Hugh Dickins


* Christoph Lameter <cl@linux.com> wrote:

> On Mon, 27 May 2013, Peter Zijlstra wrote:
> 
> > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
> > had a single resource counter. After your patch RLIMIT_MEMLOCK is
> > applied separately to both -- more or less.
> 
> Before the patch the count was doubled since a single page was counted 
> twice: Once because it was mlocked (marked with PG_mlock) and then again 
> because it was also pinned (the refcount was increased). Two different 
> things.

Christoph, why are you *STILL* arguing??

You caused a *regression* in a userspace ABI plain and simple, and a 
security relevant one. Furtermore you modified kernel/events/core.c yet 
you never even Cc:-ed the parties involved ...

All your excuses, obfuscation and attempts to redefine the universe to 
your liking won't change reality: it worked before, it does not now. Take 
responsibility for your action for christ's sake and move forward towards 
a resolution , okay?

When can we expect a fix from you for the breakage you caused? Or at least 
a word that acknowledges that you broke a user ABI carelessly?

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
@ 2013-05-29 19:53                             ` KOSAKI Motohiro
  2013-05-30  6:32                               ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2013-05-29 19:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver,
	linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins,
	kosaki.motohiro

Hi

I'm unhappy you guys uses offensive word so much. Please cool down all you guys. :-/
In fact, _BOTH_ the behavior before and after Cristoph's patch doesn't have cleaner semantics.
And PeterZ proposed make new cleaner one rather than revert. No need to hassle.

I'm 100% sure -rt people need stronger-mlock api. Please join discussion to make better API.
In my humble opinion is: we should make mlock3(addr, len flags) new syscall (*) and support
-rt requirement directly. And current strange IB RLIMIT_MEMLOCK usage should gradually migrate
it.
(*) or, to enhance mbind() is an option because i expect apps need to pin pages nearby NUMA nodes
in many case.

As your know, current IB pinning implementation doesn't guarantee no minor fault when fork
is used. It's ok for IB. They uses madvise(MADV_NOFORK) too. But I'm not sure *all* of rt
application are satisfied this. We might need to implement copy-on-fork or might not. I'd
like hear other people's opinion.

Also, all developer should know this pinning breaks when memory hot-plug is happen likes
cpu bounding bysched_setaffinity() may break when cpu hot-remove.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-29 19:53                             ` KOSAKI Motohiro
@ 2013-05-30  6:32                               ` Ingo Molnar
  2013-05-30 20:42                                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2013-05-30  6:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver,
	linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins


* KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:

> Hi
> 
> I'm unhappy you guys uses offensive word so much. Please cool down all 
> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's 
> patch doesn't have cleaner semantics.

Erm, this feature _regressed_ after the patch. All other concerns are 
secondary. What's so difficult to understand about that?

> And PeterZ proposed make new cleaner one rather than revert. No need to 
> hassle.

Sorry, that's not how we deal with regressions upstream...

I've been pretty polite in fact, compared to say Linus - someone should 
put a politically correct summary of:

   https://lkml.org/lkml/2013/2/22/456
   https://lkml.org/lkml/2013/3/26/1113

into Documentation/, people keep forgetting about it.

I'm sorry if being direct and to the point offends you [no, not really], 
this discussion got _way_ off the real track it should be on: that this is 
a _regression_.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-28 16:37                         ` Christoph Lameter
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
@ 2013-05-30 18:30                           ` Peter Zijlstra
  2013-05-30 19:59                           ` Pekka Enberg
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2013-05-30 18:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins

On Tue, May 28, 2013 at 04:37:06PM +0000, Christoph Lameter wrote:
> On Mon, 27 May 2013, Peter Zijlstra wrote:
> 
> > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
> > had a single resource counter. After your patch RLIMIT_MEMLOCK is
> > applied separately to both -- more or less.
> 
> Before the patch the count was doubled since a single page was counted
> twice: Once because it was mlocked (marked with PG_mlock) and then again
> because it was also pinned (the refcount was increased). Two different things.

Before the patch RLIMIT_MEMLOCK was a limit on the sum of both, after it
is no longer. This change was not described in the changelog.

This is a bug plain and simple and you refuse to acknowledge it.

> We have agreed for a long time that mlocked pages are movable. That is not
> true for pinned pages and therefore pinning pages therefore do not fall
> into that category (Hugh? AFAICR you came up with that rule?)

Into what category? RLIMIT_MEMLOCK should be a limit on the amount of
pages that are constrained to memory (can not be paged). How does a
pinned page not qualify for that?

> > NO, mlocked pages are pages that do not leave core memory; IOW do not
> > cause major faults. Pinning pages is a perfectly spec compliant mlock()
> > implementation.
> 
> That is not the definition that we have used so far.

There's two statements there; which one do you disagree with?

On the first; that is exactly the mlock() definition you want; excluding
major faults only means faults cannot be subject to IO, IOW pages must
remain in memory. Not excluding minor faults allows you to unmap and
migrate pages.

On the second; excluding any faults; that is what the mlock() specs
intended -- because that is the thing real-time people really want and
mlock is part of the real-time POSIX spec.

[ Hence the need to provide mpin() and munpin() syscalls before we make
  mlock() pages migratable, otherwise RT people are screwed. ]

Since excluding any fault per definition also excludes major faults,
the second is a clear superset of the first. Both ensure pages stay in
memory, thus both should count towards RLIMIT_MEMLOCK.

> But then you refuse to acknowledge the difference and want to conflate
> both.

You're the one that is confused. You're just repeating yourself without
providing argument or reason. 

> > > Pages can be both pinned and mlocked.
> >
> > Right, but apart for mlockall() this is a highly unlikely situation to
> > actually occur. And if you're using mlockall() you've effectively
> > disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter
> > goes funny.
> 
> mlockall() would never be used on all processes. You still need the
> RLIMIT_MLOCK to ensure that the box does not lock up.

You're so skilled at missing the point its not funny. 

The resource counter (formerly mm_struct::locked_vm, currently broken)
associated with RLIMIT_MEMLOCK is per process. Therefore when you use
mlockall() you've already done away with the limit, and thus the
resource counter value is irrelevant.

> > > I think we need to be first clear on what we want to accomplish and what
> > > these counters actually should count before changing things.
> >
> > Backward isn't it... _you_ changed it without consideration.
> 
> I applied the categorization that we had agreed on before during the
> development of page migratiob. Pinning is not compatible.

I've never agreed to any such thing, you changed my code without cc'ing
me and without describing the actual change.

Please explain how pinned pages are not stuck in memory and can be
paged? Once you've convinced me of that, I'll concede they should not be
counted towards RLIMIT_MEMLOCK.

> > The IB code does a big get_user_pages(), which last time I checked
> > pins a sequential range of pages. Therefore the VMA approach.
> 
> The IB code (and other code) can require the pinning of pages in various
> ways.

You can also mlock()/madvise()/mprotect() in various ways, this is a
non-argument against using VMAs.

Is there anything besides IB and perf that has user controlled page
pinning? If so, it needs fixing.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-28 16:37                         ` Christoph Lameter
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
  2013-05-30 18:30                           ` Peter Zijlstra
@ 2013-05-30 19:59                           ` Pekka Enberg
  2 siblings, 0 replies; 14+ messages in thread
From: Pekka Enberg @ 2013-05-30 19:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, roland, infinipath, linux-mm@kvack.org,
	linux-rdma, Or Gerlitz, Hugh Dickins

On Mon, 27 May 2013, Peter Zijlstra wrote:
>> Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
>> had a single resource counter. After your patch RLIMIT_MEMLOCK is
>> applied separately to both -- more or less.

On Tue, May 28, 2013 at 7:37 PM, Christoph Lameter <cl@linux.com> wrote:
> Before the patch the count was doubled since a single page was counted
> twice: Once because it was mlocked (marked with PG_mlock) and then again
> because it was also pinned (the refcount was increased). Two different things.

Pinned vs. mlocked counting isn't the problem here. You changed
RLIMIT_MEMLOCK userspace ABI and that needs to be restored. So the
question is how can we preserve the old RLIMIT_MEMLOCK semantics while
avoiding the double accounting issue.

                        Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-30  6:32                               ` Ingo Molnar
@ 2013-05-30 20:42                                 ` KOSAKI Motohiro
  2013-05-31  9:27                                   ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2013-05-30 20:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, LKML,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity,
	Andrew Morton, Linus Torvalds, Roland Dreier, infinipath,
	linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins

>> I'm unhappy you guys uses offensive word so much. Please cool down all
>> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's
>> patch doesn't have cleaner semantics.
>
> Erm, this feature _regressed_ after the patch. All other concerns are
> secondary. What's so difficult to understand about that?

Because it is not new commit at all. Christoph's patch was introduced
10 releases before.

$ git describe bc3e53f682
v3.1-7235-gbc3e53f

If we just revert it, we may get another and opposite regression
report. I'm worried
about it. Moreover, I don't think discussion better fix is too difficult for us.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-24 14:01                   ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Peter Zijlstra
  2013-05-24 15:40                     ` Christoph Lameter
@ 2013-05-30 21:00                     ` KOSAKI Motohiro
  1 sibling, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2013-05-30 21:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org

> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index a841123..f8f47dc 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -137,17 +137,22 @@ struct ib_umem *ib_umem_get (struct ib_ucontext *context, unsigned long addr,
>
>         down_write(&current->mm->mmap_sem);
>
> -       locked     = npages + current->mm->pinned_vm;
> +       locked     = npages + mm_locked_pages(current->mm);
>         lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>
>         if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
>                 ret = -ENOMEM;
> -               goto out;
> +               goto err;
>         }
>
>         cur_base = addr & PAGE_MASK;
> +       umem->start_addr = cur_base;
> +       umem->end_addr   = cur_base + npages;
> +
> +       ret = mm_mpin(umem->start_addr, umem->end_addr);
> +       if (ret)
> +               goto err;

I believe RLIMIT_MEMLOCK should be checked within mm_mpin().


> +static inline unsigned long mm_locked_pages(struct mm_struct *mm)
> +{
> +       return mm->pinned_vm + mm->locked_vm;
> +}

This is acceptable. but if we create mm_locked_pages(), /proc should
also use this.
Otherwise pinning operation magically decrease VmLck field in
/proc/pid/status and people
get a confusion.



> @@ -310,9 +309,49 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
>          * Keep track of amount of locked VM.
>          */
>         nr_pages = (end - start) >> PAGE_SHIFT;
> -       if (!lock)
> -               nr_pages = -nr_pages;
> -       mm->locked_vm += nr_pages;
> +
> +       /*
> +        * We should only account pages once, if VM_PINNED is set pages are
> +        * accounted in mm_struct::pinned_vm, otherwise if VM_LOCKED is set,
> +        * we'll account them in mm_struct::locked_vm.
> +        *
> +        * PL  := vma->vm_flags
> +        * PL' := newflags
> +        * PLd := {pinned,locked}_vm delta
> +        *
> +        * PL->PL' PLd
> +        * -----------
> +        * 00  01  0+
> +        * 00  10  +0
> +        * 01  11  +-
> +        * 01  00  0-
> +        * 10  00  -0
> +        * 10  11  00
> +        * 11  01  -+
> +        * 11  10  00
> +        */

This comment is too cryptic. :-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-30 20:42                                 ` KOSAKI Motohiro
@ 2013-05-31  9:27                                   ` Ingo Molnar
  0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2013-05-31  9:27 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, LKML,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity,
	Andrew Morton, Linus Torvalds, Roland Dreier, infinipath,
	linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins


* KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:

> >> I'm unhappy you guys uses offensive word so much. Please cool down all
> >> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's
> >> patch doesn't have cleaner semantics.
> >
> > Erm, this feature _regressed_ after the patch. All other concerns are
> > secondary. What's so difficult to understand about that?
> 
> Because it is not new commit at all. Christoph's patch was introduced
> 10 releases before.

That's indeed sad - resource limits are not typically tested by apps. The 
lack of Cc:s to people affected also helped hide the effects of the 
change.

> $ git describe bc3e53f682
> v3.1-7235-gbc3e53f
> 
> If we just revert it, we may get another and opposite regression report. 
> I'm worried about it. Moreover, I don't think discussion better fix is 
> too difficult for us.

Sure, I agree that a fix is generally better.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-05-31  9:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.DEB.2.10.1305221523420.9944@vincent-weaver-1.um.maine.edu>
     [not found] ` <alpine.DEB.2.10.1305221953370.11450@vincent-weaver-1.um.maine.edu>
     [not found]   ` <alpine.DEB.2.10.1305222344060.12929@vincent-weaver-1.um.maine.edu>
     [not found]     ` <20130523044803.GA25399@ZenIV.linux.org.uk>
     [not found]       ` <20130523104154.GA23650@twins.programming.kicks-ass.net>
     [not found]         ` <0000013ed1b8d0cc-ad2bb878-51bd-430c-8159-629b23ed1b44-000000@email.amazonses.com>
     [not found]           ` <20130523152458.GD23650@twins.programming.kicks-ass.net>
     [not found]             ` <0000013ed2297ba8-467d474a-7068-45b3-9fa3-82641e6aa363-000000@email.amazonses.com>
     [not found]               ` <20130523163901.GG23650@twins.programming.kicks-ass.net>
     [not found]                 ` <0000013ed28b638a-066d7dc7-b590-49f8-9423-badb9537b8b6-000000@email.amazonses.com>
2013-05-24 14:01                   ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Peter Zijlstra
2013-05-24 15:40                     ` Christoph Lameter
2013-05-26  1:11                       ` KOSAKI Motohiro
2013-05-28 16:19                         ` Christoph Lameter
2013-05-27  6:48                       ` Peter Zijlstra
2013-05-28 16:37                         ` Christoph Lameter
2013-05-29  7:58                           ` [regression] " Ingo Molnar
2013-05-29 19:53                             ` KOSAKI Motohiro
2013-05-30  6:32                               ` Ingo Molnar
2013-05-30 20:42                                 ` KOSAKI Motohiro
2013-05-31  9:27                                   ` Ingo Molnar
2013-05-30 18:30                           ` Peter Zijlstra
2013-05-30 19:59                           ` Pekka Enberg
2013-05-30 21:00                     ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).