linux-rdma.vger.kernel.org archive mirror
* [PATCH 0/26] get_user_pages() cleanup
@ 2013-10-02 14:27 Jan Kara
  2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, Alexander Viro, Andreas Dilger, Andy Walls,
	Arnd Bergmann, Benjamin LaHaise, ceph-devel, Dan Williams,
	David Airlie, dri-devel, Gleb Natapov, Greg Kroah-Hartman,
	hpdd-discuss, Jarod Wilson, Jayant Mangalampalli,
	Jean-Christophe Plagniol-Villard, Jesper Nilsson, Kai Makisara,
	kvm, Laurent Pinchart, linux-aio, linux-cris-kernel, linux-fbdev,
	linux-fsdevel

  Hello,

  In my quest for changing the locking around page faults to make things
easier for filesystems, I found out that get_user_pages() users could use a
cleanup.  The knowledge about the locking required for get_user_pages() is
spread across tons of places in drivers, and quite a few of them actually get
it wrong (they don't hold mmap_sem when calling get_user_pages(), or they hold
mmap_sem when calling copy_from_user() in the surrounding code). Rather often
this knowledge doesn't actually seem necessary. This patch series converts
lots of places to use either get_user_pages_fast() or a new simple wrapper,
get_user_pages_unlocked(), to remove the knowledge of mmap_sem from the
drivers. I'm still looking into converting a few remaining drivers (most
notably v4l2) which are more complex.
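
The new wrapper is deliberately trivial; roughly (a sketch of its shape,
assuming the 2013-era get_user_pages() signature, not the exact patch):

static inline long get_user_pages_unlocked(struct task_struct *tsk,
			struct mm_struct *mm, unsigned long start,
			unsigned long nr_pages, int write, int force,
			struct page **pages)
{
	long ret;

	/* hide the mmap_sem locking from callers */
	down_read(&mm->mmap_sem);
	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
			     pages, NULL);
	up_read(&mm->mmap_sem);
	return ret;
}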

As I already wrote, in some cases I actually think drivers were buggy (and I
note that in the corresponding changelogs). I would really like to ask the
respective maintainers to have a look at the patches in their area. Any other
comments are welcome as well. Thanks.

								Honza

PS: Sorry for the huge recipient list but I don't really know how to trim it
    down...

CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andreas Dilger <andreas.dilger@intel.com>
CC: Andy Walls <awalls@md.metrocast.net>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Benjamin LaHaise <bcrl@kvack.org>
CC: ceph-devel@vger.kernel.org
CC: Dan Williams <dan.j.williams@intel.com>
CC: David Airlie <airlied@linux.ie>
CC: dri-devel@lists.freedesktop.org
CC: Gleb Natapov <gleb@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: hpdd-discuss@lists.01.org
CC: Jarod Wilson <jarod@wilsonet.com>
CC: Jayant Mangalampalli <jayant.mangalampalli@intel.com>
CC: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
CC: Jesper Nilsson <jesper.nilsson@axis.com>
CC: Kai Makisara <Kai.Makisara@kolumbus.fi>
CC: kvm@vger.kernel.org
CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
CC: linux-aio@kvack.org
CC: linux-cris-kernel@axis.com
CC: linux-fbdev@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-ia64@vger.kernel.org
CC: linux-media@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: Manu Abraham <abraham.manu@gmail.com>
CC: Mark Allyn <mark.a.allyn@intel.com>
CC: Mikael Starvik <starvik@axis.com>
CC: Mike Marciniszyn <infinipath@intel.com>
CC: Naren Sankar <nsankar@broadcom.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Peng Tao <tao.peng@emc.com>
CC: Roland Dreier <roland@kernel.org>
CC: Sage Weil <sage@inktank.com>
CC: Scott Davilla <davilla@4pi.com>
CC: Timur Tabi <timur@freescale.com>
CC: Tomi Valkeinen <tomi.valkeinen@ti.com>
CC: Tony Luck <tony.luck@intel.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked()
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 21/26] ib: Convert ipath_get_user_pages() " Jan Kara
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Roland Dreier, linux-rdma

Convert ib_umem_get() to use get_user_pages_unlocked(). This
significantly shortens the section where mmap_sem is held (we only need
it for updating mm->pinned_vm and inside get_user_pages() itself) and
removes the driver's knowledge of get_user_pages() locking.

CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/core/umem.c | 41 +++++++++++++++++------------------------
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a84112322071..0640a89021a9 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -80,7 +80,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 {
 	struct ib_umem *umem;
 	struct page **page_list;
-	struct vm_area_struct **vma_list;
 	struct ib_umem_chunk *chunk;
 	unsigned long locked;
 	unsigned long lock_limit;
@@ -125,34 +124,31 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	/*
-	 * if we can't alloc the vma_list, it's not so bad;
-	 * just assume the memory is not hugetlb memory
-	 */
-	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
-	if (!vma_list)
-		umem->hugetlb = 0;
-
 	npages = PAGE_ALIGN(size + umem->offset) >> PAGE_SHIFT;
 
 	down_write(&current->mm->mmap_sem);
 
-	locked     = npages + current->mm->pinned_vm;
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
-		ret = -ENOMEM;
-		goto out;
+	locked = npages;
+	if (npages + current->mm->pinned_vm > lock_limit &&
+	    !capable(CAP_IPC_LOCK)) {
+		up_write(&current->mm->mmap_sem);
+		kfree(umem);
+		free_page((unsigned long) page_list);
+		return ERR_PTR(-ENOMEM);
 	}
+	current->mm->pinned_vm += npages;
+
+	up_write(&current->mm->mmap_sem);
 
 	cur_base = addr & PAGE_MASK;
 
 	ret = 0;
 	while (npages) {
-		ret = get_user_pages(current, current->mm, cur_base,
+		ret = get_user_pages_unlocked(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     1, !umem->writable, page_list);
 
 		if (ret < 0)
 			goto out;
@@ -174,8 +170,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 			chunk->nents = min_t(int, ret, IB_UMEM_MAX_PAGE_CHUNK);
 			sg_init_table(chunk->page_list, chunk->nents);
 			for (i = 0; i < chunk->nents; ++i) {
-				if (vma_list &&
-				    !is_vm_hugetlb_page(vma_list[i + off]))
+				if (!PageHuge(page_list[i + off]))
 					umem->hugetlb = 0;
 				sg_set_page(&chunk->page_list[i], page_list[i + off], PAGE_SIZE, 0);
 			}
@@ -206,12 +201,10 @@ out:
 	if (ret < 0) {
 		__ib_umem_release(context->device, umem, 0);
 		kfree(umem);
-	} else
-		current->mm->pinned_vm = locked;
-
-	up_write(&current->mm->mmap_sem);
-	if (vma_list)
-		free_page((unsigned long) vma_list);
+		down_write(&current->mm->mmap_sem);
+		current->mm->pinned_vm -= locked;
+		up_write(&current->mm->mmap_sem);
+	}
 	free_page((unsigned long) page_list);
 
 	return ret < 0 ? ERR_PTR(ret) : umem;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 21/26] ib: Convert ipath_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
  2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Convert ipath_get_user_pages() to use get_user_pages_unlocked(). This
shortens the section where we hold mmap_sem for writing and also removes
the knowledge about get_user_pages() locking from the ipath driver. While
changing the code, we also fix a bug in the check of the number of pinned
pages against the RLIMIT_MEMLOCK limit.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/ipath/ipath_user_pages.c | 62 +++++++++++---------------
 1 file changed, 27 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
index dc66c4506916..a89af9654112 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -52,40 +52,58 @@ static void __ipath_release_user_pages(struct page **p, size_t num_pages,
 	}
 }
 
-/* call with current->mm->mmap_sem held */
-static int __ipath_get_user_pages(unsigned long start_page, size_t num_pages,
-				  struct page **p, struct vm_area_struct **vma)
+/**
+ * ipath_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
+ */
+int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
+			 struct page **p)
 {
 	unsigned long lock_limit;
 	size_t got;
 	int ret;
 
+	down_write(&current->mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit) {
+	if (current->mm->pinned_vm + num_pages > lock_limit && 
+	    !capable(CAP_IPC_LOCK)) {
+		up_write(&current->mm->mmap_sem);
 		ret = -ENOMEM;
 		goto bail;
 	}
+	current->mm->pinned_vm += num_pages;
+	up_write(&current->mm->mmap_sem);
 
 	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
 		   (unsigned long) num_pages, start_page);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(current, current->mm,
-				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
-				     p + got, vma);
+		ret = get_user_pages_unlocked(current, current->mm,
+					      start_page + got * PAGE_SIZE,
+					      num_pages - got, 1, 1,
+					      p + got);
 		if (ret < 0)
 			goto bail_release;
 	}
 
-	current->mm->pinned_vm += num_pages;
 
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__ipath_release_user_pages(p, got, 0);
+	down_write(&current->mm->mmap_sem);
+	current->mm->pinned_vm -= num_pages;
+	up_write(&current->mm->mmap_sem);
 bail:
 	return ret;
 }
@@ -146,32 +164,6 @@ dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
 	return phys;
 }
 
-/**
- * ipath_get_user_pages - lock user pages into memory
- * @start_page: the start page
- * @num_pages: the number of pages
- * @p: the output page structures
- *
- * This function takes a given start page (page aligned user virtual
- * address) and pins it and the following specified number of pages.  For
- * now, num_pages is always 1, but that will probably change at some point
- * (because caller is doing expected sends on a single virtually contiguous
- * buffer, so we can do all pages at once).
- */
-int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
-			 struct page **p)
-{
-	int ret;
-
-	down_write(&current->mm->mmap_sem);
-
-	ret = __ipath_get_user_pages(start_page, num_pages, p, NULL);
-
-	up_write(&current->mm->mmap_sem);
-
-	return ret;
-}
-
 void ipath_release_user_pages(struct page **p, size_t num_pages)
 {
 	down_write(&current->mm->mmap_sem);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked()
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
  2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
  2013-10-02 14:28 ` [PATCH 21/26] ib: Convert ipath_get_user_pages() " Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Function ipath_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even
more interestingly, the function ipath_user_sdma_queue_pkts() (and also
ipath_user_sdma_coalesce(), called somewhat later) calls copy_from_user(),
which can hit a page fault, and we then deadlock trying to take mmap_sem
while handling that fault. So just make ipath_user_sdma_pin_pages() use
get_user_pages_unlocked() and leave the mmap_sem locking to the mm code.
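
The deadlock, schematically (an illustrative sketch, not code from the
driver):

	down_write(&current->mm->mmap_sem);
	...
	ret = copy_from_user(dst, ubuf, len);	/* ubuf faults */
		/* the page fault handler does down_read(&mm->mmap_sem)
		 * and blocks forever, since this task already holds the
		 * semaphore for writing */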

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/ipath/ipath_user_sdma.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_user_sdma.c b/drivers/infiniband/hw/ipath/ipath_user_sdma.c
index f5cb13b21445..462c9b136e85 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_sdma.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_sdma.c
@@ -280,8 +280,8 @@ static int ipath_user_sdma_pin_pages(const struct ipath_devdata *dd,
 	int j;
 	int ret;
 
-	ret = get_user_pages(current, current->mm, addr,
-			     npages, 0, 1, pages, NULL);
+	ret = get_user_pages_unlocked(current, current->mm, addr,
+				      npages, 0, 1, pages);
 
 	if (ret != npages) {
 		int i;
@@ -811,10 +811,7 @@ int ipath_user_sdma_writev(struct ipath_devdata *dd,
 	while (dim) {
 		const int mxp = 8;
 
-		down_write(&current->mm->mmap_sem);
 		ret = ipath_user_sdma_queue_pkts(dd, pq, &list, iov, dim, mxp);
-		up_write(&current->mm->mmap_sem);
-
 		if (ret <= 0)
 			goto done_unlock;
 		else {
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
                   ` (2 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
       [not found]   ` <1380724087-13927-24-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
  2013-10-02 14:28 ` [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
shortens the section where we hold mmap_sem for writing and also removes
the knowledge about get_user_pages() locking from the qib driver. While
changing the code, we also fix a bug in the check of the number of pinned
pages against the RLIMIT_MEMLOCK limit.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++-----------------
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 2bc1d2b96298..57ce83c2d1d9 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
 	}
 }
 
-/*
- * Call with current->mm->mmap_sem held.
+/**
+ * qib_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
  */
-static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
-				struct page **p, struct vm_area_struct **vma)
+int qib_get_user_pages(unsigned long start_page, size_t num_pages,
+		       struct page **p)
 {
 	unsigned long lock_limit;
 	size_t got;
 	int ret;
+	struct mm_struct *mm = current->mm;
 
+	down_write(&mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+		up_write(&mm->mmap_sem);
 		ret = -ENOMEM;
 		goto bail;
 	}
+	mm->pinned_vm += num_pages;
+	up_write(&mm->mmap_sem);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(current, current->mm,
-				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
-				     p + got, vma);
+		ret = get_user_pages_unlocked(current, mm,
+					      start_page + got * PAGE_SIZE,
+					      num_pages - got, 1, 1,
+					      p + got);
 		if (ret < 0)
 			goto bail_release;
 	}
 
-	current->mm->pinned_vm += num_pages;
 
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__qib_release_user_pages(p, got, 0);
+	down_write(&mm->mmap_sem);
+	mm->pinned_vm -= num_pages;
+	up_write(&mm->mmap_sem);
 bail:
 	return ret;
 }
@@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
 	return phys;
 }
 
-/**
- * qib_get_user_pages - lock user pages into memory
- * @start_page: the start page
- * @num_pages: the number of pages
- * @p: the output page structures
- *
- * This function takes a given start page (page aligned user virtual
- * address) and pins it and the following specified number of pages.  For
- * now, num_pages is always 1, but that will probably change at some point
- * (because caller is doing expected sends on a single virtually contiguous
- * buffer, so we can do all pages at once).
- */
-int qib_get_user_pages(unsigned long start_page, size_t num_pages,
-		       struct page **p)
-{
-	int ret;
-
-	down_write(&current->mm->mmap_sem);
-
-	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
-
-	up_write(&current->mm->mmap_sem);
-
-	return ret;
-}
-
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
 	if (current->mm) /* during close after signal, mm can be NULL */
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked()
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
                   ` (3 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast() Jan Kara
  2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
  6 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Function qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
more interestingly, the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce(), called somewhat later) calls copy_from_user(),
which can hit a page fault, and we then deadlock trying to take mmap_sem
while handling that fault. So just make qib_user_sdma_pin_pages() use
get_user_pages_unlocked() and leave the mmap_sem locking to the mm code.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/qib/qib_user_sdma.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index d0a0ea0c14d6..c1b6463acd59 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -594,8 +594,8 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 		else
 			j = npages;
 
-		ret = get_user_pages(current, current->mm, addr,
-			     j, 0, 1, pages, NULL);
+		ret = get_user_pages_unlocked(current, current->mm, addr,
+			 		      j, 0, 1, pages);
 		if (ret != j) {
 			i = 0;
 			j = ret;
@@ -1294,11 +1294,8 @@ int qib_user_sdma_writev(struct qib_ctxtdata *rcd,
 		int mxp = 8;
 		int ndesc = 0;
 
-		down_write(&current->mm->mmap_sem);
 		ret = qib_user_sdma_queue_pkts(dd, ppd, pq,
 				iov, dim, &list, &mxp, &ndesc);
-		up_write(&current->mm->mmap_sem);
-
 		if (ret < 0)
 			goto done_unlock;
 		else {
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast()
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
                   ` (4 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
  6 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Roland Dreier, linux-rdma

Function mthca_map_user_db() appears to call get_user_pages() without
holding mmap_sem. Fix the bug by using get_user_pages_fast() instead,
which takes care of the locking internally.

CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/mthca/mthca_memfree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 7d2e42dd6926..c3543b27a2a7 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,8 +472,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 		goto out;
 	}
 
-	ret = get_user_pages(current, current->mm, uaddr & PAGE_MASK, 1, 1, 0,
-			     pages, NULL);
+	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, 1, pages);
 	if (ret < 0)
 		goto out;
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]   ` <1380724087-13927-24-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
@ 2013-10-02 14:54     ` Marciniszyn, Mike
       [not found]       ` <32E1700B9017364D9B60AED9960492BC211AEF75-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2013-10-04 13:52     ` Marciniszyn, Mike
  1 sibling, 1 reply; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-02 14:54 UTC (permalink / raw)
  To: Jan Kara, LKML
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, infinipath,
	Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Thanks!!

I would like to test these two patches and also do the stable work for the deadlock.  Do you have these patches in a repo somewhere, to save me a bit of work?

We had been working on an internal version of the deadlock portion of this patch that uses get_user_pages_fast() vs. the new get_user_pages_unlocked().

The risk of GUP fast is the loss of the "force" arg, which I don't see as significant given our use case.
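
For reference, the 2013-era prototypes (note that get_user_pages_fast()
indeed takes no force argument, and no tsk/mm either):

int get_user_pages_fast(unsigned long start, int nr_pages, int write,
			struct page **pages);
long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		    unsigned long start, unsigned long nr_pages,
		    int write, int force, struct page **pages,
		    struct vm_area_struct **vmas);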

Some nits on the subject and commit message:
1. use IB/qib, IB/ipath vs. ib
2. use the correct ipath vs. qib in the commit message text

Mike

> -----Original Message-----
> From: Jan Kara [mailto:jack-AlSwsSmVLrQ@public.gmane.org]
> Sent: Wednesday, October 02, 2013 10:28 AM
> To: LKML
> Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org; Jan Kara; infinipath; Roland Dreier; linux-
> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> shortens the section where we hold mmap_sem for writing and also removes
> the knowledge about get_user_pages() locking from ipath driver. We also fix
> a bug in testing pinned number of pages when changing the code.
> 
> CC: Mike Marciniszyn <infinipath-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> ---
>  drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++----------------
> -
>  1 file changed, 26 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c
> b/drivers/infiniband/hw/qib/qib_user_pages.c
> index 2bc1d2b96298..57ce83c2d1d9 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page
> **p, size_t num_pages,
>  	}
>  }
> 
> -/*
> - * Call with current->mm->mmap_sem held.
> +/**
> + * qib_get_user_pages - lock user pages into memory
> + * @start_page: the start page
> + * @num_pages: the number of pages
> + * @p: the output page structures
> + *
> + * This function takes a given start page (page aligned user virtual
> + * address) and pins it and the following specified number of pages.
> +For
> + * now, num_pages is always 1, but that will probably change at some
> +point
> + * (because caller is doing expected sends on a single virtually
> +contiguous
> + * buffer, so we can do all pages at once).
>   */
> -static int __qib_get_user_pages(unsigned long start_page, size_t
> num_pages,
> -				struct page **p, struct vm_area_struct
> **vma)
> +int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> +		       struct page **p)
>  {
>  	unsigned long lock_limit;
>  	size_t got;
>  	int ret;
> +	struct mm_struct *mm = current->mm;
> 
> +	down_write(&mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> 
> -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> +	if (mm->pinned_vm + num_pages > lock_limit &&
> !capable(CAP_IPC_LOCK)) {
> +		up_write(&mm->mmap_sem);
>  		ret = -ENOMEM;
>  		goto bail;
>  	}
> +	mm->pinned_vm += num_pages;
> +	up_write(&mm->mmap_sem);
> 
>  	for (got = 0; got < num_pages; got += ret) {
> -		ret = get_user_pages(current, current->mm,
> -				     start_page + got * PAGE_SIZE,
> -				     num_pages - got, 1, 1,
> -				     p + got, vma);
> +		ret = get_user_pages_unlocked(current, mm,
> +					      start_page + got * PAGE_SIZE,
> +					      num_pages - got, 1, 1,
> +					      p + got);
>  		if (ret < 0)
>  			goto bail_release;
>  	}
> 
> -	current->mm->pinned_vm += num_pages;
> 
>  	ret = 0;
>  	goto bail;
> 
>  bail_release:
>  	__qib_release_user_pages(p, got, 0);
> +	down_write(&mm->mmap_sem);
> +	mm->pinned_vm -= num_pages;
> +	up_write(&mm->mmap_sem);
>  bail:
>  	return ret;
>  }
> @@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev,
> struct page *page,
>  	return phys;
>  }
> 
> -/**
> - * qib_get_user_pages - lock user pages into memory
> - * @start_page: the start page
> - * @num_pages: the number of pages
> - * @p: the output page structures
> - *
> - * This function takes a given start page (page aligned user virtual
> - * address) and pins it and the following specified number of pages.  For
> - * now, num_pages is always 1, but that will probably change at some point
> - * (because caller is doing expected sends on a single virtually contiguous
> - * buffer, so we can do all pages at once).
> - */
> -int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> -		       struct page **p)
> -{
> -	int ret;
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  void qib_release_user_pages(struct page **p, size_t num_pages)  {
>  	if (current->mm) /* during close after signal, mm can be NULL */
> --
> 1.8.1.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]       ` <32E1700B9017364D9B60AED9960492BC211AEF75-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-10-02 15:28         ` Jan Kara
       [not found]           ` <20131002152811.GC32181-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-02 15:28 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed 02-10-13 14:54:59, Marciniszyn, Mike wrote:
> Thanks!!
> 
> I would like to test these two patches and also do the stable work for
> the deadlock as well.  Do you have these patches in a repo somewhere to
> save me a bit of work?
  I've pushed the patches to:
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
into the branch 'get_user_pages'.

> We had been working on an internal version of the deadlock portion of
> this patch that uses get_user_pages_fast() vs. the new
> get_user_pages_unlocked().
> 
> The risk of GUP fast is the loss of the "force" arg on GUP fast, which I
> don't see as significant give our use case.
  Yes. I was discussing with Roland some time ago whether the force
argument is needed and he said it is. So I kept the arguments of
get_user_pages() intact and just simplified the locking...

BTW: Infiniband still needs mmap_sem for modification of mm->pinned_vm. It
might be worthwhile to actually change that to atomic_long_t (only
kernel/events/core.c would need update besides infiniband) and avoid taking
mmap_sem in infiniband code altogether.
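
Roughly (a hypothetical sketch of that idea, not part of this series):

	/* with mm->pinned_vm changed to an atomic_long_t: */
	unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

	if (atomic_long_add_return(npages, &mm->pinned_vm) > lock_limit &&
	    !capable(CAP_IPC_LOCK)) {
		atomic_long_sub(npages, &mm->pinned_vm);
		return -ENOMEM;
	}
	/* no mmap_sem needed for the bookkeeping at all */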

> Some nits on the subject and commit message:
> 1. use IB/qib, IB/ipath vs. ib
> 2. use the correct ipath vs. qib in the commit message text
  Sure, will do. Thanks for the quick reply and for the comments.

								Honza
> > -----Original Message-----
> > From: Jan Kara [mailto:jack-AlSwsSmVLrQ@public.gmane.org]
> > Sent: Wednesday, October 02, 2013 10:28 AM
> > To: LKML
> > Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org; Jan Kara; infinipath; Roland Dreier; linux-
> > rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> > get_user_pages_unlocked()
> > 
> > Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> > shortens the section where we hold mmap_sem for writing and also removes
> > the knowledge about get_user_pages() locking from ipath driver. We also fix
> > a bug in testing pinned number of pages when changing the code.
> > 
> > CC: Mike Marciniszyn <infinipath-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > CC: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> > ---
> >  drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++----------------
> > -
> >  1 file changed, 26 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c
> > b/drivers/infiniband/hw/qib/qib_user_pages.c
> > index 2bc1d2b96298..57ce83c2d1d9 100644
> > --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> > +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> > @@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page
> > **p, size_t num_pages,
> >  	}
> >  }
> > 
> > -/*
> > - * Call with current->mm->mmap_sem held.
> > +/**
> > + * qib_get_user_pages - lock user pages into memory
> > + * @start_page: the start page
> > + * @num_pages: the number of pages
> > + * @p: the output page structures
> > + *
> > + * This function takes a given start page (page aligned user virtual
> > + * address) and pins it and the following specified number of pages.
> > +For
> > + * now, num_pages is always 1, but that will probably change at some
> > +point
> > + * (because caller is doing expected sends on a single virtually
> > +contiguous
> > + * buffer, so we can do all pages at once).
> >   */
> > -static int __qib_get_user_pages(unsigned long start_page, size_t
> > num_pages,
> > -				struct page **p, struct vm_area_struct
> > **vma)
> > +int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > +		       struct page **p)
> >  {
> >  	unsigned long lock_limit;
> >  	size_t got;
> >  	int ret;
> > +	struct mm_struct *mm = current->mm;
> > 
> > +	down_write(&mm->mmap_sem);
> >  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > 
> > -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > +	if (mm->pinned_vm + num_pages > lock_limit &&
> > !capable(CAP_IPC_LOCK)) {
> > +		up_write(&mm->mmap_sem);
> >  		ret = -ENOMEM;
> >  		goto bail;
> >  	}
> > +	mm->pinned_vm += num_pages;
> > +	up_write(&mm->mmap_sem);
> > 
> >  	for (got = 0; got < num_pages; got += ret) {
> > -		ret = get_user_pages(current, current->mm,
> > -				     start_page + got * PAGE_SIZE,
> > -				     num_pages - got, 1, 1,
> > -				     p + got, vma);
> > +		ret = get_user_pages_unlocked(current, mm,
> > +					      start_page + got * PAGE_SIZE,
> > +					      num_pages - got, 1, 1,
> > +					      p + got);
> >  		if (ret < 0)
> >  			goto bail_release;
> >  	}
> > 
> > -	current->mm->pinned_vm += num_pages;
> > 
> >  	ret = 0;
> >  	goto bail;
> > 
> >  bail_release:
> >  	__qib_release_user_pages(p, got, 0);
> > +	down_write(&mm->mmap_sem);
> > +	mm->pinned_vm -= num_pages;
> > +	up_write(&mm->mmap_sem);
> >  bail:
> >  	return ret;
> >  }
> > @@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev,
> > struct page *page,
> >  	return phys;
> >  }
> > 
> > -/**
> > - * qib_get_user_pages - lock user pages into memory
> > - * @start_page: the start page
> > - * @num_pages: the number of pages
> > - * @p: the output page structures
> > - *
> > - * This function takes a given start page (page aligned user virtual
> > - * address) and pins it and the following specified number of pages.  For
> > - * now, num_pages is always 1, but that will probably change at some point
> > - * (because caller is doing expected sends on a single virtually contiguous
> > - * buffer, so we can do all pages at once).
> > - */
> > -int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > -		       struct page **p)
> > -{
> > -	int ret;
> > -
> > -	down_write(&current->mm->mmap_sem);
> > -
> > -	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
> > -
> > -	up_write(&current->mm->mmap_sem);
> > -
> > -	return ret;
> > -}
> > -
> >  void qib_release_user_pages(struct page **p, size_t num_pages)  {
> >  	if (current->mm) /* during close after signal, mm can be NULL */
> > --
> > 1.8.1.4
> 
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]           ` <20131002152811.GC32181-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
@ 2013-10-02 15:32             ` Marciniszyn, Mike
       [not found]               ` <32E1700B9017364D9B60AED9960492BC211AF005-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2013-10-04 13:44               ` Marciniszyn, Mike
  0 siblings, 2 replies; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-02 15:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> > The risk of GUP fast is the loss of the "force" arg on GUP fast, which
> > I don't see as significant give our use case.
>   Yes. I was discussing with Roland some time ago whether the force
> argument is needed and he said it is. So I kept the arguments of
> get_user_pages() intact and just simplified the locking...

The PSM side of the code is a more traditional use of GUP (like direct I/O), so I think it is a different use case than the locking for IB memory regions.

Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]               ` <32E1700B9017364D9B60AED9960492BC211AF005-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-10-02 15:38                 ` Jan Kara
       [not found]                   ` <20131002153842.GD32181-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-02 15:38 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > The risk of GUP fast is the loss of the "force" arg on GUP fast, which
> > > I don't see as significant give our use case.
> >   Yes. I was discussing with Roland some time ago whether the force
> > argument is needed and he said it is. So I kept the arguments of
> > get_user_pages() intact and just simplified the locking...
> 
> The PSM side of the code is a more traditional use of GUP (like direct
> I/O), so I think it is a different use case than the locking for IB
> memory regions.
  Ah, I see. Whatever suits you best. I don't really care as long as
get_user_pages() locking doesn't leak into IB drivers :)

								Honza
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
                   ` (5 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast() Jan Kara
@ 2013-10-02 16:20 ` Christoph Hellwig
  2013-10-02 20:29   ` Jan Kara
  6 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2013-10-02 16:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm, Alexander Viro, Andreas Dilger, Andy Walls,
	Arnd Bergmann, Benjamin LaHaise, ceph-devel, Dan Williams,
	David Airlie, dri-devel, Gleb Natapov, Greg Kroah-Hartman,
	hpdd-discuss, Jarod Wilson, Jayant Mangalampalli,
	Jean-Christophe Plagniol-Villard, Jesper Nilsson, Kai Makisara,
	kvm, Laurent Pinchart, linux-aio, linux-cris-kernel, linux-fbdev,
	linux-fs

On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>   Hello,
> 
>   In my quest for changing locking around page faults to make things easier for
> filesystems I found out get_user_pages() users could use a cleanup.  The
> knowledge about necessary locking for get_user_pages() is in tons of places in
> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> surrounding code). Rather often this actually doesn't seem necessary. This
> patch series converts lots of places to use either get_user_pages_fast()
> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> of mmap_sem from the drivers. I'm still looking into converting a few remaining
> drivers (most notably v4l2) which are more complex.

Even looking over the kerneldoc comment next to it I still fail to
understand when you'd want to use get_user_pages_fast and when not.

This isn't meant as an argument against your series, but maybe a hint
that we'd need further work in this direction.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
@ 2013-10-02 20:29   ` Jan Kara
  2013-10-04 20:31     ` KOSAKI Motohiro
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-02 20:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, LKML, linux-mm, Alexander Viro, Andreas Dilger,
	Andy Walls, Arnd Bergmann, Benjamin LaHaise, ceph-devel,
	Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio,
	linux-cris-kernel, linux-fbdev@

On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
> >   Hello,
> > 
> >   In my quest for changing locking around page faults to make things easier for
> > filesystems I found out get_user_pages() users could use a cleanup.  The
> > knowledge about necessary locking for get_user_pages() is in tons of places in
> > drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> > calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> > surrounding code). Rather often this actually doesn't seem necessary. This
> > patch series converts lots of places to use either get_user_pages_fast()
> > or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> > of mmap_sem from the drivers. I'm still looking into converting a few remaining
> > drivers (most notably v4l2) which are more complex.
> 
> Even looking over the kerneldoc comment next to it I still fail to
> understand when you'd want to use get_user_pages_fast and when not.
  AFAIU get_user_pages_fast() should be used
1) if you don't need any special get_user_pages() arguments (like calling
   it for the mm of a different process, forcing COW, or similar), and
2) if you don't expect the pages to be unmapped (if they may be,
   get_user_pages_fast() is actually somewhat slower because it ends up
   walking the page tables twice).
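
The typical get_user_pages_fast() pattern (a sketch) is pinning the calling
process's own buffer, as for direct I/O:

	struct page *pages[NPAGES];	/* NPAGES: caller-chosen batch size */
	int got;

	/* uaddr: user virtual address of the caller's own buffer */
	got = get_user_pages_fast(uaddr & PAGE_MASK, NPAGES,
				  1 /* write */, pages);
	if (got < 0)
		return got;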

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]                   ` <20131002153842.GD32181-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
@ 2013-10-04 13:39                     ` Marciniszyn, Mike
       [not found]                       ` <32E1700B9017364D9B60AED9960492BC211B0123-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org



> -----Original Message-----
> From: Jan Kara [mailto:jack-AlSwsSmVLrQ@public.gmane.org]
> Sent: Wednesday, October 02, 2013 11:39 AM
> To: Marciniszyn, Mike
> Cc: Jan Kara; LKML; linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org; infinipath; Roland Dreier; linux-
> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > > The risk of GUP fast is the loss of the "force" arg on GUP fast,
> > > > which I don't see as significant give our use case.
> > >   Yes. I was discussing with Roland some time ago whether the force
> > > argument is needed and he said it is. So I kept the arguments of
> > > get_user_pages() intact and just simplified the locking...
> >
> > The PSM side of the code is a more traditional use of GUP (like direct
> > I/O), so I think it is a different use case than the locking for IB
> > memory regions.
>   Ah, I see. Whatever suits you best. I don't really care as long as
> get_user_pages() locking doesn't leak into IB drivers :)
> 
> 								Honza
> --
> Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 15:32             ` Marciniszyn, Mike
       [not found]               ` <32E1700B9017364D9B60AED9960492BC211AF005-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-10-04 13:44               ` Marciniszyn, Mike
  1 sibling, 0 replies; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm@kvack.org, infinipath, Roland Dreier,
	linux-rdma@vger.kernel.org

> The PSM side of the code is a more traditional use of GUP (like direct I/O), so
> I think it is a different use case than the locking for IB memory regions.

I have resubmitted the two deadlock fixes using get_user_pages_fast() and marked them stable.

See http://marc.info/?l=linux-rdma&m=138089335506355&w=2

Mike


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]                       ` <32E1700B9017364D9B60AED9960492BC211B0123-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-10-04 13:46                         ` Marciniszyn, Mike
  0 siblings, 0 replies; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Inadvertent send!

Mike

> -----Original Message-----
> From: Marciniszyn, Mike
> Sent: Friday, October 04, 2013 9:39 AM
> To: Jan Kara
> Cc: LKML; linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org; infinipath; Roland Dreier; linux-
> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> 
> 
> > -----Original Message-----
> > From: Jan Kara [mailto:jack-AlSwsSmVLrQ@public.gmane.org]
> > Sent: Wednesday, October 02, 2013 11:39 AM
> > To: Marciniszyn, Mike
> > Cc: Jan Kara; LKML; linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org; infinipath; Roland Dreier;
> > linux- rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> > get_user_pages_unlocked()
> >
> > On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > > > The risk of GUP fast is the loss of the "force" arg on GUP fast,
> > > > > which I don't see as significant give our use case.
> > > >   Yes. I was discussing with Roland some time ago whether the
> > > > force argument is needed and he said it is. So I kept the
> > > > arguments of
> > > > get_user_pages() intact and just simplified the locking...
> > >
> > > The PSM side of the code is a more traditional use of GUP (like
> > > direct I/O), so I think it is a different use case than the locking
> > > for IB memory regions.
> >   Ah, I see. Whatever suits you best. I don't really care as long as
> > get_user_pages() locking doesn't leak into IB drivers :)
> >
> > 								Honza
> > --
> > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> > SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]   ` <1380724087-13927-24-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
  2013-10-02 14:54     ` Marciniszyn, Mike
@ 2013-10-04 13:52     ` Marciniszyn, Mike
       [not found]       ` <32E1700B9017364D9B60AED9960492BC211B0176-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:52 UTC (permalink / raw)
  To: Jan Kara, LKML
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, infinipath,
	Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> shortens the section where we hold mmap_sem for writing and also removes
> the knowledge about get_user_pages() locking from ipath driver. We also fix
> a bug in testing pinned number of pages when changing the code.
> 

This patch and the sibling ipath patch will nominally take the mmap_sem twice where the old routine only took it once.   This is a performance issue.

Is the intent here to deprecate get_user_pages()?

I agree, the old code's lock limit test is broken and needs to be fixed.   I like the elimination of the silly wrapper routine!

Could the lock limit test be pushed into another version of the wrapper so that there is only one set of mmap_sem transactions?
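
Something like the following, perhaps (a hypothetical shape for such a
wrapper, just to make the question concrete; the name is made up):

	long get_user_pages_accounted(unsigned long start,
				      unsigned long npages, int write,
				      int force, struct page **pages)
	{
		struct mm_struct *mm = current->mm;
		unsigned long lock_limit =
			rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
		long ret;

		/* accounting and pinning under one mmap_sem acquisition */
		down_write(&mm->mmap_sem);
		if (mm->pinned_vm + npages > lock_limit &&
		    !capable(CAP_IPC_LOCK)) {
			ret = -ENOMEM;
			goto out;
		}
		ret = get_user_pages(current, mm, start, npages, write,
				     force, pages, NULL);
		if (ret > 0)
			mm->pinned_vm += ret;
	out:
		up_write(&mm->mmap_sem);
		return ret;
	}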

Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]       ` <32E1700B9017364D9B60AED9960492BC211B0176-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-10-04 18:33         ` Jan Kara
       [not found]           ` <20131004183315.GA19557-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
  2013-10-07 15:38           ` Marciniszyn, Mike
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-04 18:33 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri 04-10-13 13:52:49, Marciniszyn, Mike wrote:
> > Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> > shortens the section where we hold mmap_sem for writing and also removes
> > the knowledge about get_user_pages() locking from ipath driver. We also fix
> > a bug in testing pinned number of pages when changing the code.
> > 
> 
> This patch and the sibling ipath patch will nominally take the mmap_sem
> twice where the old routine only took it once.   This is a performance
> issue.
  It will take mmap_sem only once during normal operation. Only if
get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
the change to mm->pinned_vm.

> Is the intent here to deprecate get_user_pages()?
  Well, as much as I'd like to, there are really places in the mm code which
need to call get_user_pages() while holding mmap_sem, to be able to inspect
the corresponding vmas etc. So I want to reduce get_user_pages() use as much
as possible, but I'm not really hoping to remove it completely.
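
For instance, schematically (an illustrative caller that needs the vmas;
tsk, mm, start, page and vma are assumed from context):

	down_read(&mm->mmap_sem);
	ret = get_user_pages(tsk, mm, start, 1, 1, 0, &page, &vma);
	if (ret == 1 && is_vm_hugetlb_page(vma))
		hugepage = true;	/* decision based on the vma */
	up_read(&mm->mmap_sem);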

> I agree, the old code's lock limit test is broke and needs to be fixed.
> I like the elimination of the silly wrapper routine!
> 
> Could the lock limit test be pushed into another version of the wrapper
> so that there is only one set of mmap_sem transactions?
  I'm sorry, I don't understand what you mean here...

								Honza
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-02 20:29   ` Jan Kara
@ 2013-10-04 20:31     ` KOSAKI Motohiro
  2013-10-04 20:42       ` KOSAKI Motohiro
  0 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2013-10-04 20:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, LKML, linux-mm, Alexander Viro, Andreas Dilger,
	Andy Walls, Arnd Bergmann, Benjamin LaHaise, ceph-devel,
	Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio,
	linux-cris-kernel@

(10/2/13 4:29 PM), Jan Kara wrote:
> On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
>> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>>>    Hello,
>>>
>>>    In my quest for changing locking around page faults to make things easier for
>>> filesystems I found out get_user_pages() users could use a cleanup.  The
>>> knowledge about necessary locking for get_user_pages() is in tons of places in
>>> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
>>> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
>>> surrounding code). Rather often this actually doesn't seem necessary. This
>>> patch series converts lots of places to use either get_user_pages_fast()
>>> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
>>> of mmap_sem from the drivers. I'm still looking into converting a few remaining
>>> drivers (most notably v4l2) which are more complex.
>>
>> Even looking over the kerneldoc comment next to it I still fail to
>> understand when you'd want to use get_user_pages_fast and when not.
>    AFAIU get_user_pages_fast() should be used
> 1) if you don't need any special get_user_pages() arguments (like calling
>     it for mm of a different process, forcing COW, or similar).
> 2) you don't expect the pages to be unmapped (otherwise get_user_pages_fast() is
> actually somewhat slower because it walks page tables twice).

If the target pages point to anonymous or private mapping pages, get_user_pages_fast()
is fork-unsafe. The O_DIRECT notes in the open(2) man page describe this a bit.


see http://man7.org/linux/man-pages/man2/open.2.html

>       O_DIRECT I/Os should never be run concurrently with the fork(2)
>       system call, if the memory buffer is a private mapping (i.e., any
>       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
>       memory allocated on the heap and statically allocated buffers).  Any
>       such I/Os, whether submitted via an asynchronous I/O interface or
>       from another thread in the process, should be completed before
>       fork(2) is called.  Failure to do so can result in data corruption
>       and undefined behavior in parent and child processes.  This
>       restriction does not apply when the memory buffer for the O_DIRECT
>       I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
>       Nor does this restriction apply when the memory buffer has been
>       advised as MADV_DONTFORK with madvise(2), ensuring that it will not
>       be available to the child after fork(2).

Thanks.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-04 20:31     ` KOSAKI Motohiro
@ 2013-10-04 20:42       ` KOSAKI Motohiro
  2013-10-07 21:18         ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2013-10-04 20:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, LKML, linux-mm, Alexander Viro, Andreas Dilger,
	Andy Walls, Arnd Bergmann, Benjamin LaHaise, ceph-devel,
	Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio,
	linux-cris-kernel@

(10/4/13 4:31 PM), KOSAKI Motohiro wrote:
> (10/2/13 4:29 PM), Jan Kara wrote:
>> On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
>>> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>>>>     Hello,
>>>>
>>>>     In my quest for changing locking around page faults to make things easier for
>>>> filesystems I found out get_user_pages() users could use a cleanup.  The
>>>> knowledge about necessary locking for get_user_pages() is in tons of places in
>>>> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
>>>> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
>>>> surrounding code). Rather often this actually doesn't seem necessary. This
>>>> patch series converts lots of places to use either get_user_pages_fast()
>>>> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
>>>> of mmap_sem from the drivers. I'm still looking into converting a few remaining
>>>> drivers (most notably v4l2) which are more complex.
>>>
>>> Even looking over the kerneldoc comment next to it I still fail to
>>> understand when you'd want to use get_user_pages_fast and when not.
>>     AFAIU get_user_pages_fast() should be used
>> 1) if you don't need any special get_user_pages() arguments (like calling
>>      it for mm of a different process, forcing COW, or similar).
>> 2) you don't expect the pages to be unmapped (otherwise get_user_pages_fast() is
>> actually somewhat slower because it walks page tables twice).
>
> If the target pages point to anonymous or private mapping pages, get_user_pages_fast()
> is fork-unsafe. The O_DIRECT notes in the open(2) man page describe this a bit.
>
>
> see http://man7.org/linux/man-pages/man2/open.2.html
>
>>        O_DIRECT I/Os should never be run concurrently with the fork(2)
>>        system call, if the memory buffer is a private mapping (i.e., any
>>        mapping created with the mmap(2) MAP_PRIVATE flag; this includes
>>        memory allocated on the heap and statically allocated buffers).  Any
>>        such I/Os, whether submitted via an asynchronous I/O interface or
>>        from another thread in the process, should be completed before
>>        fork(2) is called.  Failure to do so can result in data corruption
>>        and undefined behavior in parent and child processes.  This
>>        restriction does not apply when the memory buffer for the O_DIRECT
>>        I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
>>        Nor does this restriction apply when the memory buffer has been
>>        advised as MADV_DONTFORK with madvise(2), ensuring that it will not
>>        be available to the child after fork(2).

IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
semantics are not equal to get_user_pages()'s. When someone simply substitutes
get_user_pages_fast() for get_user_pages(), they might run into huge trouble.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]           ` <20131004183315.GA19557-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
@ 2013-10-07 15:20             ` Marciniszyn, Mike
  0 siblings, 0 replies; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-07 15:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org



> -----Original Message-----
> From: Jan Kara [mailto:jack-AlSwsSmVLrQ@public.gmane.org]
> Sent: Friday, October 04, 2013 2:33 PM
> To: Marciniszyn, Mike
> Cc: Jan Kara; LKML; linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org; infinipath; Roland Dreier; linux-
> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> On Fri 04-10-13 13:52:49, Marciniszyn, Mike wrote:
> > > Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> > > shortens the section where we hold mmap_sem for writing and also
> > > removes the knowledge about get_user_pages() locking from ipath
> > > driver. We also fix a bug in testing the number of pinned pages when
> > > changing the code.
> > >
> >
> > This patch and the sibling ipath patch will nominally take the mmap_sem
> > twice where the old routine only took it once.   This is a performance
> > issue.
>   It will take mmap_sem only once during normal operation. Only if
> get_user_pages_unlocked() fails, we have to take mmap_sem again to undo
> the change of mm->pinned_vm.
> 
> > Is the intent here to deprecate get_user_pages()?
>   Well, as much as I'd like to, there are really places in mm code which need
> to call get_user_pages() while holding mmap_sem to be able to inspect
> corresponding vmas etc. So I want to reduce get_user_pages() use as much
> as possible, but I'm not really hoping to remove it completely.
> 
> > I agree, the old code's lock limit test is broken and needs to be fixed.
> > I like the elimination of the silly wrapper routine!
> >
> > Could the lock limit test be pushed into another version of the
> > wrapper so that there is only one set of mmap_sem transactions?
>   I'm sorry, I don't understand what you mean here...
> 
> 								Honza
> --
> Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-04 18:33         ` Jan Kara
       [not found]           ` <20131004183315.GA19557-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
@ 2013-10-07 15:38           ` Marciniszyn, Mike
  2013-10-07 17:26             ` Jan Kara
  1 sibling, 1 reply; 28+ messages in thread
From: Marciniszyn, Mike @ 2013-10-07 15:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm@kvack.org, infinipath, Roland Dreier,
	linux-rdma@vger.kernel.org

> > This patch and the sibling ipath patch will nominally take the mmap_sem
> > twice where the old routine only took it once.   This is a performance
> > issue.
>   It will take mmap_sem only once during normal operation. Only if
> get_user_pages_unlocked() fails, we have to take mmap_sem again to undo
> the change of mm->pinned_vm.
> 
> > Is the intent here to deprecate get_user_pages()?

The old code looked like:
__qib_get_user_pages()
	(broken) ulimit test
             for (...)
		get_user_pages()

qib_get_user_pages()
	mmap_sem lock
	__qib_get_user_pages()
             mmap_sem unlock

The new code is:

get_user_pages_unlocked()
	mmap_sem  lock
	get_user_pages()
	mmap_sem unlock

qib_get_user_pages()
	mmap_sem lock
             ulimit test and locked pages maintenance
             mmap_sem unlock
	for (...)
		get_user_pages_unlocked()

I count an additional pair of mmap_sem transactions in the normal case.

> 
> > Could the lock limit test be pushed into another version of the
> > wrapper so that there is only one set of mmap_sem transactions?
>   I'm sorry, I don't understand what you mean here...
> 

This is what I had in mind:

get_user_pages_ulimit_unlocked()
	mmap_sem  lock
	ulimit test and locked pages maintenance (from qib/ipath)
             for (...)
	       get_user_pages_unlocked()	
	mmap_sem unlock
	
qib_get_user_pages()
	get_user_pages_ulimit_unlocked()

This really pushes the code into a new wrapper common to ipath/qib and any
others that might want to combine locking with ulimit enforcement.
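
In C that could look roughly like the sketch below; it assumes the inner
call is plain get_user_pages(), since mmap_sem is already held, and it
omits unpinning the already-pinned pages on error:

	long get_user_pages_ulimit_unlocked(unsigned long start,
					    unsigned long nr_pages, int write,
					    int force, struct page **pages)
	{
		struct mm_struct *mm = current->mm;
		unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
		long got, ret;

		down_write(&mm->mmap_sem);
		if (mm->pinned_vm + nr_pages > lock_limit &&
		    !capable(CAP_IPC_LOCK)) {
			up_write(&mm->mmap_sem);
			return -ENOMEM;
		}
		mm->pinned_vm += nr_pages;
		for (got = 0; got < nr_pages; got += ret) {
			/* mmap_sem is already held, so use the plain call */
			ret = get_user_pages(current, mm,
					     start + got * PAGE_SIZE,
					     nr_pages - got, write, force,
					     pages + got, NULL);
			if (ret < 0) {
				mm->pinned_vm -= nr_pages;
				up_write(&mm->mmap_sem);
				return ret;
			}
		}
		up_write(&mm->mmap_sem);
		return got;
	}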

Mike


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-07 15:38           ` Marciniszyn, Mike
@ 2013-10-07 17:26             ` Jan Kara
  2013-10-08 19:06               ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-07 17:26 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm@kvack.org, infinipath, Roland Dreier,
	linux-rdma@vger.kernel.org

On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > twice where the old routine only took it once.   This is a performance
> > > issue.
> >   It will take mmap_sem only once during normal operation. Only if
> > get_user_pages_unlocked() fails, we have to take mmap_sem again to undo
> > the change of mm->pinned_vm.
> > 
> > > Is the intent here to deprecate get_user_pages()?
> 
> The old code looked like:
> __qib_get_user_pages()
> 	(broken) ulimit test
>              for (...)
> 		get_user_pages()
> 
> qib_get_user_pages()
> 	mmap_sem lock
> 	__qib_get_user_pages()
>              mmap_sem unlock
> 
> The new code is:
> 
> get_user_pages_unlocked()
> 	mmap_sem  lock
> 	get_user_pages()
> 	mmap_sem unlock
> 
> qib_get_user_pages()
> 	mmap_sem lock
>              ulimit test and locked pages maintenance
>              mmap_sem unlock
> 	for (...)
> 		get_user_pages_unlocked()
> 
> I count an additional pair of mmap_sem transactions in the normal case.
  Ah, sorry, you are right.

> > > Could the lock limit test be pushed into another version of the
> > > wrapper so that there is only one set of mmap_sem transactions?
> >   I'm sorry, I don't understand what you mean here...
> > 
> 
> This is what I had in mind:
> 
> get_user_pages_ulimit_unlocked()
> 	mmap_sem  lock
> 	ulimit test and locked pages maintenance (from qib/ipath)
>              for (...)
> 	       get_user_pages_unlocked()	
> 	mmap_sem unlock
> 	
> qib_get_user_pages()
> 	get_user_pages_ulimit_unlocked()
> 
> This really pushes the code into a new wrapper common to ipath/qib and
> any others that might want to combine locking with ulimit enforcement.
  We could do that but frankly, I'd rather change ulimit enforcement to not
require mmap_sem and use an atomic counter instead. I'll see what I can do.
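
  I.e., something like this (a sketch, with mm->pinned_vm made an
atomic_long_t):

	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
	    !capable(CAP_IPC_LOCK)) {
		atomic_long_sub(num_pages, &mm->pinned_vm);
		return -ENOMEM;
	}
	/* ... pin the pages; on failure undo with atomic_long_sub() ... */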

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-04 20:42       ` KOSAKI Motohiro
@ 2013-10-07 21:18         ` Jan Kara
  2013-10-08  0:27           ` KOSAKI Motohiro
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-07 21:18 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Jan Kara, Christoph Hellwig, LKML, linux-mm, Alexander Viro,
	Andreas Dilger, Andy Walls, Arnd Bergmann, Benjamin LaHaise,
	ceph-devel, Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio

On Fri 04-10-13 16:42:19, KOSAKI Motohiro wrote:
> (10/4/13 4:31 PM), KOSAKI Motohiro wrote:
> >(10/2/13 4:29 PM), Jan Kara wrote:
> >>On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
> >>>On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
> >>>>    Hello,
> >>>>
> >>>>    In my quest for changing locking around page faults to make things easier for
> >>>>filesystems I found out get_user_pages() users could use a cleanup.  The
> >>>>knowledge about necessary locking for get_user_pages() is in tons of places in
> >>>>drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> >>>>calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> >>>>surrounding code). Rather often this actually doesn't seem necessary. This
> >>>>patch series converts lots of places to use either get_user_pages_fast()
> >>>>or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> >>>>of mmap_sem from the drivers. I'm still looking into converting a few remaining
> >>>>drivers (most notably v4l2) which are more complex.
> >>>
> >>>Even looking over the kerneldoc comment next to it I still fail to
> >>>understand when you'd want to use get_user_pages_fast and when not.
> >>    AFAIU get_user_pages_fast() should be used
> >>1) if you don't need any special get_user_pages() arguments (like calling
> >>     it for mm of a different process, forcing COW, or similar).
> >>2) you don't expect the pages to be unmapped (otherwise get_user_pages_fast() is
> >>actually somewhat slower because it walks page tables twice).
> >
> >If the target pages point to anonymous or private mapping pages, get_user_pages_fast()
> >is fork-unsafe. The O_DIRECT notes in the open(2) man page describe this a bit.
> >
> >
> >see http://man7.org/linux/man-pages/man2/open.2.html
> >
> >>       O_DIRECT I/Os should never be run concurrently with the fork(2)
> >>       system call, if the memory buffer is a private mapping (i.e., any
> >>       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
> >>       memory allocated on the heap and statically allocated buffers).  Any
> >>       such I/Os, whether submitted via an asynchronous I/O interface or
> >>       from another thread in the process, should be completed before
> >>       fork(2) is called.  Failure to do so can result in data corruption
> >>       and undefined behavior in parent and child processes.  This
> >>       restriction does not apply when the memory buffer for the O_DIRECT
> >>       I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
> >>       Nor does this restriction apply when the memory buffer has been
> >>       advised as MADV_DONTFORK with madvise(2), ensuring that it will not
> >>       be available to the child after fork(2).
> 
> IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
> semantics are not equal to get_user_pages()'s. When someone simply substitutes
> get_user_pages_fast() for get_user_pages(), they might run into huge trouble.
  I forgot about this speciality (and actually comments didn't remind me
:(). But thinking about this some more, get_user_pages_fast() seems as safe
as get_user_pages() in presence of threads sharing mm, doesn't it? Because
while get_user_pages() is working, another thread can happily trigger COW
on some of the pages and thus get_user_pages() can return pages some of
which are invisible in our mm by the time get_user_pages() returns. So
although in practice I agree problems of get_user_pages_fast() with fork(2)
are more visible, in essence they are still present with clone(2) and
get_user_pages().
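
  To make that race concrete, one possible interleaving (a sketch):

	/*
	 * Thread A (same mm)               Thread B
	 * get_user_pages(buf, write=0)
	 *   returns file-backed page P of
	 *   a MAP_PRIVATE mapping
	 *                                  first write to buf: COW fault
	 *                                  installs a new anon page P'
	 * I/O to/from P is now invisible
	 * through buf in this mm
	 */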

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-07 21:18         ` Jan Kara
@ 2013-10-08  0:27           ` KOSAKI Motohiro
  2013-10-08  6:06             ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2013-10-08  0:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: KOSAKI Motohiro, Christoph Hellwig, LKML, linux-mm,
	Alexander Viro, Andreas Dilger, Andy Walls, Arnd Bergmann,
	Benjamin LaHaise, ceph-devel, Dan Williams, David Airlie,
	dri-devel, Gleb Natapov, Greg Kroah-Hartman, hpdd-discuss,
	Jarod Wilson, Jayant Mangalampalli,
	Jean-Christophe Plagniol-Villard, Jesper Nilsson, Kai Makisara,
	kvm, Laurent Pinchart

(10/7/13 5:18 PM), Jan Kara wrote:
> On Fri 04-10-13 16:42:19, KOSAKI Motohiro wrote:
>> (10/4/13 4:31 PM), KOSAKI Motohiro wrote:
>>> (10/2/13 4:29 PM), Jan Kara wrote:
>>>> On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
>>>>> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>>>>>>     Hello,
>>>>>>
>>>>>>     In my quest for changing locking around page faults to make things easier for
>>>>>> filesystems I found out get_user_pages() users could use a cleanup.  The
>>>>>> knowledge about necessary locking for get_user_pages() is in tons of places in
>>>>>> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
>>>>>> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
>>>>>> surrounding code). Rather often this actually doesn't seem necessary. This
>>>>>> patch series converts lots of places to use either get_user_pages_fast()
>>>>>> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
>>>>>> of mmap_sem from the drivers. I'm still looking into converting a few remaining
>>>>>> drivers (most notably v4l2) which are more complex.
>>>>>
>>>>> Even looking over the kerneldoc comment next to it I still fail to
>>>>> understand when you'd want to use get_user_pages_fast and when not.
>>>>     AFAIU get_user_pages_fast() should be used
>>>> 1) if you don't need any special get_user_pages() arguments (like calling
>>>>      it for mm of a different process, forcing COW, or similar).
>>>> 2) you don't expect the pages to be unmapped (otherwise get_user_pages_fast() is
>>>> actually somewhat slower because it walks page tables twice).
>>>
>>> If the target pages point to anonymous or private mapping pages, get_user_pages_fast()
>>> is fork-unsafe. The O_DIRECT notes in the open(2) man page describe this a bit.
>>>
>>>
>>> see http://man7.org/linux/man-pages/man2/open.2.html
>>>
>>>>        O_DIRECT I/Os should never be run concurrently with the fork(2)
>>>>        system call, if the memory buffer is a private mapping (i.e., any
>>>>        mapping created with the mmap(2) MAP_PRIVATE flag; this includes
>>>>        memory allocated on the heap and statically allocated buffers).  Any
>>>>        such I/Os, whether submitted via an asynchronous I/O interface or
>>>>        from another thread in the process, should be completed before
>>>>        fork(2) is called.  Failure to do so can result in data corruption
>>>>        and undefined behavior in parent and child processes.  This
>>>>        restriction does not apply when the memory buffer for the O_DIRECT
>>>>        I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
>>>>        Nor does this restriction apply when the memory buffer has been
>>>>        advised as MADV_DONTFORK with madvise(2), ensuring that it will not
>>>>        be available to the child after fork(2).
>>
>> IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
>> semantics are not equal to get_user_pages()'s. When someone simply substitutes
>> get_user_pages_fast() for get_user_pages(), they might run into huge trouble.
>    I forgot about this speciality (and actually comments didn't remind me
> :(). But thinking about this some more, get_user_pages_fast() seems as safe
> as get_user_pages() in presence of threads sharing mm, doesn't it?

It depends.

If there is a guarantee that other threads don't touch the same pages that
get_user_pages() retrieved, get_user_pages_fast() gives us a brilliantly
fast path. For example, as far as I heard from the IB guys, the userland
library of the infiniband stack uses madvise(MADV_DONTFORK), so they don't
need to care about the COW issue and can choose the fastest way. Another
example is futex: it doesn't use the contents of the pages, it uses the
vaddr only for looking up a key, so it also doesn't have the COW issue.
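
The userspace side of that is roughly (a sketch):

	#define _GNU_SOURCE
	#include <sys/mman.h>

	/* sketch: allocate a buffer the child never sees after fork(2) */
	static void *alloc_dontfork_buffer(size_t len)
	{
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf != MAP_FAILED)
			madvise(buf, len, MADV_DONTFORK);
		return buf;
	}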

I don't know the other cases, but as far as I know, everything is case-by-case.

> Because
> while get_user_pages() is working, another thread can happily trigger COW
> on some of the pages and thus get_user_pages() can return pages some of
> which are invisible in our mm by the time get_user_pages() returns.

If you are talking about get_user_pages() instead of get_user_pages_fast(), this
can't happen, because the page fault takes mmap_sem too.

I would say mmap_sem really has too many responsibilities.

> So
> although in practice I agree problems of get_user_pages_fast() with fork(2)
> are more visible, in essence they are still present with clone(2) and
> get_user_pages().
>
> 								Honza
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-08  0:27           ` KOSAKI Motohiro
@ 2013-10-08  6:06             ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-08  6:06 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Jan Kara, Christoph Hellwig, LKML, linux-mm, Alexander Viro,
	Andreas Dilger, Andy Walls, Arnd Bergmann, Benjamin LaHaise,
	ceph-devel, Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio

On Mon 07-10-13 20:27:16, KOSAKI Motohiro wrote:
> (10/7/13 5:18 PM), Jan Kara wrote:
> >On Fri 04-10-13 16:42:19, KOSAKI Motohiro wrote:
> >>(10/4/13 4:31 PM), KOSAKI Motohiro wrote:
> >>>(10/2/13 4:29 PM), Jan Kara wrote:
> >>>>On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
> >>>>>On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
> >>>>>>    Hello,
> >>>>>>
> >>>>>>    In my quest for changing locking around page faults to make things easier for
> >>>>>>filesystems I found out get_user_pages() users could use a cleanup.  The
> >>>>>>knowledge about necessary locking for get_user_pages() is in tons of places in
> >>>>>>drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> >>>>>>calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> >>>>>>surrounding code). Rather often this actually doesn't seem necessary. This
> >>>>>>patch series converts lots of places to use either get_user_pages_fast()
> >>>>>>or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> >>>>>>of mmap_sem from the drivers. I'm still looking into converting a few remaining
> >>>>>>drivers (most notably v4l2) which are more complex.
> >>>>>
> >>>>>Even looking over the kerneldoc comment next to it I still fail to
> >>>>>understand when you'd want to use get_user_pages_fast and when not.
> >>>>    AFAIU get_user_pages_fast() should be used
> >>>>1) if you don't need any special get_user_pages() arguments (like calling
> >>>>     it for mm of a different process, forcing COW, or similar).
> >>>>2) you don't expect the pages to be unmapped (otherwise get_user_pages_fast() is
> >>>>actually somewhat slower because it walks page tables twice).
> >>>
> >>>If the target pages point to anonymous or private mapping pages, get_user_pages_fast()
> >>>is fork-unsafe. The O_DIRECT notes in the open(2) man page describe this a bit.
> >>>
> >>>
> >>>see http://man7.org/linux/man-pages/man2/open.2.html
> >>>
> >>>>       O_DIRECT I/Os should never be run concurrently with the fork(2)
> >>>>       system call, if the memory buffer is a private mapping (i.e., any
> >>>>       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
> >>>>       memory allocated on the heap and statically allocated buffers).  Any
> >>>>       such I/Os, whether submitted via an asynchronous I/O interface or
> >>>>       from another thread in the process, should be completed before
> >>>>       fork(2) is called.  Failure to do so can result in data corruption
> >>>>       and undefined behavior in parent and child processes.  This
> >>>>       restriction does not apply when the memory buffer for the O_DIRECT
> >>>>       I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
> >>>>       Nor does this restriction apply when the memory buffer has been
> >>>>       advised as MADV_DONTFORK with madvise(2), ensuring that it will not
> >>>>       be available to the child after fork(2).
> >>
> >>IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
> >>semantics are not equal to get_user_pages()'s. When someone simply substitutes
> >>get_user_pages_fast() for get_user_pages(), they might run into huge trouble.
> >   I forgot about this speciality (and actually comments didn't remind me
> >:(). But thinking about this some more, get_user_pages_fast() seems as safe
> >as get_user_pages() in presence of threads sharing mm, doesn't it?
> 
> It depends.
> 
> If there is a guarantee that other threads don't touch the same pages that
> get_user_pages() retrieved, get_user_pages_fast() gives us a brilliantly
> fast path. For example, as far as I heard from the IB guys, the userland
> library of the infiniband stack uses madvise(MADV_DONTFORK), so they don't
> need to care about the COW issue and can choose the fastest way. Another
> example is futex: it doesn't use the contents of the pages, it uses the
> vaddr only for looking up a key, so it also doesn't have the COW issue.
> 
> I don't know the other cases, but as far as I know, everything is case-by-case.
> 
> >Because
> >while get_user_pages() is working, another thread can happily trigger COW
> >on some of the pages and thus get_user_pages() can return pages some of
> >which are invisible in our mm by the time get_user_pages() returns.
> 
> If you are talking about get_user_pages() instead of get_user_pages_fast(), this
> can't happen, because the page fault takes mmap_sem too.
  But both take mmap_sem for reading, so they won't exclude each other...

> I would say mmap_sem really has too many responsibilities.
  I heartily agree with this. And that makes any changes to it really hard.
I once went through all the places in the kernel which acquire mmap_sem.
If I remember right, there were close to two hundred of them! The basic use
is protection of the vma tree, but then we have less obvious stuff like
protection of some other fields in mm_struct, fork exclusion, and god knows
what...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-07 17:26             ` Jan Kara
@ 2013-10-08 19:06               ` Jan Kara
       [not found]                 ` <20131008190604.GB14223-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2013-10-08 19:06 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm@kvack.org, infinipath, Roland Dreier,
	linux-rdma@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

On Mon 07-10-13 19:26:04, Jan Kara wrote:
> On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > > twice where the old routine only took it once.   This is a performance
> > > > issue.
> > >   It will take mmap_sem only once during normal operation. Only if
> > > get_user_pages_unlocked() fails, we have to take mmap_sem again to undo
> > > the change of mm->pinned_vm.
> > > 
> > > > Is the intent here to deprecate get_user_pages()?
> > 
> > The old code looked like:
> > __qib_get_user_pages()
> > 	(broken) ulimit test
> >              for (...)
> > 		get_user_pages()
> > 
> > qib_get_user_pages()
> > 	mmap_sem lock
> > 	__qib_get_user_pages()
> >              mmap_sem unlock
> > 
> > The new code is:
> > 
> > get_user_pages_unlocked()
> > 	mmap_sem  lock
> > 	get_user_pages()
> > 	mmap_sem unlock
> > 
> > qib_get_user_pages()
> > 	mmap_sem lock
> >              ulimit test and locked pages maintenance
> >              mmap_sem unlock
> > 	for (...)
> > 		get_user_pages_unlocked()
> > 
> > I count an additional pair of mmap_sem transactions in the normal case.
>   Ah, sorry, you are right.
> 
> > > > Could the lock limit test be pushed into another version of the
> > > > wrapper so that there is only one set of mmap_sem transactions?
> > >   I'm sorry, I don't understand what you mean here...
> > > 
> > 
> > This is what I had in mind:
> > 
> > get_user_pages_ulimit_unlocked()
> > 	mmap_sem  lock
> > 	ulimit test and locked pages maintenance (from qib/ipath)
> >              for (...)
> > 	       get_user_pages_unlocked()	
> > 	mmap_sem unlock
> > 	
> > qib_get_user_pages()
> > 	get_user_pages_ulimit_unlocked()
> > 
> > This really pushes the code into a new wrapper common to ipath/qib and
> > any others that might want to combine locking with ulimit enforcement.
>   We could do that but frankly, I'd rather change ulimit enforcement to not
> > require mmap_sem and use an atomic counter instead. I'll see what I can do.
  OK, so something like the attached patch (compile-tested only). What do
you think? I'm just not 100% sure removing mmap_sem surrounding stuff like
__ipath_release_user_pages() is safe. I don't see a reason why it shouldn't
be - we have references to the pages and we only mark them dirty and put the
reference - but maybe I'm missing something subtle...
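
  For reference, without mmap_sem the release side boils down to roughly
this (a sketch in terms of the driver's p/num_pages/dirty arguments, of
what __ipath_release_user_pages() plus the accounting do):

	/* we own a reference to each page, so no mmap_sem is needed */
	for (i = 0; i < num_pages; i++) {
		if (dirty)
			set_page_dirty_lock(p[i]);
		put_page(p[i]);
	}
	if (current->mm)
		atomic_long_sub(num_pages, &current->mm->pinned_vm);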

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-mm-Switch-mm-pinned_vm-to-atomic_long_t.patch --]
[-- Type: text/x-patch, Size: 14825 bytes --]

From 44fe90e8303b293370f077ae665d6e43846cf277 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 8 Oct 2013 14:16:24 +0200
Subject: [PATCH] mm: Switch mm->pinned_vm to atomic_long_t

Until now, updates to mm->pinned_vm were protected by mmap_sem.
kernel/events/core.c actually held it only for reading, so that may have
been racy. The only other user - Infiniband - used mmap_sem for writing,
but that caused it quite some complications.

So switch mm->pinned_vm to an atomic_long_t and convert all the places
using it. This allows quite some simplifications to the Infiniband code,
which now doesn't have to care about mmap_sem at all.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/core/umem.c                 | 56 +++------------------
 drivers/infiniband/core/uverbs_cmd.c           |  1 -
 drivers/infiniband/core/uverbs_main.c          |  2 -
 drivers/infiniband/hw/ipath/ipath_file_ops.c   |  2 +-
 drivers/infiniband/hw/ipath/ipath_kernel.h     |  1 -
 drivers/infiniband/hw/ipath/ipath_user_pages.c | 70 +++-----------------------
 drivers/infiniband/hw/qib/qib_user_pages.c     | 20 ++------
 fs/proc/task_mmu.c                             |  2 +-
 include/linux/mm_types.h                       |  2 +-
 include/rdma/ib_umem.h                         |  3 --
 include/rdma/ib_verbs.h                        |  1 -
 kernel/events/core.c                           | 13 +++--
 12 files changed, 29 insertions(+), 144 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 0640a89021a9..294d2e468177 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -80,6 +80,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 {
 	struct ib_umem *umem;
 	struct page **page_list;
+	struct mm_struct *mm = current->mm;
 	struct ib_umem_chunk *chunk;
 	unsigned long locked;
 	unsigned long lock_limit;
@@ -126,20 +127,13 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	npages = PAGE_ALIGN(size + umem->offset) >> PAGE_SHIFT;
 
-	down_write(&current->mm->mmap_sem);
-
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	locked = npages;
-	if (npages + current->mm->pinned_vm > lock_limit &&
+	if (atomic_long_add_return(npages, &mm->pinned_vm) > lock_limit &&
 	    !capable(CAP_IPC_LOCK)) {
-		up_write(&current->mm->mmap_sem);
-		kfree(umem);
-		free_page((unsigned long) page_list);
-		return ERR_PTR(-ENOMEM);
+		ret = -ENOMEM;
+		goto out;
 	}
-	current->mm->pinned_vm += npages;
-
-	up_write(&current->mm->mmap_sem);
 
 	cur_base = addr & PAGE_MASK;
 
@@ -201,9 +195,7 @@ out:
 	if (ret < 0) {
 		__ib_umem_release(context->device, umem, 0);
 		kfree(umem);
-		down_write(&current->mm->mmap_sem);
-		current->mm->pinned_vm -= locked;
-		up_write(&current->mm->mmap_sem);
+		atomic_long_sub(locked, &mm->pinned_vm);
 	}
 	free_page((unsigned long) page_list);
 
@@ -211,17 +203,6 @@ out:
 }
 EXPORT_SYMBOL(ib_umem_get);
 
-static void ib_umem_account(struct work_struct *work)
-{
-	struct ib_umem *umem = container_of(work, struct ib_umem, work);
-
-	down_write(&umem->mm->mmap_sem);
-	umem->mm->pinned_vm -= umem->diff;
-	up_write(&umem->mm->mmap_sem);
-	mmput(umem->mm);
-	kfree(umem);
-}
-
 /**
  * ib_umem_release - release memory pinned with ib_umem_get
  * @umem: umem struct to release
@@ -234,37 +215,14 @@ void ib_umem_release(struct ib_umem *umem)
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
-	mm = get_task_mm(current);
+	mm = current->mm;
 	if (!mm) {
 		kfree(umem);
 		return;
 	}
 
 	diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
-
-	/*
-	 * We may be called with the mm's mmap_sem already held.  This
-	 * can happen when a userspace munmap() is the call that drops
-	 * the last reference to our file and calls our release
-	 * method.  If there are memory regions to destroy, we'll end
-	 * up here and not be able to take the mmap_sem.  In that case
-	 * we defer the vm_locked accounting to the system workqueue.
-	 */
-	if (context->closing) {
-		if (!down_write_trylock(&mm->mmap_sem)) {
-			INIT_WORK(&umem->work, ib_umem_account);
-			umem->mm   = mm;
-			umem->diff = diff;
-
-			queue_work(ib_wq, &umem->work);
-			return;
-		}
-	} else
-		down_write(&mm->mmap_sem);
-
-	current->mm->pinned_vm -= diff;
-	up_write(&mm->mmap_sem);
-	mmput(mm);
+	atomic_long_sub(diff, &mm->pinned_vm);
 	kfree(umem);
 }
 EXPORT_SYMBOL(ib_umem_release);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index f2b81b9ee0d6..16381c6eae77 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -332,7 +332,6 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
 	INIT_LIST_HEAD(&ucontext->rule_list);
-	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 75ad86c4abf8..8fdc9ca62c27 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -196,8 +196,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	if (!context)
 		return 0;
 
-	context->closing = 1;
-
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
 		struct ib_ah *ah = uobj->object;
 
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index 6d7f453b4d05..f219f15ad1cf 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -2025,7 +2025,7 @@ static void unlock_expected_tids(struct ipath_portdata *pd)
 		dd->ipath_pageshadow[i] = NULL;
 		pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i],
 			PAGE_SIZE, PCI_DMA_FROMDEVICE);
-		ipath_release_user_pages_on_close(&ps, 1);
+		ipath_release_user_pages(&ps, 1);
 		cnt++;
 		ipath_stats.sps_pageunlocks++;
 	}
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 6559af60bffd..afc2f868541b 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -1083,7 +1083,6 @@ static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt)
 
 int ipath_get_user_pages(unsigned long, size_t, struct page **);
 void ipath_release_user_pages(struct page **, size_t);
-void ipath_release_user_pages_on_close(struct page **, size_t);
 int ipath_eeprom_read(struct ipath_devdata *, u8, void *, int);
 int ipath_eeprom_write(struct ipath_devdata *, u8, const void *, int);
 int ipath_tempsense_read(struct ipath_devdata *, u8 regnum);
diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
index a89af9654112..8081e76fa72c 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -68,26 +68,22 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 			 struct page **p)
 {
 	unsigned long lock_limit;
+	struct mm_struct *mm = current->mm;
 	size_t got;
 	int ret;
 
-	down_write(&current->mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if (current->mm->pinned_vm + num_pages > lock_limit && 
+	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
 	    !capable(CAP_IPC_LOCK)) {
-		up_write(&current->mm->mmap_sem);
 		ret = -ENOMEM;
-		goto bail;
+		goto bail_sub;
 	}
-	current->mm->pinned_vm += num_pages;
-	up_write(&current->mm->mmap_sem);
 
 	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
 		   (unsigned long) num_pages, start_page);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages_unlocked(current, current->mm,
+		ret = get_user_pages_unlocked(current, mm,
 					      start_page + got * PAGE_SIZE,
 					      num_pages - got, 1, 1,
 					      p + got);
@@ -101,9 +97,8 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 
 bail_release:
 	__ipath_release_user_pages(p, got, 0);
-	down_write(&current->mm->mmap_sem);
-	current->mm->pinned_vm -= num_pages;
-	up_write(&current->mm->mmap_sem);
+bail_sub:
+	atomic_long_sub(num_pages, &mm->pinned_vm);
 bail:
 	return ret;
 }
@@ -166,56 +161,7 @@ dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
 
 void ipath_release_user_pages(struct page **p, size_t num_pages)
 {
-	down_write(&current->mm->mmap_sem);
-
-	__ipath_release_user_pages(p, num_pages, 1);
-
-	current->mm->pinned_vm -= num_pages;
-
-	up_write(&current->mm->mmap_sem);
-}
-
-struct ipath_user_pages_work {
-	struct work_struct work;
-	struct mm_struct *mm;
-	unsigned long num_pages;
-};
-
-static void user_pages_account(struct work_struct *_work)
-{
-	struct ipath_user_pages_work *work =
-		container_of(_work, struct ipath_user_pages_work, work);
-
-	down_write(&work->mm->mmap_sem);
-	work->mm->pinned_vm -= work->num_pages;
-	up_write(&work->mm->mmap_sem);
-	mmput(work->mm);
-	kfree(work);
-}
-
-void ipath_release_user_pages_on_close(struct page **p, size_t num_pages)
-{
-	struct ipath_user_pages_work *work;
-	struct mm_struct *mm;
-
 	__ipath_release_user_pages(p, num_pages, 1);
-
-	mm = get_task_mm(current);
-	if (!mm)
-		return;
-
-	work = kmalloc(sizeof(*work), GFP_KERNEL);
-	if (!work)
-		goto bail_mm;
-
-	INIT_WORK(&work->work, user_pages_account);
-	work->mm = mm;
-	work->num_pages = num_pages;
-
-	queue_work(ib_wq, &work->work);
-	return;
-
-bail_mm:
-	mmput(mm);
-	return;
+	if (current->mm)
+		atomic_long_sub(num_pages, &current->mm->pinned_vm);
 }
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 57ce83c2d1d9..049ab8db5f32 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -68,16 +68,13 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 	int ret;
 	struct mm_struct *mm = current->mm;
 
-	down_write(&mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
-		up_write(&mm->mmap_sem);
+	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
+	    !capable(CAP_IPC_LOCK)) {
 		ret = -ENOMEM;
 		goto bail;
 	}
-	mm->pinned_vm += num_pages;
-	up_write(&mm->mmap_sem);
 
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages_unlocked(current, mm,
@@ -94,9 +91,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 
 bail_release:
 	__qib_release_user_pages(p, got, 0);
-	down_write(&mm->mmap_sem);
-	mm->pinned_vm -= num_pages;
-	up_write(&mm->mmap_sem);
+	atomic_long_sub(num_pages, &mm->pinned_vm);
 bail:
 	return ret;
 }
@@ -135,13 +130,8 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
 
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
-	if (current->mm) /* during close after signal, mm can be NULL */
-		down_write(&current->mm->mmap_sem);
-
 	__qib_release_user_pages(p, num_pages, 1);
 
-	if (current->mm) {
-		current->mm->pinned_vm -= num_pages;
-		up_write(&current->mm->mmap_sem);
-	}
+	if (current->mm)
+		atomic_long_sub(num_pages, &current->mm->pinned_vm);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7366e9d63cee..9123bfef1dea 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -57,7 +57,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		hiwater_vm << (PAGE_SHIFT-10),
 		total_vm << (PAGE_SHIFT-10),
 		mm->locked_vm << (PAGE_SHIFT-10),
-		mm->pinned_vm << (PAGE_SHIFT-10),
+		atomic_long_read(&mm->pinned_vm) << (PAGE_SHIFT-10),
 		hiwater_rss << (PAGE_SHIFT-10),
 		total_rss << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d9851eeb6e1d..f2bf72a86b70 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -355,7 +355,7 @@ struct mm_struct {
 
 	unsigned long total_vm;		/* Total pages mapped */
 	unsigned long locked_vm;	/* Pages that have PG_mlocked set */
-	unsigned long pinned_vm;	/* Refcount permanently increased */
+	atomic_long_t pinned_vm;	/* Refcount permanently increased */
 	unsigned long shared_vm;	/* Shared pages (files) */
 	unsigned long exec_vm;		/* VM_EXEC & ~VM_WRITE */
 	unsigned long stack_vm;		/* VM_GROWSUP/DOWN */
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9ee0d2e51b16..2dbbd2c56074 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -47,9 +47,6 @@ struct ib_umem {
 	int                     writable;
 	int                     hugetlb;
 	struct list_head	chunk_list;
-	struct work_struct	work;
-	struct mm_struct       *mm;
-	unsigned long		diff;
 };
 
 struct ib_umem_chunk {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e393171e2fac..bce6d2b91ec7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -961,7 +961,6 @@ struct ib_ucontext {
 	struct list_head	ah_list;
 	struct list_head	xrcd_list;
 	struct list_head	rule_list;
-	int			closing;
 };
 
 struct ib_uobject {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b66ca3a..80d2ba7bd51c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3922,7 +3922,7 @@ again:
 	 */
 
 	atomic_long_sub((size >> PAGE_SHIFT) + 1, &mmap_user->locked_vm);
-	vma->vm_mm->pinned_vm -= mmap_locked;
+	atomic_long_sub(mmap_locked, &vma->vm_mm->pinned_vm);
 	free_uid(mmap_user);
 
 	ring_buffer_put(rb); /* could be last */
@@ -3944,7 +3944,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	struct ring_buffer *rb;
 	unsigned long vma_size;
 	unsigned long nr_pages;
-	long user_extra, extra;
+	long user_extra, extra = 0;
 	int ret = 0, flags = 0;
 
 	/*
@@ -4006,16 +4006,14 @@ again:
 
 	user_locked = atomic_long_read(&user->locked_vm) + user_extra;
 
-	extra = 0;
 	if (user_locked > user_lock_limit)
 		extra = user_locked - user_lock_limit;
 
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
-	locked = vma->vm_mm->pinned_vm + extra;
 
-	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
-		!capable(CAP_IPC_LOCK)) {
+	if (atomic_long_add_return(extra, &vma->vm_mm->pinned_vm) > lock_limit &&
+	    perf_paranoid_tracepoint_raw() && !capable(CAP_IPC_LOCK)) {
 		ret = -EPERM;
 		goto unlock;
 	}
@@ -4039,7 +4037,6 @@ again:
 	rb->mmap_user = get_current_user();
 
 	atomic_long_add(user_extra, &user->locked_vm);
-	vma->vm_mm->pinned_vm += extra;
 
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
@@ -4049,6 +4046,8 @@ again:
 unlock:
 	if (!ret)
 		atomic_inc(&event->mmap_count);
+	else if (extra)
+		atomic_long_sub(extra, &vma->vm_mm->pinned_vm);
 	mutex_unlock(&event->mmap_mutex);
 
 	/*
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found]                 ` <20131008190604.GB14223-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
@ 2013-10-16 21:39                   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2013-10-16 21:39 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	infinipath, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue 08-10-13 21:06:04, Jan Kara wrote:
> On Mon 07-10-13 19:26:04, Jan Kara wrote:
> > On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > > > twice where the old routine only took it once.   This is a performance
> > > > > issue.
> > > >   It will take mmap_sem only once during normal operation. Only if
> > > > get_user_pages_unlocked() fails, we have to take mmap_sem again to undo
> > > > the change of mm->pinned_vm.
> > > > 
> > > > > Is the intent here to deprecate get_user_pages()?
> > > 
> > > The old code looked like:
> > > __qib_get_user_pages()
> > > 	(broken) ulimit test
> > >              for (...)
> > > 		get_user_pages()
> > > 
> > > qib_get_user_pages()
> > > 	mmap_sem lock
> > > 	__qib_get_user_pages()
> > >              mmap_sem unlock
> > > 
> > > The new code is:
> > > 
> > > get_user_pages_unlocked()
> > > 	mmap_sem  lock
> > > 	get_user_pages()
> > > 	mmap_sem unlock
> > > 
> > > qib_get_user_pages()
> > > 	mmap_sem lock
> > >              ulimit test and locked pages maintenance
> > >              mmap_sem unlock
> > > 	for (...)
> > > 		get_user_pages_unlocked()
> > > 
> > > I count an additional pair of mmap_sem transactions in the normal case.
> >   Ah, sorry, you are right.
> > 
> > > > > Could the lock limit test be pushed into another version of the
> > > > > wrapper so that there is only one set of mmap_sem transactions?
> > > >   I'm sorry, I don't understand what you mean here...
> > > > 
> > > 
> > > This is what I had in mind:
> > > 
> > > get_user_pages_ulimit_unlocked()
> > > 	mmap_sem  lock
> > > 	ulimit test and locked pages maintenance (from qib/ipath)
> > >              for (...)
> > > 	       get_user_pages_unlocked()	
> > > 	mmap_sem unlock
> > > 	
> > > qib_get_user_pages()
> > > 	get_user_pages_ulimit_unlocked()
> > > 
> > > This really pushes the code into a new wrapper common to ipath/qib and
> > > any others that might want to combine locking with ulimit enforcement.
> >   We could do that but frankly, I'd rather change ulimit enforcement to not
> > > require mmap_sem and use an atomic counter instead. I'll see what I can do.
>   OK, so something like the attached patch (compile-tested only). What do
> you think? I'm just not 100% sure removing mmap_sem surrounding stuff like
> __ipath_release_user_pages() is safe. I don't see a reason why it shouldn't
> be - we have references to the pages and we only mark them dirty and put the
> reference - but maybe I'm missing something subtle...
  Ping? Any opinion on this?

								Honza

> From 44fe90e8303b293370f077ae665d6e43846cf277 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> Date: Tue, 8 Oct 2013 14:16:24 +0200
> Subject: [PATCH] mm: Switch mm->pinned_vm to atomic_long_t
> 
> Until now, updates to mm->pinned_vm were protected by mmap_sem.
> kernel/events/core.c actually held it only for reading, so that may have
> been racy. The only other user - Infiniband - used mmap_sem for writing,
> but that caused it quite some complications.
> 
> So switch mm->pinned_vm to an atomic_long_t and convert all the places
> using it. This allows quite some simplifications to the Infiniband code,
> which now doesn't have to care about mmap_sem at all.
> 
> Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> ---
>  drivers/infiniband/core/umem.c                 | 56 +++------------------
>  drivers/infiniband/core/uverbs_cmd.c           |  1 -
>  drivers/infiniband/core/uverbs_main.c          |  2 -
>  drivers/infiniband/hw/ipath/ipath_file_ops.c   |  2 +-
>  drivers/infiniband/hw/ipath/ipath_kernel.h     |  1 -
>  drivers/infiniband/hw/ipath/ipath_user_pages.c | 70 +++-----------------------
>  drivers/infiniband/hw/qib/qib_user_pages.c     | 20 ++------
>  fs/proc/task_mmu.c                             |  2 +-
>  include/linux/mm_types.h                       |  2 +-
>  include/rdma/ib_umem.h                         |  3 --
>  include/rdma/ib_verbs.h                        |  1 -
>  kernel/events/core.c                           | 13 +++--
>  12 files changed, 29 insertions(+), 144 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 0640a89021a9..294d2e468177 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -80,6 +80,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  {
>  	struct ib_umem *umem;
>  	struct page **page_list;
> +	struct mm_struct *mm = current->mm;
>  	struct ib_umem_chunk *chunk;
>  	unsigned long locked;
>  	unsigned long lock_limit;
> @@ -126,20 +127,13 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  
>  	npages = PAGE_ALIGN(size + umem->offset) >> PAGE_SHIFT;
>  
> -	down_write(&current->mm->mmap_sem);
> -
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  	locked = npages;
> -	if (npages + current->mm->pinned_vm > lock_limit &&
> +	if (atomic_long_add_return(npages, &mm->pinned_vm) > lock_limit &&
>  	    !capable(CAP_IPC_LOCK)) {
> -		up_write(&current->mm->mmap_sem);
> -		kfree(umem);
> -		free_page((unsigned long) page_list);
> -		return ERR_PTR(-ENOMEM);
> +		ret = -ENOMEM;
> +		goto out;
>  	}
> -	current->mm->pinned_vm += npages;
> -
> -	up_write(&current->mm->mmap_sem);
>  
>  	cur_base = addr & PAGE_MASK;
>  
> @@ -201,9 +195,7 @@ out:
>  	if (ret < 0) {
>  		__ib_umem_release(context->device, umem, 0);
>  		kfree(umem);
> -		down_write(&current->mm->mmap_sem);
> -		current->mm->pinned_vm -= locked;
> -		up_write(&current->mm->mmap_sem);
> +		atomic_long_sub(locked, &mm->pinned_vm);
>  	}
>  	free_page((unsigned long) page_list);
>  
> @@ -211,17 +203,6 @@ out:
>  }
>  EXPORT_SYMBOL(ib_umem_get);
>  
> -static void ib_umem_account(struct work_struct *work)
> -{
> -	struct ib_umem *umem = container_of(work, struct ib_umem, work);
> -
> -	down_write(&umem->mm->mmap_sem);
> -	umem->mm->pinned_vm -= umem->diff;
> -	up_write(&umem->mm->mmap_sem);
> -	mmput(umem->mm);
> -	kfree(umem);
> -}
> -
>  /**
>   * ib_umem_release - release memory pinned with ib_umem_get
>   * @umem: umem struct to release
> @@ -234,37 +215,14 @@ void ib_umem_release(struct ib_umem *umem)
>  
>  	__ib_umem_release(umem->context->device, umem, 1);
>  
> -	mm = get_task_mm(current);
> +	mm = current->mm;
>  	if (!mm) {
>  		kfree(umem);
>  		return;
>  	}
>  
>  	diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
> -
> -	/*
> -	 * We may be called with the mm's mmap_sem already held.  This
> -	 * can happen when a userspace munmap() is the call that drops
> -	 * the last reference to our file and calls our release
> -	 * method.  If there are memory regions to destroy, we'll end
> -	 * up here and not be able to take the mmap_sem.  In that case
> -	 * we defer the vm_locked accounting to the system workqueue.
> -	 */
> -	if (context->closing) {
> -		if (!down_write_trylock(&mm->mmap_sem)) {
> -			INIT_WORK(&umem->work, ib_umem_account);
> -			umem->mm   = mm;
> -			umem->diff = diff;
> -
> -			queue_work(ib_wq, &umem->work);
> -			return;
> -		}
> -	} else
> -		down_write(&mm->mmap_sem);
> -
> -	current->mm->pinned_vm -= diff;
> -	up_write(&mm->mmap_sem);
> -	mmput(mm);
> +	atomic_long_sub(diff, &mm->pinned_vm);
>  	kfree(umem);
>  }
>  EXPORT_SYMBOL(ib_umem_release);
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index f2b81b9ee0d6..16381c6eae77 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -332,7 +332,6 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
>  	INIT_LIST_HEAD(&ucontext->ah_list);
>  	INIT_LIST_HEAD(&ucontext->xrcd_list);
>  	INIT_LIST_HEAD(&ucontext->rule_list);
> -	ucontext->closing = 0;
>  
>  	resp.num_comp_vectors = file->device->num_comp_vectors;
>  
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 75ad86c4abf8..8fdc9ca62c27 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -196,8 +196,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
>  	if (!context)
>  		return 0;
>  
> -	context->closing = 1;
> -
>  	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
>  		struct ib_ah *ah = uobj->object;
>  
> diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
> index 6d7f453b4d05..f219f15ad1cf 100644
> --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
> +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
> @@ -2025,7 +2025,7 @@ static void unlock_expected_tids(struct ipath_portdata *pd)
>  		dd->ipath_pageshadow[i] = NULL;
>  		pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i],
>  			PAGE_SIZE, PCI_DMA_FROMDEVICE);
> -		ipath_release_user_pages_on_close(&ps, 1);
> +		ipath_release_user_pages(&ps, 1);
>  		cnt++;
>  		ipath_stats.sps_pageunlocks++;
>  	}
> diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
> index 6559af60bffd..afc2f868541b 100644
> --- a/drivers/infiniband/hw/ipath/ipath_kernel.h
> +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
> @@ -1083,7 +1083,6 @@ static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt)
>  
>  int ipath_get_user_pages(unsigned long, size_t, struct page **);
>  void ipath_release_user_pages(struct page **, size_t);
> -void ipath_release_user_pages_on_close(struct page **, size_t);
>  int ipath_eeprom_read(struct ipath_devdata *, u8, void *, int);
>  int ipath_eeprom_write(struct ipath_devdata *, u8, const void *, int);
>  int ipath_tempsense_read(struct ipath_devdata *, u8 regnum);
> diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
> index a89af9654112..8081e76fa72c 100644
> --- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
> +++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
> @@ -68,26 +68,22 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
>  			 struct page **p)
>  {
>  	unsigned long lock_limit;
> +	struct mm_struct *mm = current->mm;
>  	size_t got;
>  	int ret;
>  
> -	down_write(&current->mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -	if (current->mm->pinned_vm + num_pages > lock_limit && 
> +	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
>  	    !capable(CAP_IPC_LOCK)) {
> -		up_write(&current->mm->mmap_sem);
>  		ret = -ENOMEM;
> -		goto bail;
> +		goto bail_sub;
>  	}
> -	current->mm->pinned_vm += num_pages;
> -	up_write(&current->mm->mmap_sem);
>  
>  	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
>  		   (unsigned long) num_pages, start_page);
>  
>  	for (got = 0; got < num_pages; got += ret) {
> -		ret = get_user_pages_unlocked(current, current->mm,
> +		ret = get_user_pages_unlocked(current, mm,
>  					      start_page + got * PAGE_SIZE,
>  					      num_pages - got, 1, 1,
>  					      p + got);
> @@ -101,9 +97,8 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
>  
>  bail_release:
>  	__ipath_release_user_pages(p, got, 0);
> -	down_write(&current->mm->mmap_sem);
> -	current->mm->pinned_vm -= num_pages;
> -	up_write(&current->mm->mmap_sem);
> +bail_sub:
> +	atomic_long_sub(num_pages, &mm->pinned_vm);
>  bail:
>  	return ret;
>  }
> @@ -166,56 +161,7 @@ dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
>  
>  void ipath_release_user_pages(struct page **p, size_t num_pages)
>  {
> -	down_write(&current->mm->mmap_sem);
> -
> -	__ipath_release_user_pages(p, num_pages, 1);
> -
> -	current->mm->pinned_vm -= num_pages;
> -
> -	up_write(&current->mm->mmap_sem);
> -}
> -
> -struct ipath_user_pages_work {
> -	struct work_struct work;
> -	struct mm_struct *mm;
> -	unsigned long num_pages;
> -};
> -
> -static void user_pages_account(struct work_struct *_work)
> -{
> -	struct ipath_user_pages_work *work =
> -		container_of(_work, struct ipath_user_pages_work, work);
> -
> -	down_write(&work->mm->mmap_sem);
> -	work->mm->pinned_vm -= work->num_pages;
> -	up_write(&work->mm->mmap_sem);
> -	mmput(work->mm);
> -	kfree(work);
> -}
> -
> -void ipath_release_user_pages_on_close(struct page **p, size_t num_pages)
> -{
> -	struct ipath_user_pages_work *work;
> -	struct mm_struct *mm;
> -
>  	__ipath_release_user_pages(p, num_pages, 1);
> -
> -	mm = get_task_mm(current);
> -	if (!mm)
> -		return;
> -
> -	work = kmalloc(sizeof(*work), GFP_KERNEL);
> -	if (!work)
> -		goto bail_mm;
> -
> -	INIT_WORK(&work->work, user_pages_account);
> -	work->mm = mm;
> -	work->num_pages = num_pages;
> -
> -	queue_work(ib_wq, &work->work);
> -	return;
> -
> -bail_mm:
> -	mmput(mm);
> -	return;
> +	if (current->mm)
> +		atomic_long_sub(num_pages, &current->mm->pinned_vm);
>  }
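
For reference, the get_user_pages_unlocked() helper these call sites
use is introduced earlier in the series; conceptually it just moves the
mmap_sem handling inside the wrapper, along these lines (a sketch of
the idea, not the exact patch text):

	static inline long get_user_pages_unlocked(struct task_struct *tsk,
			struct mm_struct *mm, unsigned long start,
			unsigned long nr_pages, int write, int force,
			struct page **pages)
	{
		long ret;

		down_read(&mm->mmap_sem);
		ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
				     pages, NULL);
		up_read(&mm->mmap_sem);
		return ret;
	}

With the locking encapsulated there, drivers no longer need to know
that get_user_pages() requires mmap_sem to be held for reading.
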
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index 57ce83c2d1d9..049ab8db5f32 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -68,16 +68,13 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
>  	int ret;
>  	struct mm_struct *mm = current->mm;
>  
> -	down_write(&mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  
> -	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> -		up_write(&mm->mmap_sem);
> +	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
> +	    !capable(CAP_IPC_LOCK)) {
>  		ret = -ENOMEM;
>  		goto bail;
>  	}
> -	mm->pinned_vm += num_pages;
> -	up_write(&mm->mmap_sem);
>  
>  	for (got = 0; got < num_pages; got += ret) {
>  		ret = get_user_pages_unlocked(current, mm,
> @@ -94,9 +91,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
>  
>  bail_release:
>  	__qib_release_user_pages(p, got, 0);
> -	down_write(&mm->mmap_sem);
> -	mm->pinned_vm -= num_pages;
> -	up_write(&mm->mmap_sem);
> +	atomic_long_sub(num_pages, &mm->pinned_vm);
>  bail:
>  	return ret;
>  }
> @@ -135,13 +130,8 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
>  
>  void qib_release_user_pages(struct page **p, size_t num_pages)
>  {
> -	if (current->mm) /* during close after signal, mm can be NULL */
> -		down_write(&current->mm->mmap_sem);
> -
>  	__qib_release_user_pages(p, num_pages, 1);
>  
> -	if (current->mm) {
> -		current->mm->pinned_vm -= num_pages;
> -		up_write(&current->mm->mmap_sem);
> -	}
> +	if (current->mm)
> +		atomic_long_sub(num_pages, &current->mm->pinned_vm);
>  }
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 7366e9d63cee..9123bfef1dea 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -57,7 +57,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  		hiwater_vm << (PAGE_SHIFT-10),
>  		total_vm << (PAGE_SHIFT-10),
>  		mm->locked_vm << (PAGE_SHIFT-10),
> -		mm->pinned_vm << (PAGE_SHIFT-10),
> +		atomic_long_read(&mm->pinned_vm) << (PAGE_SHIFT-10),
>  		hiwater_rss << (PAGE_SHIFT-10),
>  		total_rss << (PAGE_SHIFT-10),
>  		data << (PAGE_SHIFT-10),
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index d9851eeb6e1d..f2bf72a86b70 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -355,7 +355,7 @@ struct mm_struct {
>  
>  	unsigned long total_vm;		/* Total pages mapped */
>  	unsigned long locked_vm;	/* Pages that have PG_mlocked set */
> -	unsigned long pinned_vm;	/* Refcount permanently increased */
> +	atomic_long_t pinned_vm;	/* Refcount permanently increased */
>  	unsigned long shared_vm;	/* Shared pages (files) */
>  	unsigned long exec_vm;		/* VM_EXEC & ~VM_WRITE */
>  	unsigned long stack_vm;		/* VM_GROWSUP/DOWN */
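
With pinned_vm now an atomic_long_t, every access has to go through the
atomic API; open-coded arithmetic such as "mm->pinned_vm += n" no
longer compiles, which conveniently flags any caller the conversion
missed. For illustration, the three operations the series relies on:

	/* read, e.g. for the /proc status report above */
	unsigned long pinned = atomic_long_read(&mm->pinned_vm);

	/* account n pages, returning the new total for a limit check */
	long total = atomic_long_add_return(n, &mm->pinned_vm);

	/* unaccount n pages */
	atomic_long_sub(n, &mm->pinned_vm);
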
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 9ee0d2e51b16..2dbbd2c56074 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -47,9 +47,6 @@ struct ib_umem {
>  	int                     writable;
>  	int                     hugetlb;
>  	struct list_head	chunk_list;
> -	struct work_struct	work;
> -	struct mm_struct       *mm;
> -	unsigned long		diff;
>  };
>  
>  struct ib_umem_chunk {
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index e393171e2fac..bce6d2b91ec7 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -961,7 +961,6 @@ struct ib_ucontext {
>  	struct list_head	ah_list;
>  	struct list_head	xrcd_list;
>  	struct list_head	rule_list;
> -	int			closing;
>  };
>  
>  struct ib_uobject {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dd236b66ca3a..80d2ba7bd51c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3922,7 +3922,7 @@ again:
>  	 */
>  
>  	atomic_long_sub((size >> PAGE_SHIFT) + 1, &mmap_user->locked_vm);
> -	vma->vm_mm->pinned_vm -= mmap_locked;
> +	atomic_long_sub(mmap_locked, &vma->vm_mm->pinned_vm);
>  	free_uid(mmap_user);
>  
>  	ring_buffer_put(rb); /* could be last */
> @@ -3944,7 +3944,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
>  	struct ring_buffer *rb;
>  	unsigned long vma_size;
>  	unsigned long nr_pages;
> -	long user_extra, extra;
> +	long user_extra, extra = 0;
>  	int ret = 0, flags = 0;
>  
>  	/*
> @@ -4006,16 +4006,14 @@ again:
>  
>  	user_locked = atomic_long_read(&user->locked_vm) + user_extra;
>  
> -	extra = 0;
>  	if (user_locked > user_lock_limit)
>  		extra = user_locked - user_lock_limit;
>  
>  	lock_limit = rlimit(RLIMIT_MEMLOCK);
>  	lock_limit >>= PAGE_SHIFT;
> -	locked = vma->vm_mm->pinned_vm + extra;
>  
> -	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
> -		!capable(CAP_IPC_LOCK)) {
> +	if (atomic_long_add_return(extra, &vma->vm_mm->pinned_vm) > lock_limit &&
> +	    perf_paranoid_tracepoint_raw() && !capable(CAP_IPC_LOCK)) {
>  		ret = -EPERM;
>  		goto unlock;
>  	}
> @@ -4039,7 +4037,6 @@ again:
>  	rb->mmap_user = get_current_user();
>  
>  	atomic_long_add(user_extra, &user->locked_vm);
> -	vma->vm_mm->pinned_vm += extra;
>  
>  	ring_buffer_attach(event, rb);
>  	rcu_assign_pointer(event->rb, rb);
> @@ -4049,6 +4046,8 @@ again:
>  unlock:
>  	if (!ret)
>  		atomic_inc(&event->mmap_count);
> +	else if (extra)
> +		atomic_long_sub(extra, &vma->vm_mm->pinned_vm);
>  	mutex_unlock(&event->mmap_mutex);
>  
>  	/*
> -- 
> 1.8.1.4
> 
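
The perf_mmap() conversion uses the same speculative scheme: the extra
pages are added to pinned_vm before the limit check, and the new
"else if (extra)" branch in the unlock path subtracts them again
whenever the mapping attempt fails. Initializing extra to 0 at its
declaration is what keeps that rollback a no-op on the early paths that
reach "unlock" before extra has been computed. In outline (a simplified
sketch, not the patched code itself):

	long extra = 0;		/* rollback below is a no-op by default */
	...
	if (atomic_long_add_return(extra, &mm->pinned_vm) > lock_limit && ...) {
		ret = -EPERM;
		goto unlock;	/* the speculative add is undone below */
	}
	...
unlock:
	if (ret && extra)
		atomic_long_sub(extra, &mm->pinned_vm);
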

-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
