linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/26] get_user_pages() cleanup
@ 2013-10-02 14:27 Jan Kara
  2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
  2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, Alexander Viro, Andreas Dilger, Andy Walls,
	Arnd Bergmann, Benjamin LaHaise, ceph-devel, Dan Williams,
	David Airlie, dri-devel, Gleb Natapov, Greg Kroah-Hartman,
	hpdd-discuss, Jarod Wilson, Jayant Mangalampalli,
	Jean-Christophe Plagniol-Villard, Jesper Nilsson, Kai Makisara,
	kvm, Laurent Pinchart, linux-aio, linux-cris-kernel, linux-fbdev,
	linux-fsdevel

  Hello,

  In my quest for changing locking around page faults to make things easier for
filesystems I found out get_user_pages() users could use a cleanup.  The
knowledge about necessary locking for get_user_pages() is in tons of places in
drivers and quite a few of them actually get it wrong (don't have mmap_sem when
calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
surrounding code). Rather often this actually doesn't seem necessary. This
patch series converts lots of places to use either get_user_pages_fast()
or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
of mmap_sem from the drivers. I'm still looking into converting a few remaining
drivers (most notably v4l2) which are more complex.

As I already wrote, in some cases I actually think drivers were buggy (and I
note that in corresponding changelogs). I would really like to ask respective
maintainers to have a look at the patches in their area. Also any other
comments are welcome. Thanks.

								Honza

PS: Sorry for the huge recipient list but I don't really know how to trim it
    down...

CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andreas Dilger <andreas.dilger@intel.com>
CC: Andy Walls <awalls@md.metrocast.net>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Benjamin LaHaise <bcrl@kvack.org>
CC: ceph-devel@vger.kernel.org
CC: Dan Williams <dan.j.williams@intel.com>
CC: David Airlie <airlied@linux.ie>
CC: dri-devel@lists.freedesktop.org
CC: Gleb Natapov <gleb@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: hpdd-discuss@lists.01.org
CC: Jarod Wilson <jarod@wilsonet.com>
CC: Jayant Mangalampalli <jayant.mangalampalli@intel.com>
CC: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
CC: Jesper Nilsson <jesper.nilsson@axis.com>
CC: Kai Makisara <Kai.Makisara@kolumbus.fi>
CC: kvm@vger.kernel.org
CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
CC: linux-aio@kvack.org
CC: linux-cris-kernel@axis.com
CC: linux-fbdev@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-ia64@vger.kernel.org
CC: linux-media@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: Manu Abraham <abraham.manu@gmail.com>
CC: Mark Allyn <mark.a.allyn@intel.com>
CC: Mikael Starvik <starvik@axis.com>
CC: Mike Marciniszyn <infinipath@intel.com>
CC: Naren Sankar <nsankar@broadcom.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Peng Tao <tao.peng@emc.com>
CC: Roland Dreier <roland@kernel.org>
CC: Sage Weil <sage@inktank.com>
CC: Scott Davilla <davilla@4pi.com>
CC: Timur Tabi <timur@freescale.com>
CC: Tomi Valkeinen <tomi.valkeinen@ti.com>
CC: Tony Luck <tony.luck@intel.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 26/26] aio: Remove useless get_user_pages() call
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, linux-fsdevel, linux-aio, Alexander Viro,
	Benjamin LaHaise

get_user_pages() call in aio_setup_ring() is useless these days. We
create all ringbuffer the pages in page cache using
find_or_create_page() anyway so we can just use the pointers we get from
this function instead of having to look them up via user address space.

CC: linux-fsdevel@vger.kernel.org
CC: linux-aio@kvack.org
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/aio.c | 48 +++++++++++++++++-------------------------------
 1 file changed, 17 insertions(+), 31 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6b868f0e0c4c..a14a33027990 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -269,9 +269,21 @@ static int aio_setup_ring(struct kioctx *ctx)
 	file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
 	file->f_inode->i_mapping->private_data = ctx;
 	file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
+	ctx->aio_ring_file = file;
+	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring))
+			/ sizeof(struct io_event);
+
+	ctx->ring_pages = ctx->internal_pages;
+	if (nr_pages > AIO_RING_PAGES) {
+		ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *),
+					  GFP_KERNEL);
+		if (!ctx->ring_pages)
+			return -ENOMEM;
+	}
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
+
 		page = find_or_create_page(file->f_inode->i_mapping,
 					   i, GFP_HIGHUSER | __GFP_ZERO);
 		if (!page)
@@ -281,19 +293,13 @@ static int aio_setup_ring(struct kioctx *ctx)
 		SetPageUptodate(page);
 		SetPageDirty(page);
 		unlock_page(page);
+		ctx->ring_pages[i] = page;
 	}
-	ctx->aio_ring_file = file;
-	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring))
-			/ sizeof(struct io_event);
-
-	ctx->ring_pages = ctx->internal_pages;
-	if (nr_pages > AIO_RING_PAGES) {
-		ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *),
-					  GFP_KERNEL);
-		if (!ctx->ring_pages)
-			return -ENOMEM;
+	ctx->nr_pages = i;
+	if (unlikely(ctx->nr_pages != nr_pages)) {
+		aio_free_ring(ctx);
+		return -EAGAIN;
 	}
-
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 
@@ -309,28 +315,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	}
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
-
-	/* We must do this while still holding mmap_sem for write, as we
-	 * need to be protected against userspace attempting to mremap()
-	 * or munmap() the ring buffer.
-	 */
-	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
-				       1, 0, ctx->ring_pages, NULL);
-
-	/* Dropping the reference here is safe as the page cache will hold
-	 * onto the pages for us.  It is also required so that page migration
-	 * can unmap the pages and get the right reference count.
-	 */
-	for (i = 0; i < ctx->nr_pages; i++)
-		put_page(ctx->ring_pages[i]);
-
 	up_write(&mm->mmap_sem);
 
-	if (unlikely(ctx->nr_pages != nr_pages)) {
-		aio_free_ring(ctx);
-		return -EAGAIN;
-	}
-
 	ctx->user_id = ctx->mmap_base;
 	ctx->nr_events = nr_events; /* trusted copy */
 
-- 
1.8.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
  2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
@ 2013-10-02 16:20 ` Christoph Hellwig
  2013-10-02 20:29   ` Jan Kara
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2013-10-02 16:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-mm, Alexander Viro, Andreas Dilger, Andy Walls,
	Arnd Bergmann, Benjamin LaHaise, ceph-devel, Dan Williams,
	David Airlie, dri-devel, Gleb Natapov, Greg Kroah-Hartman,
	hpdd-discuss, Jarod Wilson, Jayant Mangalampalli,
	Jean-Christophe Plagniol-Villard, Jesper Nilsson, Kai Makisara,
	kvm, Laurent Pinchart, linux-aio, linux-cris-kernel, linux-fbdev,
	linux-fs

On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>   Hello,
> 
>   In my quest for changing locking around page faults to make things easier for
> filesystems I found out get_user_pages() users could use a cleanup.  The
> knowledge about necessary locking for get_user_pages() is in tons of places in
> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> surrounding code). Rather often this actually doesn't seem necessary. This
> patch series converts lots of places to use either get_user_pages_fast()
> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> of mmap_sem from the drivers. I'm still looking into converting a few remaining
> drivers (most notably v4l2) which are more complex.

Even looking over the kerneldoc comment next to it I still fail to
understand when you'd want to use get_user_pages_fast and when not.

This isn't meant as an argument against your series, but maybe a hint
that we'd need further work in this direction.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
@ 2013-10-02 20:29   ` Jan Kara
  2013-10-04 20:31     ` KOSAKI Motohiro
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-10-02 20:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, LKML, linux-mm, Alexander Viro, Andreas Dilger,
	Andy Walls, Arnd Bergmann, Benjamin LaHaise, ceph-devel,
	Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio,
	linux-cris-kernel, linux-fbdev@

On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
> >   Hello,
> > 
> >   In my quest for changing locking around page faults to make things easier for
> > filesystems I found out get_user_pages() users could use a cleanup.  The
> > knowledge about necessary locking for get_user_pages() is in tons of places in
> > drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> > calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> > surrounding code). Rather often this actually doesn't seem necessary. This
> > patch series converts lots of places to use either get_user_pages_fast()
> > or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> > of mmap_sem from the drivers. I'm still looking into converting a few remaining
> > drivers (most notably v4l2) which are more complex.
> 
> Even looking over the kerneldoc comment next to it I still fail to
> understand when you'd want to use get_user_pages_fast and when not.
  AFAIU get_user_pages_fast() should be used
1) if you don't need any special get_user_pages() arguments (like calling
   it for mm of a different process, forcing COW, or similar).
2) you don't expect pages to be unmapped (then get_user_pages_fast() is
actually somewhat slower because it walks page tables twice).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-02 20:29   ` Jan Kara
@ 2013-10-04 20:31     ` KOSAKI Motohiro
  2013-10-04 20:42       ` KOSAKI Motohiro
  0 siblings, 1 reply; 9+ messages in thread
From: KOSAKI Motohiro @ 2013-10-04 20:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, LKML, linux-mm, Alexander Viro, Andreas Dilger,
	Andy Walls, Arnd Bergmann, Benjamin LaHaise, ceph-devel,
	Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio,
	linux-cris-kernel@

(10/2/13 4:29 PM), Jan Kara wrote:
> On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
>> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>>>    Hello,
>>>
>>>    In my quest for changing locking around page faults to make things easier for
>>> filesystems I found out get_user_pages() users could use a cleanup.  The
>>> knowledge about necessary locking for get_user_pages() is in tons of places in
>>> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
>>> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
>>> surrounding code). Rather often this actually doesn't seem necessary. This
>>> patch series converts lots of places to use either get_user_pages_fast()
>>> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
>>> of mmap_sem from the drivers. I'm still looking into converting a few remaining
>>> drivers (most notably v4l2) which are more complex.
>>
>> Even looking over the kerneldoc comment next to it I still fail to
>> understand when you'd want to use get_user_pages_fast and when not.
>    AFAIU get_user_pages_fast() should be used
> 1) if you don't need any special get_user_pages() arguments (like calling
>     it for mm of a different process, forcing COW, or similar).
> 2) you don't expect pages to be unmapped (then get_user_pages_fast() is
> actually somewhat slower because it walks page tables twice).

If target page point to anon or private mapping pages, get_user_pages_fast()
is fork unsafe. O_DIRECT man pages describe a bit about this.


see http://man7.org/linux/man-pages/man2/open.2.html

>       O_DIRECT I/Os should never be run concurrently with the fork(2)
>       system call, if the memory buffer is a private mapping (i.e., any
>       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
>       memory allocated on the heap and statically allocated buffers).  Any
>       such I/Os, whether submitted via an asynchronous I/O interface or
>       from another thread in the process, should be completed before
>       fork(2) is called.  Failure to do so can result in data corruption
>       and undefined behavior in parent and child processes.  This
>       restriction does not apply when the memory buffer for the O_DIRECT
>       I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
>       Nor does this restriction apply when the memory buffer has been
>       advised as MADV_DONTFORK with madvise(2), ensuring that it will not
>       be available to the child after fork(2).

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-04 20:31     ` KOSAKI Motohiro
@ 2013-10-04 20:42       ` KOSAKI Motohiro
  2013-10-07 21:18         ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: KOSAKI Motohiro @ 2013-10-04 20:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, LKML, linux-mm, Alexander Viro, Andreas Dilger,
	Andy Walls, Arnd Bergmann, Benjamin LaHaise, ceph-devel,
	Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio,
	linux-cris-kernel@

(10/4/13 4:31 PM), KOSAKI Motohiro wrote:
> (10/2/13 4:29 PM), Jan Kara wrote:
>> On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
>>> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>>>>     Hello,
>>>>
>>>>     In my quest for changing locking around page faults to make things easier for
>>>> filesystems I found out get_user_pages() users could use a cleanup.  The
>>>> knowledge about necessary locking for get_user_pages() is in tons of places in
>>>> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
>>>> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
>>>> surrounding code). Rather often this actually doesn't seem necessary. This
>>>> patch series converts lots of places to use either get_user_pages_fast()
>>>> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
>>>> of mmap_sem from the drivers. I'm still looking into converting a few remaining
>>>> drivers (most notably v4l2) which are more complex.
>>>
>>> Even looking over the kerneldoc comment next to it I still fail to
>>> understand when you'd want to use get_user_pages_fast and when not.
>>     AFAIU get_user_pages_fast() should be used
>> 1) if you don't need any special get_user_pages() arguments (like calling
>>      it for mm of a different process, forcing COW, or similar).
>> 2) you don't expect pages to be unmapped (then get_user_pages_fast() is
>> actually somewhat slower because it walks page tables twice).
>
> If target page point to anon or private mapping pages, get_user_pages_fast()
> is fork unsafe. O_DIRECT man pages describe a bit about this.
>
>
> see http://man7.org/linux/man-pages/man2/open.2.html
>
>>        O_DIRECT I/Os should never be run concurrently with the fork(2)
>>        system call, if the memory buffer is a private mapping (i.e., any
>>        mapping created with the mmap(2) MAP_PRIVATE flag; this includes
>>        memory allocated on the heap and statically allocated buffers).  Any
>>        such I/Os, whether submitted via an asynchronous I/O interface or
>>        from another thread in the process, should be completed before
>>        fork(2) is called.  Failure to do so can result in data corruption
>>        and undefined behavior in parent and child processes.  This
>>        restriction does not apply when the memory buffer for the O_DIRECT
>>        I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
>>        Nor does this restriction apply when the memory buffer has been
>>        advised as MADV_DONTFORK with madvise(2), ensuring that it will not
>>        be available to the child after fork(2).

IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
semantics is not equal to get_user_pages(). When someone simply substitute
get_user_pages() to get_user_pages_fast(), they might see huge trouble.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-04 20:42       ` KOSAKI Motohiro
@ 2013-10-07 21:18         ` Jan Kara
  2013-10-08  0:27           ` KOSAKI Motohiro
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-10-07 21:18 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Jan Kara, Christoph Hellwig, LKML, linux-mm, Alexander Viro,
	Andreas Dilger, Andy Walls, Arnd Bergmann, Benjamin LaHaise,
	ceph-devel, Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio, l

On Fri 04-10-13 16:42:19, KOSAKI Motohiro wrote:
> (10/4/13 4:31 PM), KOSAKI Motohiro wrote:
> >(10/2/13 4:29 PM), Jan Kara wrote:
> >>On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
> >>>On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
> >>>>    Hello,
> >>>>
> >>>>    In my quest for changing locking around page faults to make things easier for
> >>>>filesystems I found out get_user_pages() users could use a cleanup.  The
> >>>>knowledge about necessary locking for get_user_pages() is in tons of places in
> >>>>drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> >>>>calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> >>>>surrounding code). Rather often this actually doesn't seem necessary. This
> >>>>patch series converts lots of places to use either get_user_pages_fast()
> >>>>or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> >>>>of mmap_sem from the drivers. I'm still looking into converting a few remaining
> >>>>drivers (most notably v4l2) which are more complex.
> >>>
> >>>Even looking over the kerneldoc comment next to it I still fail to
> >>>understand when you'd want to use get_user_pages_fast and when not.
> >>    AFAIU get_user_pages_fast() should be used
> >>1) if you don't need any special get_user_pages() arguments (like calling
> >>     it for mm of a different process, forcing COW, or similar).
> >>2) you don't expect pages to be unmapped (then get_user_pages_fast() is
> >>actually somewhat slower because it walks page tables twice).
> >
> >If target page point to anon or private mapping pages, get_user_pages_fast()
> >is fork unsafe. O_DIRECT man pages describe a bit about this.
> >
> >
> >see http://man7.org/linux/man-pages/man2/open.2.html
> >
> >>       O_DIRECT I/Os should never be run concurrently with the fork(2)
> >>       system call, if the memory buffer is a private mapping (i.e., any
> >>       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
> >>       memory allocated on the heap and statically allocated buffers).  Any
> >>       such I/Os, whether submitted via an asynchronous I/O interface or
> >>       from another thread in the process, should be completed before
> >>       fork(2) is called.  Failure to do so can result in data corruption
> >>       and undefined behavior in parent and child processes.  This
> >>       restriction does not apply when the memory buffer for the O_DIRECT
> >>       I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
> >>       Nor does this restriction apply when the memory buffer has been
> >>       advised as MADV_DONTFORK with madvise(2), ensuring that it will not
> >>       be available to the child after fork(2).
> 
> IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
> semantics is not equal to get_user_pages(). When someone simply substitute
> get_user_pages() to get_user_pages_fast(), they might see huge trouble.
  I forgot about this speciality (and actually comments didn't remind me
:(). But thinking about this some more get_user_pages_fast() seems as save
as get_user_pages() in presence of threads sharing mm, doesn't it? Because
while get_user_pages() are working, other thread can happilly trigger COW
on some of the pages and thus get_user_pages() can return pages some of
which are invisible in our mm by the time get_user_pages() returns. So
although in practice I agree problems of get_user_pages_fast() with fork(2)
are more visible, in essence they are still present with clone(2) and
get_user_pages().

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-07 21:18         ` Jan Kara
@ 2013-10-08  0:27           ` KOSAKI Motohiro
  2013-10-08  6:06             ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: KOSAKI Motohiro @ 2013-10-08  0:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: KOSAKI Motohiro, Christoph Hellwig, LKML, linux-mm,
	Alexander Viro, Andreas Dilger, Andy Walls, Arnd Bergmann,
	Benjamin LaHaise, ceph-devel, Dan Williams, David Airlie,
	dri-devel, Gleb Natapov, Greg Kroah-Hartman, hpdd-discuss,
	Jarod Wilson, Jayant Mangalampalli,
	Jean-Christophe Plagniol-Villard, Jesper Nilsson, Kai Makisara,
	kvm, Laurent Pinchart

(10/7/13 5:18 PM), Jan Kara wrote:
> On Fri 04-10-13 16:42:19, KOSAKI Motohiro wrote:
>> (10/4/13 4:31 PM), KOSAKI Motohiro wrote:
>>> (10/2/13 4:29 PM), Jan Kara wrote:
>>>> On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
>>>>> On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
>>>>>>     Hello,
>>>>>>
>>>>>>     In my quest for changing locking around page faults to make things easier for
>>>>>> filesystems I found out get_user_pages() users could use a cleanup.  The
>>>>>> knowledge about necessary locking for get_user_pages() is in tons of places in
>>>>>> drivers and quite a few of them actually get it wrong (don't have mmap_sem when
>>>>>> calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
>>>>>> surrounding code). Rather often this actually doesn't seem necessary. This
>>>>>> patch series converts lots of places to use either get_user_pages_fast()
>>>>>> or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
>>>>>> of mmap_sem from the drivers. I'm still looking into converting a few remaining
>>>>>> drivers (most notably v4l2) which are more complex.
>>>>>
>>>>> Even looking over the kerneldoc comment next to it I still fail to
>>>>> understand when you'd want to use get_user_pages_fast and when not.
>>>>     AFAIU get_user_pages_fast() should be used
>>>> 1) if you don't need any special get_user_pages() arguments (like calling
>>>>      it for mm of a different process, forcing COW, or similar).
>>>> 2) you don't expect pages to be unmapped (then get_user_pages_fast() is
>>>> actually somewhat slower because it walks page tables twice).
>>>
>>> If target page point to anon or private mapping pages, get_user_pages_fast()
>>> is fork unsafe. O_DIRECT man pages describe a bit about this.
>>>
>>>
>>> see http://man7.org/linux/man-pages/man2/open.2.html
>>>
>>>>        O_DIRECT I/Os should never be run concurrently with the fork(2)
>>>>        system call, if the memory buffer is a private mapping (i.e., any
>>>>        mapping created with the mmap(2) MAP_PRIVATE flag; this includes
>>>>        memory allocated on the heap and statically allocated buffers).  Any
>>>>        such I/Os, whether submitted via an asynchronous I/O interface or
>>>>        from another thread in the process, should be completed before
>>>>        fork(2) is called.  Failure to do so can result in data corruption
>>>>        and undefined behavior in parent and child processes.  This
>>>>        restriction does not apply when the memory buffer for the O_DIRECT
>>>>        I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
>>>>        Nor does this restriction apply when the memory buffer has been
>>>>        advised as MADV_DONTFORK with madvise(2), ensuring that it will not
>>>>        be available to the child after fork(2).
>>
>> IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
>> semantics is not equal to get_user_pages(). When someone simply substitute
>> get_user_pages() to get_user_pages_fast(), they might see huge trouble.
>    I forgot about this speciality (and actually comments didn't remind me
> :(). But thinking about this some more get_user_pages_fast() seems as save
> as get_user_pages() in presence of threads sharing mm, doesn't it?

It depends.

If there is any guarantee that other threads don't touch the same page which
retrieved get_user_pages(), get_user_pages_fast() give us brilliant fast way.
Example, as far as I heard form IB guys, the userland library of the infiniband
stack uses madvise(MADV_DONTFORK), and then they don't need to care COW issue
and can choose fastest way. An another example is a futex. futex doesn't use
the contents of the pages, it uses vaddr only for looking up key. Then, it
also doesn't have COW issue.

I don't know other cases. But as far as I know, everything is case-by-case.

> Because
> while get_user_pages() are working, other thread can happilly trigger COW
> on some of the pages and thus get_user_pages() can return pages some of
> which are invisible in our mm by the time get_user_pages() returns.

If you are talking about get_user_pages() instead of get_user_pages_fast(), this
can't be happen because page-fault takes mmap_sem too.

I would say, mmap_sem has too fat responsibility really.

> So
> although in practice I agree problems of get_user_pages_fast() with fork(2)
> are more visible, in essence they are still present with clone(2) and
> get_user_pages().
>
> 								Honza
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/26] get_user_pages() cleanup
  2013-10-08  0:27           ` KOSAKI Motohiro
@ 2013-10-08  6:06             ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2013-10-08  6:06 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Jan Kara, Christoph Hellwig, LKML, linux-mm, Alexander Viro,
	Andreas Dilger, Andy Walls, Arnd Bergmann, Benjamin LaHaise,
	ceph-devel, Dan Williams, David Airlie, dri-devel, Gleb Natapov,
	Greg Kroah-Hartman, hpdd-discuss, Jarod Wilson,
	Jayant Mangalampalli, Jean-Christophe Plagniol-Villard,
	Jesper Nilsson, Kai Makisara, kvm, Laurent Pinchart, linux-aio, l

On Mon 07-10-13 20:27:16, KOSAKI Motohiro wrote:
> (10/7/13 5:18 PM), Jan Kara wrote:
> >On Fri 04-10-13 16:42:19, KOSAKI Motohiro wrote:
> >>(10/4/13 4:31 PM), KOSAKI Motohiro wrote:
> >>>(10/2/13 4:29 PM), Jan Kara wrote:
> >>>>On Wed 02-10-13 09:20:09, Christoph Hellwig wrote:
> >>>>>On Wed, Oct 02, 2013 at 04:27:41PM +0200, Jan Kara wrote:
> >>>>>>    Hello,
> >>>>>>
> >>>>>>    In my quest for changing locking around page faults to make things easier for
> >>>>>>filesystems I found out get_user_pages() users could use a cleanup.  The
> >>>>>>knowledge about necessary locking for get_user_pages() is in tons of places in
> >>>>>>drivers and quite a few of them actually get it wrong (don't have mmap_sem when
> >>>>>>calling get_user_pages() or hold mmap_sem when calling copy_from_user() in the
> >>>>>>surrounding code). Rather often this actually doesn't seem necessary. This
> >>>>>>patch series converts lots of places to use either get_user_pages_fast()
> >>>>>>or a new simple wrapper get_user_pages_unlocked() to remove the knowledge
> >>>>>>of mmap_sem from the drivers. I'm still looking into converting a few remaining
> >>>>>>drivers (most notably v4l2) which are more complex.
> >>>>>
> >>>>>Even looking over the kerneldoc comment next to it I still fail to
> >>>>>understand when you'd want to use get_user_pages_fast and when not.
> >>>>    AFAIU get_user_pages_fast() should be used
> >>>>1) if you don't need any special get_user_pages() arguments (like calling
> >>>>     it for mm of a different process, forcing COW, or similar).
> >>>>2) you don't expect pages to be unmapped (then get_user_pages_fast() is
> >>>>actually somewhat slower because it walks page tables twice).
> >>>
> >>>If target page point to anon or private mapping pages, get_user_pages_fast()
> >>>is fork unsafe. O_DIRECT man pages describe a bit about this.
> >>>
> >>>
> >>>see http://man7.org/linux/man-pages/man2/open.2.html
> >>>
> >>>>       O_DIRECT I/Os should never be run concurrently with the fork(2)
> >>>>       system call, if the memory buffer is a private mapping (i.e., any
> >>>>       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
> >>>>       memory allocated on the heap and statically allocated buffers).  Any
> >>>>       such I/Os, whether submitted via an asynchronous I/O interface or
> >>>>       from another thread in the process, should be completed before
> >>>>       fork(2) is called.  Failure to do so can result in data corruption
> >>>>       and undefined behavior in parent and child processes.  This
> >>>>       restriction does not apply when the memory buffer for the O_DIRECT
> >>>>       I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.
> >>>>       Nor does this restriction apply when the memory buffer has been
> >>>>       advised as MADV_DONTFORK with madvise(2), ensuring that it will not
> >>>>       be available to the child after fork(2).
> >>
> >>IMHO, get_user_pages_fast() should be renamed to get_user_pages_quirk(). Its
> >>semantics is not equal to get_user_pages(). When someone simply substitute
> >>get_user_pages() to get_user_pages_fast(), they might see huge trouble.
> >   I forgot about this speciality (and actually comments didn't remind me
> >:(). But thinking about this some more get_user_pages_fast() seems as save
> >as get_user_pages() in presence of threads sharing mm, doesn't it?
> 
> It depends.
> 
> If there is any guarantee that other threads don't touch the same page which
> retrieved get_user_pages(), get_user_pages_fast() give us brilliant fast way.
> Example, as far as I heard form IB guys, the userland library of the infiniband
> stack uses madvise(MADV_DONTFORK), and then they don't need to care COW issue
> and can choose fastest way. An another example is a futex. futex doesn't use
> the contents of the pages, it uses vaddr only for looking up key. Then, it
> also doesn't have COW issue.
> 
> I don't know other cases. But as far as I know, everything is case-by-case.
> 
> >Because
> >while get_user_pages() are working, other thread can happilly trigger COW
> >on some of the pages and thus get_user_pages() can return pages some of
> >which are invisible in our mm by the time get_user_pages() returns.
> 
> If you are talking about get_user_pages() instead of get_user_pages_fast(), this
> can't be happen because page-fault takes mmap_sem too.
  But both take mmap_sem for reading, so they won't exclude each other...

> I would say, mmap_sem has too fat responsibility really.
  I heartily agree with this. And that makes any changes to it really hard.
I've once went through all the places in kernel which acquired mmap_sem.
If I remember right, there were close to two hundreds of them! Basic use
is the protection of vma tree but then we have less obvious stuff like
protection of some other fields in mm_struct, fork exclusion, and god knows
what...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-10-08  6:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
2013-10-02 20:29   ` Jan Kara
2013-10-04 20:31     ` KOSAKI Motohiro
2013-10-04 20:42       ` KOSAKI Motohiro
2013-10-07 21:18         ` Jan Kara
2013-10-08  0:27           ` KOSAKI Motohiro
2013-10-08  6:06             ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).