public inbox for linux-kernel@vger.kernel.org
* [PATCHSET] FUSE: implement direct mmap
@ 2008-11-20 14:52 Tejun Heo
  2008-11-20 14:52 ` [PATCH 1/6] mmap: don't assume f_op->mmap() doesn't change vma->vm_file Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin


Hello,

This is the first take of the fuse-implement-direct-mmap patchset.  It
implements direct mmap support for FUSE (and CUSE).  Each direct mmap
area is backed by an anonymous mapping (a shmem_file), and the FUSE
server decides how these mappings are shared.

An mmap request is handled in two steps.  MMAP first asks the server
whether it wants to share the mapping with an existing one or create a
new one, and with which flags.  MMAP_COMMIT then notifies the server
of the result of the mmap and, on success, of the fd the server can
use to access the mmap region.
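
As a rough userspace sketch of the first step, the server's MMAP reply
could be built as below.  The struct layout and flag values are copied
from 0006; reply_mmap() is a hypothetical helper, and using -1 to mean
"create a new mapping" is an assumption based on the kernel side only
testing for fd >= 0 in fuse_mmap_end().

```c
#include <assert.h>
#include <stdint.h>

/* Flag values and reply layout as added to include/linux/fuse.h in 0006. */
#define FUSE_MMAP_DONT_COPY	(1 << 0)
#define FUSE_MMAP_DONT_EXPAND	(1 << 1)

struct fuse_mmap_out {
	int32_t  fd;
	uint32_t flags;
};

/*
 * Hypothetical server-side helper: build the reply to a MMAP request.
 * A non-negative shared_fd asks the kernel to reuse the shmem file
 * behind that fd; a negative value (assumed here to be -1) asks for a
 * new mapping.
 */
struct fuse_mmap_out reply_mmap(int shared_fd, uint32_t flags)
{
	struct fuse_mmap_out out;

	/* Only the two flags defined by the patch are accepted here. */
	assert((flags & ~(uint32_t)(FUSE_MMAP_DONT_COPY |
				    FUSE_MMAP_DONT_EXPAND)) == 0);

	out.fd = shared_fd >= 0 ? shared_fd : -1;
	out.flags = flags;
	return out;
}
```

A server that wants every mapping private would always pass a negative
fd; one that wants all clients to see the same pages would hand back
the fd it received in an earlier MMAP_COMMIT.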

This patchset contains the following six patches.

 0001-mmap-don-t-assume-f_op-mmap-doesn-t-change-vma.patch
 0002-fdtable-export-alloc_fd.patch
 0003-FUSE-don-t-let-fuse_req-end-put-the-base-referen.patch
 0004-FUSE-make-request_wait_answer-wait-for-end-co.patch
 0005-FUSE-implement-fuse_req-prep.patch
 0006-FUSE-implement-direct-mmap.patch

0001-0002 update mm and fdtable for following FUSE changes.  0003-0005
update fuse_req->end() handling and add ->prep().  0006 implements
direct mmap.

Direct mmap implementation jumps through a few hoops to override
vma->vm_file with shmem_file.  It would be great if there's a cleaner
way to achieve this.  For details, please take a look at 0006.

Nick, can you please verify that 0001 doesn't break anything and
replacing vma->vm_file in ->mmap() is okay?

Andrew, would 0002 be okay?

Thanks.

This patchset is on top of

  master (ee2f6cc7f9ea2542ad46070ed62ba7aa04d08871)
+ [1] poll-allow-f_op_poll-to-sleep-take-2
+ [2] add-cdev_release-and-convert-cdev_alloc-to-use-it
+ [3] extend-FUSE patchset, take #2
+ [4] implement-CUSE patchset, take #2

This patchset is also available in the following git tree.

 http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=fuse-mmap
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git fuse-mmap

and contains the following changes.

 fs/file.c            |    1 
 fs/fuse/cuse.c       |    6 
 fs/fuse/dev.c        |   75 ++++++---
 fs/fuse/file.c       |  412 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h     |   19 ++
 fs/fuse/inode.c      |    1 
 include/linux/fuse.h |   47 +++++
 mm/mmap.c            |    1 
 8 files changed, 533 insertions(+), 29 deletions(-)

--
tejun

[1] http://lkml.org/lkml/2008/11/20/161
[2] http://article.gmane.org/gmane.linux.kernel/727133
[3] http://lkml.org/lkml/2008/11/20/171
[4] http://lkml.org/lkml/2008/11/20/179


* [PATCH 1/6] mmap: don't assume f_op->mmap() doesn't change vma->vm_file
  2008-11-20 14:52 [PATCHSET] FUSE: implement direct mmap Tejun Heo
@ 2008-11-20 14:52 ` Tejun Heo
  2008-11-20 14:52 ` [PATCH 2/6] fdtable: export alloc_fd() Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin; +Cc: Tejun Heo

mmap_region() assumes that vma->vm_file isn't changed by f_op->mmap()
and continues to use the cached file pointer after f_op->mmap()
returns.  Don't assume that.  This will be used by FUSE to redirect
mmap to a shmem_file.
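
A minimal userspace sketch of the hazard, using simplified stand-ins:
struct file, struct vma, swap_mmap() and map_region() below are
hypothetical models, not the kernel definitions.  The caller reloads
its local pointer from the vma after the callback, because the
callback may have replaced it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct vma;

struct file {
	const char *name;
	int (*mmap)(struct file *file, struct vma *vma);
};

struct vma {
	struct file *vm_file;
};

struct file backing = { "shmem", NULL };

/* An ->mmap() that redirects the mapping to a different backing file. */
int swap_mmap(struct file *file, struct vma *vma)
{
	(void)file;
	vma->vm_file = &backing;
	return 0;
}

/*
 * Models the fixed caller: after ->mmap() returns, reload the local
 * pointer from the vma instead of trusting the value cached before
 * the call.  Returns the file the caller ends up using, or NULL if
 * ->mmap() failed.
 */
struct file *map_region(struct file *file, struct vma *vma)
{
	vma->vm_file = file;
	if (file->mmap(file, vma))
		return NULL;
	file = vma->vm_file;	/* mirrors the one-line change in this patch */
	assert(file != NULL);
	return file;
}
```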

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <npiggin@suse.de>
---
 mm/mmap.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index d4855a6..2e4e0b5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1177,6 +1177,7 @@ munmap_back:
 		vma->vm_file = file;
 		get_file(file);
 		error = file->f_op->mmap(file, vma);
+		file = vma->vm_file;
 		if (error)
 			goto unmap_and_free_vma;
 		if (vm_flags & VM_EXECUTABLE)
-- 
1.5.6



* [PATCH 2/6] fdtable: export alloc_fd()
  2008-11-20 14:52 [PATCHSET] FUSE: implement direct mmap Tejun Heo
  2008-11-20 14:52 ` [PATCH 1/6] mmap: don't assume f_op->mmap() doesn't change vma->vm_file Tejun Heo
@ 2008-11-20 14:52 ` Tejun Heo
  2008-11-20 14:52 ` [PATCH 3/6] FUSE: don't let fuse_req->end() put the base reference Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin; +Cc: Tejun Heo

Export alloc_fd().  Will be used by FUSE.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/file.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index f313314..806b3ad 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -487,6 +487,7 @@ out:
 	spin_unlock(&files->file_lock);
 	return error;
 }
+EXPORT_SYMBOL_GPL(alloc_fd);
 
 int get_unused_fd(void)
 {
-- 
1.5.6



* [PATCH 3/6] FUSE: don't let fuse_req->end() put the base reference
  2008-11-20 14:52 [PATCHSET] FUSE: implement direct mmap Tejun Heo
  2008-11-20 14:52 ` [PATCH 1/6] mmap: don't assume f_op->mmap() doesn't change vma->vm_file Tejun Heo
  2008-11-20 14:52 ` [PATCH 2/6] fdtable: export alloc_fd() Tejun Heo
@ 2008-11-20 14:52 ` Tejun Heo
  2008-11-20 14:52 ` [PATCH 4/6] FUSE: make request_wait_answer() wait for ->end() completion Tejun Heo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin; +Cc: Tejun Heo

fuse_req->end() was supposed to put the base reference, but there's
no reason why it should.  It only makes things more complex.  Move the
put out of ->end() and make it the responsibility of request_end().
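
The resulting ownership rule can be sketched in userspace as follows.
This is only a model under stated assumptions: struct req,
put_request(), release_end() and finish_request() are hypothetical
names, and the real code takes fc->lock and frees the request when the
count drops to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a refcounted request; not the kernel's fuse_req. */
struct req {
	int count;			/* reference count */
	int cleaned;			/* set by the ->end() callback */
	void (*end)(struct req *req);
};

void put_request(struct req *req)
{
	assert(req->count > 0);
	req->count--;
}

/* With this rule an ->end() callback only releases its own resources. */
void release_end(struct req *req)
{
	req->cleaned = 1;
}

/*
 * Models request_end() after the patch: the base reference is dropped
 * here unconditionally, whether or not an ->end() callback was set.
 */
void finish_request(struct req *req)
{
	void (*end)(struct req *) = req->end;

	req->end = NULL;
	if (end)
		end(req);
	put_request(req);
}
```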

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fuse/dev.c   |    5 +----
 fs/fuse/file.c  |    4 +---
 fs/fuse/inode.c |    1 -
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index b8f70a0..25a134a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -296,8 +296,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
 	wake_up(&req->waitq);
 	if (end)
 		end(fc, req);
-	else
-		fuse_put_request(fc, req);
+	fuse_put_request(fc, req);
 }
 
 static void wait_answer_interruptible(struct fuse_conn *fc,
@@ -1052,8 +1051,6 @@ static void end_io_requests(struct fuse_conn *fc)
 		wake_up(&req->waitq);
 		if (end) {
 			req->end = NULL;
-			/* The end function will consume this reference */
-			__fuse_get_request(req);
 			spin_unlock(&fc->lock);
 			wait_event(req->waitq, !req->locked);
 			end(fc, req);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4d535ae..128356b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -85,7 +85,6 @@ static void fuse_release_end(struct fuse_conn *fc, struct fuse_req *req)
 {
 	dput(req->misc.release.dentry);
 	mntput(req->misc.release.vfsmount);
-	fuse_put_request(fc, req);
 }
 
 static void fuse_file_put(struct fuse_file *ff)
@@ -506,7 +505,6 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 	}
 	if (req->ff)
 		fuse_file_put(req->ff);
-	fuse_put_request(fc, req);
 }
 
 static void fuse_send_readpages(struct fuse_req *req, struct file *file,
@@ -1056,7 +1054,6 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
 {
 	__free_page(req->pages[0]);
 	fuse_file_put(req->ff);
-	fuse_put_request(fc, req);
 }
 
 static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
@@ -1100,6 +1097,7 @@ static void fuse_send_writepage(struct fuse_conn *fc, struct fuse_req *req)
 	fuse_writepage_finish(fc, req);
 	spin_unlock(&fc->lock);
 	fuse_writepage_free(fc, req);
+	fuse_put_request(fc, req);
 	spin_lock(&fc->lock);
 }
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index eae4ff9..75f0770 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -781,7 +781,6 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
 		fc->max_write = max_t(unsigned, 4096, fc->max_write);
 		fc->conn_init = 1;
 	}
-	fuse_put_request(fc, req);
 	fc->blocked = 0;
 	wake_up_all(&fc->blocked_waitq);
 }
-- 
1.5.6



* [PATCH 4/6] FUSE: make request_wait_answer() wait for ->end() completion
  2008-11-20 14:52 [PATCHSET] FUSE: implement direct mmap Tejun Heo
                   ` (2 preceding siblings ...)
  2008-11-20 14:52 ` [PATCH 3/6] FUSE: don't let fuse_req->end() put the base reference Tejun Heo
@ 2008-11-20 14:52 ` Tejun Heo
  2008-11-20 14:52 ` [PATCH 5/6] FUSE: implement fuse_req->prep() Tejun Heo
  2008-11-20 14:52 ` [PATCH 6/6] FUSE: implement direct mmap Tejun Heo
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin; +Cc: Tejun Heo

Previously, a request was marked FINISHED before ->end() was executed,
so request_wait_answer() could return before ->end() was done.  This
patch makes request_wait_answer() wait for ->end() to finish before
returning.
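
The new ordering can be illustrated with a single-threaded userspace
model.  Everything here (struct req, the REQ_* states, record_end(),
finish_request()) is hypothetical; the real patch additionally relies
on smp_wmb() and wake_up() to publish the state to the waiter, which
this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

enum req_state { REQ_SENT, REQ_FINISHED };	/* hypothetical states */

struct req {
	enum req_state state;
	enum req_state state_seen_by_end;	/* recorded for illustration */
	void (*end)(struct req *req);
};

/* An ->end() callback that records the state it observes. */
void record_end(struct req *req)
{
	req->state_seen_by_end = req->state;
}

/*
 * Models the reordered completion path: the callback runs first and
 * only then is the request marked finished, so anyone waiting for
 * REQ_FINISHED also waits for ->end() to complete.
 */
void finish_request(struct req *req)
{
	void (*end)(struct req *) = req->end;

	assert(req->state != REQ_FINISHED);	/* a request finishes once */
	req->end = NULL;
	if (end)
		end(req);
	req->state = REQ_FINISHED;
}
```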

Note that no current ->end() user waits for request completion, so
this change doesn't cause any behavior difference.

While at it, beef up the comment above ->end() hook and clarify when
and where it's called.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fuse/dev.c    |   41 +++++++++++++++++++++++++----------------
 fs/fuse/fuse_i.h |    5 ++++-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 25a134a..c83ff20 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -278,7 +278,6 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
 	req->end = NULL;
 	list_del(&req->list);
 	list_del(&req->intr_entry);
-	req->state = FUSE_REQ_FINISHED;
 	if (req->background) {
 		if (fc->num_background == FUSE_MAX_BACKGROUND) {
 			fc->blocked = 0;
@@ -292,10 +291,21 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
 		fc->active_background--;
 		flush_bg_queue(fc);
 	}
+
 	spin_unlock(&fc->lock);
-	wake_up(&req->waitq);
-	if (end)
+
+	if (end) {
 		end(fc, req);
+		smp_wmb();
+	}
+
+	/*
+	 * We own this request and wake_up() has enough memory
+	 * barrier, no need to grab spin lock to set state.
+	 */
+	req->state = FUSE_REQ_FINISHED;
+
+	wake_up(&req->waitq);
 	fuse_put_request(fc, req);
 }
 
@@ -369,17 +379,16 @@ static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req)
 		return;
 
  aborted:
-	BUG_ON(req->state != FUSE_REQ_FINISHED);
-	if (req->locked) {
-		/* This is uninterruptible sleep, because data is
-		   being copied to/from the buffers of req.  During
-		   locked state, there mustn't be any filesystem
-		   operation (e.g. page fault), since that could lead
-		   to deadlock */
-		spin_unlock(&fc->lock);
-		wait_event(req->waitq, !req->locked);
-		spin_lock(&fc->lock);
-	}
+	spin_unlock(&fc->lock);
+	wait_event(req->waitq, req->state == FUSE_REQ_FINISHED);
+	/*
+	 * This is uninterruptible sleep, because data is being copied
+	 * to/from the buffers of req.  During locked state, there
+	 * mustn't be any filesystem operation (e.g. page fault),
+	 * since that could lead to deadlock
+	 */
+	wait_event(req->waitq, !req->locked);
+	spin_lock(&fc->lock);
 }
 
 void fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
@@ -1046,9 +1055,7 @@ static void end_io_requests(struct fuse_conn *fc)
 
 		req->aborted = 1;
 		req->out.h.error = -ECONNABORTED;
-		req->state = FUSE_REQ_FINISHED;
 		list_del_init(&req->list);
-		wake_up(&req->waitq);
 		if (end) {
 			req->end = NULL;
 			spin_unlock(&fc->lock);
@@ -1056,6 +1063,8 @@ static void end_io_requests(struct fuse_conn *fc)
 			end(fc, req);
 			spin_lock(&fc->lock);
 		}
+		req->state = FUSE_REQ_FINISHED;
+		wake_up(&req->waitq);
 	}
 }
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 23df478..90eb42c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -282,7 +282,10 @@ struct fuse_req {
 	/** Link on fi->writepages */
 	struct list_head writepages_entry;
 
-	/** Request completion callback */
+	/** Request completion callback.  This function is called from
+	    the kernel context of the FUSE server if the request isn't
+	    being aborted.  If the request is being aborted, it's
+	    called from the kernel context of the aborting process. */
 	void (*end)(struct fuse_conn *, struct fuse_req *);
 
 	/** Request is stolen from fuse_file->reserved_req */
-- 
1.5.6



* [PATCH 5/6] FUSE: implement fuse_req->prep()
  2008-11-20 14:52 [PATCHSET] FUSE: implement direct mmap Tejun Heo
                   ` (3 preceding siblings ...)
  2008-11-20 14:52 ` [PATCH 4/6] FUSE: make request_wait_answer() wait for ->end() completion Tejun Heo
@ 2008-11-20 14:52 ` Tejun Heo
  2008-11-20 14:52 ` [PATCH 6/6] FUSE: implement direct mmap Tejun Heo
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin; +Cc: Tejun Heo

Implement ->prep(), the counterpart of ->end().  It's called in the
kernel context of the server right before the request is passed to the
userland server.  ->prep() can fail the request without disrupting the
whole channel.

This will be used by the direct mmap implementation.
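
The "fail this request, carry on with the next" behavior can be
sketched in userspace as below.  struct req, failing_prep() and
next_deliverable() are hypothetical; the array stands in for the
pending list, and -EIO matches the error the patch assigns when a
request fails in fuse_dev_read().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a queued request. */
struct req {
	int error;				/* result reported to the submitter */
	int (*prep)(struct req *req);		/* optional, may fail */
};

/* A ->prep() callback that always fails. */
int failing_prep(struct req *req)
{
	(void)req;
	return -ENOMEM;
}

/*
 * Models the reader loop: walk the queue, fail any request whose
 * ->prep() returns non-zero and move on to the next one.  Returns the
 * index of the first request that can be handed to the server, or -1
 * if none is left.
 */
int next_deliverable(struct req *queue, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		struct req *req = &queue[i];

		if (req->prep && req->prep(req)) {
			req->error = -EIO;	/* only this request fails */
			continue;
		}
		return (int)i;
	}
	return -1;
}
```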

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fuse/dev.c    |   29 ++++++++++++++++++++++++++---
 fs/fuse/fuse_i.h |    6 ++++++
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c83ff20..05414b8 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -747,6 +747,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
 			      unsigned long nr_segs, loff_t pos)
 {
 	int err;
+	bool restart;
 	struct fuse_req *req;
 	struct fuse_in *in;
 	struct fuse_copy_state cs;
@@ -793,12 +794,32 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
 		goto restart;
 	}
 	spin_unlock(&fc->lock);
+
+	restart = false;
 	fuse_copy_init(&cs, fc, 1, req, iov, nr_segs);
+
+	/*
+	 * Execute prep if available.  Failure from prep doesn't
+	 * indicate faulty channel.  On failure, fail the current
+	 * request and proceed to the next one.
+	 */
+	if (req->prep) {
+		err = req->prep(fc, req);
+		if (err) {
+			restart = true;
+			goto finish;
+		}
+	}
+
 	err = fuse_copy_one(&cs, &in->h, sizeof(in->h));
-	if (!err)
-		err = fuse_copy_args(&cs, in->numargs, in->argpages,
-				     (struct fuse_arg *) in->args, 0);
+	if (err)
+		goto finish;
+
+	err = fuse_copy_args(&cs, in->numargs, in->argpages,
+			     (struct fuse_arg *) in->args, 0);
+ finish:
 	fuse_copy_finish(&cs);
+
 	spin_lock(&fc->lock);
 	req->locked = 0;
 	if (req->aborted) {
@@ -808,6 +829,8 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
 	if (err) {
 		req->out.h.error = -EIO;
 		request_end(fc, req);
+		if (restart)
+			goto restart;
 		return err;
 	}
 	if (!req->isreply)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 90eb42c..9d3becb 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -282,6 +282,12 @@ struct fuse_req {
 	/** Link on fi->writepages */
 	struct list_head writepages_entry;
 
+	/** Request preparation callback.  Called from the kernel
+	    context of the FUSE server before passing the request to
+	    the FUSE server.  Non-zero return from this function will
+	    fail the request. */
+	int (*prep)(struct fuse_conn *, struct fuse_req *);
+
 	/** Request completion callback.  This function is called from
 	    the kernel context of the FUSE server if the request isn't
 	    being aborted.  If the request is being aborted, it's
-- 
1.5.6



* [PATCH 6/6] FUSE: implement direct mmap
  2008-11-20 14:52 [PATCHSET] FUSE: implement direct mmap Tejun Heo
                   ` (4 preceding siblings ...)
  2008-11-20 14:52 ` [PATCH 5/6] FUSE: implement fuse_req->prep() Tejun Heo
@ 2008-11-20 14:52 ` Tejun Heo
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2008-11-20 14:52 UTC (permalink / raw)
  To: linux-kernel, fuse-devel, miklos, akpm, npiggin; +Cc: Tejun Heo

This patch implements direct mmap.  It allows a FUSE server to honor
each mmap request with an anonymous mapping.  The server can make
multiple mmap requests share a single anonymous mapping or use
separate mappings as it sees fit.

An mmap request is handled in two steps.  MMAP first asks the server
whether it wants to share the mapping with an existing one or create a
new one, and with which flags.  MMAP_COMMIT then notifies the server
of the result of the mmap and, on success, of the fd the server can
use to access the mmap region.

Internally, shmem_file is used to back the mmap areas and vma->vm_file
is overridden from the FUSE file to the shmem_file.

For details, please read the comment on top of
fuse_file_direct_mmap().
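
For illustration, a server could interpret MMAP_COMMIT roughly as in
the userspace sketch below.  The request layout mirrors struct
fuse_mmap_commit_in from this patch (with stdint types); struct
server_mapping and handle_mmap_commit() are hypothetical and only show
the convention that a negative fd carries an error code while a
non-negative fd refers to the shmem_file.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the layout added to include/linux/fuse.h. */
struct fuse_mmap_commit_in {
	uint64_t fh;
	uint64_t mmap_unique;
	uint64_t addr;
	uint64_t len;
	int32_t  prot;
	int32_t  flags;
	int32_t  fd;
	uint32_t padding;
	uint64_t offset;
};

/* Hypothetical per-mapping bookkeeping kept by a server. */
struct server_mapping {
	uint64_t mmap_unique;	/* ->unique of the originating MMAP */
	int fd;			/* shmem fd, or -1 if none */
	int live;		/* non-zero once the mapping is committed */
};

/*
 * Hypothetical handler: returns 0 when the mapping was committed, or
 * the negative error code carried in commit->fd when the mmap failed
 * and whatever was reserved at MMAP time should be released.
 */
int handle_mmap_commit(struct server_mapping *m,
		       const struct fuse_mmap_commit_in *commit)
{
	/* MMAP and MMAP_COMMIT are correlated through mmap_unique. */
	assert(m->mmap_unique == commit->mmap_unique);

	if (commit->fd < 0) {
		m->fd = -1;
		m->live = 0;
		return commit->fd;
	}
	m->fd = commit->fd;
	m->live = 1;
	return 0;
}
```

On success a real server would typically go on to size the file behind
the fd and fill in its content before replying to the COMMIT.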

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fuse/cuse.c       |    6 +
 fs/fuse/file.c       |  408 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h     |    8 +
 include/linux/fuse.h |   47 ++++++
 4 files changed, 468 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 048e67d..c4102df 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -183,6 +183,11 @@ static long cuse_file_compat_ioctl(struct file *file, unsigned int cmd,
 	return fuse_file_do_ioctl(file->private_data, cmd, arg, flags);
 }
 
+static int cuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	return fuse_file_direct_mmap(file->private_data, vma);
+}
+
 static const struct file_operations cuse_frontend_fops = {
 	.read			= cuse_direct_read,
 	.write			= cuse_direct_write,
@@ -193,6 +198,7 @@ static const struct file_operations cuse_frontend_fops = {
 	.poll			= cuse_file_poll,
 	.unlocked_ioctl		= cuse_file_ioctl,
 	.compat_ioctl		= cuse_file_compat_ioctl,
+	.mmap			= cuse_file_direct_mmap,
 };
 
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 128356b..a594361 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -13,6 +13,9 @@
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/file.h>
+#include <linux/syscalls.h>
+#include <linux/mman.h>
 
 static const struct file_operations fuse_direct_io_file_operations;
 
@@ -1883,6 +1886,408 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }
 
+struct fuse_mmap {
+	struct fuse_conn	*fc;	/* associated fuse_conn */
+	struct file		*file;	/* associated file */
+	struct kref		kref;	/* reference count */
+	u64			mmap_unique; /* mmap req which created this */
+	int			mmap_fd;     /* server side fd for shmem file */
+	struct file		*mmap_file;  /* shmem file backing this mmap */
+	unsigned long		start;
+	unsigned long		len;
+
+	/* our copy of vm_ops w/ open and close overridden */
+	struct vm_operations_struct vm_ops;
+};
+
+/*
+ * Create fuse_mmap structure which represents a single mmapped
+ * region.  If @mfile is specified the created fuse_mmap would be
+ * associated with it; otherwise, a new shmem_file is created.
+ */
+static struct fuse_mmap *create_fuse_mmap(struct fuse_conn *fc,
+					  struct file *file, struct file *mfile,
+					  u64 mmap_unique, int mmap_fd,
+					  struct vm_area_struct *vma)
+{
+	char dname[] = "dev/fuse";
+	loff_t off = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+	size_t len = vma->vm_end - vma->vm_start;
+	struct fuse_mmap *fmmap;
+	int err;
+
+	err = -ENOMEM;
+	fmmap = kzalloc(sizeof(*fmmap), GFP_KERNEL);
+	if (!fmmap)
+		goto fail;
+	kref_init(&fmmap->kref);
+
+	if (mfile) {
+		/*
+		 * dentry name with a slash in it can't be created
+		 * from userland, so testing dname ensures that the fd
+		 * is the one we've created.  Note that @mfile is
+		 * already grabbed by fuse_mmap_end().
+		 */
+		err = -EINVAL;
+		if (strcmp(mfile->f_dentry->d_name.name, dname))
+			goto fail;
+	} else {
+		/*
+		 * Create a new shmem_file.  As fuse direct mmaps can
+		 * be shared, offset can't be zapped to zero.  Use off
+		 * + len as the default size.  Server has a chance to
+		 * adjust this and other stuff while processing the
+		 * COMMIT request before the client sees this mmap
+		 * area.
+		 */
+		mfile = shmem_file_setup(dname, off + len, vma->vm_flags);
+		if (IS_ERR(mfile)) {
+			err = PTR_ERR(mfile);
+			goto fail;
+		}
+	}
+	fmmap->mmap_file = mfile;
+
+	fmmap->fc = fuse_conn_get(fc);
+	get_file(file);
+	fmmap->file = file;
+	fmmap->mmap_unique = mmap_unique;
+	fmmap->mmap_fd = mmap_fd;
+	fmmap->start = vma->vm_start;
+	fmmap->len = len;
+
+	return fmmap;
+
+ fail:
+	kfree(fmmap);
+	return ERR_PTR(err);
+}
+
+static void destroy_fuse_mmap(struct fuse_mmap *fmmap)
+{
+	/* mmap_file reference is managed by VM */
+	fuse_conn_put(fmmap->fc);
+	fput(fmmap->file);
+	kfree(fmmap);
+}
+
+static void fuse_vm_release(struct kref *kref)
+{
+	struct fuse_mmap *fmmap = container_of(kref, struct fuse_mmap, kref);
+	struct fuse_conn *fc = fmmap->fc;
+	struct fuse_file *ff = fmmap->file->private_data;
+	struct fuse_req *req;
+	struct fuse_munmap_in *inarg;
+
+	/* failing this might lead to resource leak in server, don't fail */
+	req = fuse_get_req_nofail(fc, fmmap->file);
+	inarg = &req->misc.munmap.in;
+
+	inarg->fh = ff->fh;
+	inarg->mmap_unique = fmmap->mmap_unique;
+	inarg->fd = fmmap->mmap_fd;
+	inarg->addr = fmmap->start;
+	inarg->len = fmmap->len;
+
+	req->in.h.opcode = FUSE_MUNMAP;
+	req->in.h.nodeid = get_node_id(fmmap->file->f_dentry->d_inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(*inarg);
+	req->in.args[0].value = inarg;
+
+	fuse_request_send_noreply(fc, req);
+
+	destroy_fuse_mmap(fmmap);
+}
+
+static void fuse_vm_open(struct vm_area_struct *vma)
+{
+	struct fuse_mmap *fmmap = vma->vm_private_data;
+
+	kref_get(&fmmap->kref);
+}
+
+static void fuse_vm_close(struct vm_area_struct *vma)
+{
+	struct fuse_mmap *fmmap = vma->vm_private_data;
+
+	kref_put(&fmmap->kref, fuse_vm_release);
+}
+
+static void fuse_mmap_end(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_mmap_out *mmap_out = req->out.args[0].value;
+	int fd = mmap_out->fd;
+	struct file *file;
+
+	/*
+	 * If aborted, we're in a different context and the server is
+	 * gonna die soon anyway.  Don't bother.
+	 */
+	if (unlikely(req->aborted))
+		return;
+
+	if (!req->out.h.error && fd >= 0) {
+		/*
+		 * fget() failure should be handled differently as the
+		 * userland is expecting MMAP_COMMIT.  Set ERR_PTR
+		 * value in misc.mmap.file instead of setting
+		 * out.h.error.
+		 */
+		file = fget(fd);
+		if (!file)
+			file = ERR_PTR(-EBADF);
+		req->misc.mmap.file = file;
+	}
+}
+
+static int fuse_mmap_commit_prep(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_mmap_commit_in *commit_in = (void *)req->in.args[0].value;
+	struct file *mfile = req->misc.mmap.file;
+	int fd;
+
+	if (!mfile)
+		return 0;
+
+	/* new mmap.file has been created, assign a fd to it */
+	fd = commit_in->fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return 0;
+
+	get_file(mfile);
+	fd_install(fd, mfile);
+	return 0;
+}
+
+static void fuse_mmap_commit_end(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_mmap_commit_in *commit_in = (void *)req->in.args[0].value;
+
+	/*
+	 * If aborted, we're in a different context and the server is
+	 * gonna die soon anyway.  Don't bother.
+	 */
+	if (unlikely(req->aborted))
+		return;
+
+	/*
+	 * If a new fd was assigned to mmap.file but the request
+	 * failed, close the fd.
+	 */
+	if (req->misc.mmap.file && commit_in->fd >= 0 && req->out.h.error)
+		sys_close(commit_in->fd);
+}
+
+/*
+ * Direct mmap is implemented using two requests - FUSE_MMAP and
+ * FUSE_MMAP_COMMIT.  This is to allow the userland server to choose
+ * whether to share an existing mmap or create a new one.
+ *
+ * Each separate mmap area is backed by a shmem_file (an anonymous
+ * mapping).  If the server specifies fd to an existing shmem_file
+ * created by previous FUSE_MMAP_COMMIT, the shmem_file for that
+ * mapping is reused.  If not, a new shmem_file is created and a new
+ * fd is opened and notified to the server via FUSE_MMAP_COMMIT.
+ *
+ * Because the server might allocate resources on FUSE_MMAP, FUSE
+ * guarantees that FUSE_MMAP_COMMIT will be sent whether the mmap
+ * attempt succeeds or not.  On failure, commit_in.fd will contain
+ * negative error code; otherwise, it will contain the fd for the
+ * shmem_file.  The server is then free to truncate the fd to desired
+ * size and fill in the content.  The client will see the area only
+ * after COMMIT is successfully replied.  If the server fails the
+ * COMMIT request and new fd has been allocated for it, the fd will be
+ * automatically closed by the kernel.
+ *
+ * FUSE guarantees that MUNMAP request will be sent when the area gets
+ * unmapped.
+ *
+ * The server can associate the three related requests - MMAP,
+ * MMAP_COMMIT and MUNMAP using ->unique of the MMAP request.  The
+ * latter two requests carry ->mmap_unique field which contains
+ * ->unique of the MMAP request.
+ */
+int fuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file->f_dentry->d_inode;
+	struct fuse_file *ff = file->private_data;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_mmap *fmmap = NULL;
+	struct fuse_req *req;
+	struct fuse_mmap_in mmap_in;
+	struct fuse_mmap_out mmap_out;
+	struct fuse_mmap_commit_in commit_in;
+	struct file *mfile;
+	u64 mmap_unique;
+	int err;
+
+	/*
+	 * First, execute FUSE_MMAP which will query the server
+	 * whether this mmap request is valid and which fd it wants to
+	 * use to mmap this request.
+	 */
+	req = fuse_get_req(fc);
+	if (IS_ERR(req)) {
+		err = PTR_ERR(req);
+		goto err;
+	}
+
+	memset(&mmap_in, 0, sizeof(mmap_in));
+	mmap_in.fh = ff->fh;
+	mmap_in.addr = vma->vm_start;
+	mmap_in.len = vma->vm_end - vma->vm_start;
+	mmap_in.prot = ((vma->vm_flags & VM_READ) ? PROT_READ : 0) |
+		       ((vma->vm_flags & VM_WRITE) ? PROT_WRITE : 0) |
+		       ((vma->vm_flags & VM_EXEC) ? PROT_EXEC : 0);
+	mmap_in.flags = ((vma->vm_flags & VM_GROWSDOWN) ? MAP_GROWSDOWN : 0) |
+			((vma->vm_flags & VM_DENYWRITE) ? MAP_DENYWRITE : 0) |
+			((vma->vm_flags & VM_EXECUTABLE) ? MAP_EXECUTABLE : 0) |
+			((vma->vm_flags & VM_LOCKED) ? MAP_LOCKED : 0);
+	mmap_in.offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+
+	req->in.h.opcode = FUSE_MMAP;
+	req->in.h.nodeid = get_node_id(inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(mmap_in);
+	req->in.args[0].value = &mmap_in;
+	req->out.numargs = 1;
+	req->out.args[0].size = sizeof(mmap_out);
+	req->out.args[0].value = &mmap_out;
+
+	req->end = fuse_mmap_end;
+
+	fuse_request_send(fc, req);
+
+	/* mmap.file is set if server requested to reuse existing mapping */
+	mfile = req->misc.mmap.file;
+	mmap_unique = req->in.h.unique;
+	err = req->out.h.error;
+
+	fuse_put_request(fc, req);
+
+	/* ERR_PTR value in mfile means fget failure, send failure COMMIT */
+	if (IS_ERR(mfile)) {
+		err = PTR_ERR(mfile);
+		goto commit;
+	}
+	/* userland indicated failure, we can just fail */
+	if (err)
+		goto err;
+
+	/*
+	 * Second, create mmap as the server requested.
+	 */
+	fmmap = create_fuse_mmap(fc, file, mfile, mmap_unique, mmap_out.fd,
+				 vma);
+	if (IS_ERR(fmmap)) {
+		err = PTR_ERR(fmmap);
+		if (mfile)
+			fput(mfile);
+		fmmap = NULL;
+		goto commit;
+	}
+
+	/*
+	 * fmmap points to shm_file to mmap, give it to vma.  From
+	 * this point on, the mfile reference is managed by the vma.
+	 */
+	mfile = fmmap->mmap_file;
+	fput(vma->vm_file);
+	vma->vm_file = mfile;
+
+	/* add flags server requested and mmap the shm_file */
+	if (mmap_out.flags & FUSE_MMAP_DONT_COPY)
+		vma->vm_flags |= VM_DONTCOPY;
+	if (mmap_out.flags & FUSE_MMAP_DONT_EXPAND)
+		vma->vm_flags |= VM_DONTEXPAND;
+
+	err = mfile->f_op->mmap(mfile, vma);
+	if (err)
+		goto commit;
+
+	/*
+	 * Override vm_ops->open and ->close.  This is a bit hacky but
+	 * vma's can't easily be nested and FUSE needs to notify the
+	 * server when to release resources for mmaps.  Both shmem and
+	 * tiny_shmem implementations are okay with this trick but if
+	 * there's a cleaner way to do this, please update it.
+	 */
+	err = -EINVAL;
+	if (vma->vm_ops->open || vma->vm_ops->close || vma->vm_private_data) {
+		printk(KERN_ERR "FUSE: can't do direct mmap. shmem mmap has "
+		       "open, close or vm_private_data\n");
+		goto commit;
+	}
+
+	fmmap->vm_ops = *vma->vm_ops;
+	vma->vm_ops = &fmmap->vm_ops;
+	vma->vm_ops->open = fuse_vm_open;
+	vma->vm_ops->close = fuse_vm_close;
+	vma->vm_private_data = fmmap;
+	err = 0;
+
+ commit:
+	/*
+	 * Third, either mmap succeeded or failed after MMAP request
+	 * succeeded.  Notify userland what happened.
+	 */
+
+	/* missing commit can cause resource leak on server side, don't fail */
+	req = fuse_get_req_nofail(fc, file);
+
+	memset(&commit_in, 0, sizeof(commit_in));
+	commit_in.fh = ff->fh;
+	commit_in.mmap_unique = mmap_unique;
+	commit_in.addr = mmap_in.addr;
+	commit_in.len = mmap_in.len;
+	commit_in.prot = mmap_in.prot;
+	commit_in.flags = mmap_in.flags;
+	commit_in.offset = mmap_in.offset;
+
+	if (!err) {
+		commit_in.fd = fmmap->mmap_fd;
+		/*
+		 * If fmmap->mmap_fd < 0, new fd needs to be created
+		 * when the server reads MMAP_COMMIT.  Pass the file
+		 * pointer.  A fd will be assigned to it by the
+		 * fuse_mmap_commit_prep callback.
+		 */
+		if (fmmap->mmap_fd < 0)
+			req->misc.mmap.file = mfile;
+	} else
+		commit_in.fd = err;
+
+	req->in.h.opcode = FUSE_MMAP_COMMIT;
+	req->in.h.nodeid = get_node_id(inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(commit_in);
+	req->in.args[0].value = &commit_in;
+
+	req->prep = fuse_mmap_commit_prep;
+	req->end = fuse_mmap_commit_end;
+
+	fuse_request_send(fc, req);
+	if (!err)			/* notified failure to userland */
+		err = req->out.h.error;
+	if (!err && commit_in.fd < 0)	/* failed to allocate fd */
+		err = commit_in.fd;
+	fuse_put_request(fc, req);
+
+	if (!err) {
+		fmmap->mmap_fd = commit_in.fd;
+		return 0;
+	}
+
+	/* fall through */
+ err:
+	if (fmmap)
+		destroy_fuse_mmap(fmmap);
+	return err;
+}
+EXPORT_SYMBOL_GPL(fuse_file_direct_mmap);
+
 static const struct file_operations fuse_file_operations = {
 	.llseek		= fuse_file_llseek,
 	.read		= do_sync_read,
@@ -1915,7 +2320,8 @@ static const struct file_operations fuse_direct_io_file_operations = {
 	.unlocked_ioctl	= fuse_file_ioctl,
 	.compat_ioctl	= fuse_file_compat_ioctl,
 	.poll		= fuse_file_poll,
-	/* no mmap and splice_read */
+	.mmap		= fuse_file_direct_mmap,
+	/* no splice_read */
 };
 
 static const struct address_space_operations fuse_file_aops  = {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9d3becb..016ed54 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -262,6 +262,13 @@ struct fuse_req {
 			struct fuse_write_out out;
 		} write;
 		struct fuse_lk_in lk_in;
+		struct {
+			/** to move filp for mmap between client and server */
+			struct file *file;
+		} mmap;
+		struct {
+			struct fuse_munmap_in in;
+		} munmap;
 	} misc;
 
 	/** page vector */
@@ -572,6 +579,7 @@ int fuse_file_lock(struct file *file, int cmd, struct file_lock *fl);
 int fuse_file_flock(struct file *file, int cmd, struct file_lock *fl);
 long fuse_file_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
 			unsigned int flags);
+int fuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma);
 
 /**
  * Notify poll wakeup
diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index 5842560..5d150b3 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -170,6 +170,15 @@ struct fuse_file_lock {
  */
 #define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0)
 
+/**
+ * Mmap flags
+ *
+ * FUSE_MMAP_DONT_COPY: don't copy the region on fork
+ * FUSE_MMAP_DONT_EXPAND: can't be expanded with mremap()
+ */
+#define FUSE_MMAP_DONT_COPY	(1 << 0)
+#define FUSE_MMAP_DONT_EXPAND	(1 << 1)
+
 enum fuse_opcode {
 	FUSE_LOOKUP	   = 1,
 	FUSE_FORGET	   = 2,  /* no reply */
@@ -209,6 +218,9 @@ enum fuse_opcode {
 	FUSE_DESTROY       = 38,
 	FUSE_IOCTL         = 39,
 	FUSE_POLL          = 40,
+	FUSE_MMAP          = 41,
+	FUSE_MMAP_COMMIT   = 42,
+	FUSE_MUNMAP        = 43,
 
 	CUSE_BASE          = 4096,
 };
@@ -448,6 +460,41 @@ struct fuse_notify_poll_wakeup_out {
 	__u64	kh;
 };
 
+struct fuse_mmap_in {
+	__u64	fh;
+	__u64	addr;
+	__u64	len;
+	__s32	prot;
+	__s32	flags;
+	__u64	offset;
+};
+
+struct fuse_mmap_out {
+	__s32	fd;
+	__u32	flags;
+};
+
+struct fuse_mmap_commit_in {
+	__u64	fh;
+	__u64	mmap_unique;
+	__u64	addr;
+	__u64	len;
+	__s32	prot;
+	__s32	flags;
+	__s32	fd;
+	__u32	padding;
+	__u64	offset;
+};
+
+struct fuse_munmap_in {
+	__u64	fh;
+	__u64	mmap_unique;
+	__u64	addr;
+	__u64	len;
+	__s32	fd;
+	__u32	padding;
+};
+
 struct fuse_in_header {
 	__u32	len;
 	__u32	opcode;
-- 
1.5.6


