* [PATCH 0/7] fuse: file locking + misc
@ 2006-06-12 12:21 Miklos Szeredi
2006-06-12 12:25 ` [PATCH 1/7] fuse: use MISC_MAJOR Miklos Szeredi
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:21 UTC
To: akpm; +Cc: linux-kernel, linux-fsdevel
The following patches add POSIX file locking to the fuse interface.
Additional changes related to this are:
- asynchronous interrupt of requests by SIGKILL no longer supported
- separate control filesystem, instead of using sysfs objects
- add support for synchronously interrupting requests
Details are documented in Documentation/filesystems/fuse.txt
throughout the patches.
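For readers less familiar with the application side, here is a minimal sketch of the kind of POSIX advisory lock request that this series is meant to make visible to a fuse filesystem. It uses only the standard fcntl() interface. The helper names try_write_lock() and release_lock() are made up for this illustration and are not part of the patches, and whether a lock request actually reaches the userspace filesystem is an assumption that depends on the daemon implementing locking.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take a whole-file POSIX write lock without blocking.
 * Returns 0 on success, -1 with errno set if another process holds a
 * conflicting lock or the filesystem rejects the request.
 * (Hypothetical helper, for illustration only.)
 */
int try_write_lock(int fd)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "up to end of file" */
	};

	assert(fd >= 0);
	return fcntl(fd, F_SETLK, &fl);
}

/* Drop any lock this process holds on the whole file. */
int release_lock(int fd)
{
	struct flock fl = {
		.l_type = F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,
	};

	assert(fd >= 0);
	return fcntl(fd, F_SETLK, &fl);
}
```

Calling try_write_lock() on a descriptor opened read-write, followed by release_lock(), is enough to exercise the lock and unlock paths once.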
Note: this series depends on
vfs-add-lock-owner-argument-to-flush-operation.patch
to compile and to some extent on
remove-steal_locks.patch
to work correctly.
Miklos
* [PATCH 1/7] fuse: use MISC_MAJOR
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
@ 2006-06-12 12:25 ` Miklos Szeredi
2006-06-12 12:27 ` [PATCH 2/7] fuse: no backgrounding on interrupt Miklos Szeredi
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:25 UTC
To: akpm; +Cc: linux-kernel, linux-fsdevel
From: Jan Engelhardt <jengelh@linux01.gwdg.de>
Have fuse.h use MISC_MAJOR rather than a hardcoded '10'.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/include/linux/fuse.h
===================================================================
--- linux.orig/include/linux/fuse.h 2006-06-12 14:09:22.000000000 +0200
+++ linux/include/linux/fuse.h 2006-06-12 14:09:54.000000000 +0200
@@ -9,6 +9,7 @@
/* This file defines the kernel interface of FUSE */
#include <asm/types.h>
+#include <linux/major.h>
/** Version number of this interface */
#define FUSE_KERNEL_VERSION 7
@@ -20,7 +21,7 @@
#define FUSE_ROOT_ID 1
/** The major number of the fuse character device */
-#define FUSE_MAJOR 10
+#define FUSE_MAJOR MISC_MAJOR
/** The minor number of the fuse character device */
#define FUSE_MINOR 229
* [PATCH 2/7] fuse: no backgrounding on interrupt
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
2006-06-12 12:25 ` [PATCH 1/7] fuse: use MISC_MAJOR Miklos Szeredi
@ 2006-06-12 12:27 ` Miklos Szeredi
2006-06-12 12:28 ` [PATCH 3/7] fuse: add control filesystem Miklos Szeredi
` (4 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:27 UTC
To: akpm; +Cc: linux-kernel, linux-fsdevel
Don't put requests into the background when a fatal interrupt occurs
while the request is in userspace. This removes a major wart from the
implementation.
Backgrounding of requests was introduced to allow breaking of
deadlocks. However, the same can now be achieved by aborting the
filesystem through the 'abort' sysfs attribute.
This is a change in the interface, but should not cause problems,
since these kinds of deadlocks never happen during normal operation.
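As a reading aid, the behaviour on a fatal signal after this change can be modelled roughly as below. This is a simplified userspace sketch, not kernel code: the enum and function names are invented for the illustration, and it ignores the intermediate states and the 'req->locked' handling that the real request_wait_answer() has to deal with.

```c
#include <assert.h>

/* Simplified request states; illustrative names, not the kernel's. */
enum req_state { REQ_PENDING, REQ_SENT, REQ_FINISHED };

/* What the waiting task does once a fatal signal has been noticed. */
enum intr_action {
	INTR_DEQUEUE,	/* not yet read by userspace: remove from the queue */
	INTR_WAIT,	/* already in userspace: wait uninterruptibly for the reply */
	INTR_DONE,	/* reply already arrived: nothing left to do */
};

/*
 * Model of the decision taken after the interruptible wait was broken
 * by a fatal signal. Before this patch the REQ_SENT case put the
 * request into the background; now the caller simply keeps waiting.
 */
enum intr_action on_fatal_signal(enum req_state state)
{
	switch (state) {
	case REQ_PENDING:
		return INTR_DEQUEUE;
	case REQ_SENT:
		return INTR_WAIT;
	case REQ_FINISHED:
		return INTR_DONE;
	}
	assert(0 && "unknown request state");
	return INTR_DONE;
}
```

The INTR_WAIT case is why aborting the connection becomes the way out of a deadlock: the abort ends the outstanding requests, which in turn wakes the waiter.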
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c 2006-06-12 14:09:21.000000000 +0200
+++ linux/fs/fuse/inode.c 2006-06-12 14:09:56.000000000 +0200
@@ -11,7 +11,6 @@
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/file.h>
-#include <linux/mount.h>
#include <linux/seq_file.h>
#include <linux/init.h>
#include <linux/module.h>
@@ -205,20 +204,14 @@ static void fuse_put_super(struct super_
{
struct fuse_conn *fc = get_fuse_conn_super(sb);
- down_write(&fc->sbput_sem);
- while (!list_empty(&fc->background))
- fuse_release_background(fc,
- list_entry(fc->background.next,
- struct fuse_req, bg_entry));
-
spin_lock(&fc->lock);
- fc->mounted = 0;
fc->connected = 0;
+ fc->blocked = 0;
spin_unlock(&fc->lock);
- up_write(&fc->sbput_sem);
/* Flush all readers on this fs */
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
wake_up_all(&fc->waitq);
+ wake_up_all(&fc->blocked_waitq);
kobject_del(&fc->kobj);
kobject_put(&fc->kobj);
}
@@ -386,8 +379,6 @@ static struct fuse_conn *new_conn(void)
INIT_LIST_HEAD(&fc->pending);
INIT_LIST_HEAD(&fc->processing);
INIT_LIST_HEAD(&fc->io);
- INIT_LIST_HEAD(&fc->background);
- init_rwsem(&fc->sbput_sem);
kobj_set_kset_s(fc, connections_subsys);
kobject_init(&fc->kobj);
atomic_set(&fc->num_waiting, 0);
@@ -543,7 +534,6 @@ static int fuse_fill_super(struct super_
goto err_kobject_del;
sb->s_root = root_dentry;
- fc->mounted = 1;
fc->connected = 1;
kobject_get(&fc->kobj);
file->private_data = fc;
Index: linux/fs/fuse/dev.c
===================================================================
--- linux.orig/fs/fuse/dev.c 2006-06-12 14:09:21.000000000 +0200
+++ linux/fs/fuse/dev.c 2006-06-12 14:09:56.000000000 +0200
@@ -64,18 +64,6 @@ static void restore_sigs(sigset_t *oldse
sigprocmask(SIG_SETMASK, oldset, NULL);
}
-/*
- * Reset request, so that it can be reused
- *
- * The caller must be _very_ careful to make sure, that it is holding
- * the only reference to req
- */
-void fuse_reset_request(struct fuse_req *req)
-{
- BUG_ON(atomic_read(&req->count) != 1);
- fuse_request_init(req);
-}
-
static void __fuse_get_request(struct fuse_req *req)
{
atomic_inc(&req->count);
@@ -103,6 +91,10 @@ struct fuse_req *fuse_get_req(struct fus
if (intr)
goto out;
+ err = -ENOTCONN;
+ if (!fc->connected)
+ goto out;
+
req = fuse_request_alloc();
err = -ENOMEM;
if (!req)
@@ -129,113 +121,38 @@ void fuse_put_request(struct fuse_conn *
}
/*
- * Called with sbput_sem held for read (request_end) or write
- * (fuse_put_super). By the time fuse_put_super() is finished, all
- * inodes belonging to background requests must be released, so the
- * iputs have to be done within the locked region.
- */
-void fuse_release_background(struct fuse_conn *fc, struct fuse_req *req)
-{
- iput(req->inode);
- iput(req->inode2);
- spin_lock(&fc->lock);
- list_del(&req->bg_entry);
- if (fc->num_background == FUSE_MAX_BACKGROUND) {
- fc->blocked = 0;
- wake_up_all(&fc->blocked_waitq);
- }
- fc->num_background--;
- spin_unlock(&fc->lock);
-}
-
-/*
* This function is called when a request is finished. Either a reply
* has arrived or it was interrupted (and not yet sent) or some error
* occurred during communication with userspace, or the device file
- * was closed. In case of a background request the reference to the
- * stored objects are released. The requester thread is woken up (if
- * still waiting), the 'end' callback is called if given, else the
- * reference to the request is released
- *
- * Releasing extra reference for foreground requests must be done
- * within the same locked region as setting state to finished. This
- * is because fuse_reset_request() may be called after request is
- * finished and it must be the sole possessor. If request is
- * interrupted and put in the background, it will return with an error
- * and hence never be reset and reused.
+ * was closed. The requester thread is woken up (if still waiting),
+ * the 'end' callback is called if given, else the reference to the
+ * request is released
*
* Called with fc->lock, unlocks it
*/
static void request_end(struct fuse_conn *fc, struct fuse_req *req)
{
+ void (*end) (struct fuse_conn *, struct fuse_req *) = req->end;
+ req->end = NULL;
list_del(&req->list);
req->state = FUSE_REQ_FINISHED;
- if (!req->background) {
- spin_unlock(&fc->lock);
- wake_up(&req->waitq);
- fuse_put_request(fc, req);
- } else {
- void (*end) (struct fuse_conn *, struct fuse_req *) = req->end;
- req->end = NULL;
- spin_unlock(&fc->lock);
- down_read(&fc->sbput_sem);
- if (fc->mounted)
- fuse_release_background(fc, req);
- up_read(&fc->sbput_sem);
-
- /* fput must go outside sbput_sem, otherwise it can deadlock */
- if (req->file)
- fput(req->file);
-
- if (end)
- end(fc, req);
- else
- fuse_put_request(fc, req);
+ if (req->background) {
+ if (fc->num_background == FUSE_MAX_BACKGROUND) {
+ fc->blocked = 0;
+ wake_up_all(&fc->blocked_waitq);
+ }
+ fc->num_background--;
}
-}
-
-/*
- * Unfortunately request interruption not just solves the deadlock
- * problem, it causes problems too. These stem from the fact, that an
- * interrupted request is continued to be processed in userspace,
- * while all the locks and object references (inode and file) held
- * during the operation are released.
- *
- * To release the locks is exactly why there's a need to interrupt the
- * request, so there's not a lot that can be done about this, except
- * introduce additional locking in userspace.
- *
- * More important is to keep inode and file references until userspace
- * has replied, otherwise FORGET and RELEASE could be sent while the
- * inode/file is still used by the filesystem.
- *
- * For this reason the concept of "background" request is introduced.
- * An interrupted request is backgrounded if it has been already sent
- * to userspace. Backgrounding involves getting an extra reference to
- * inode(s) or file used in the request, and adding the request to
- * fc->background list. When a reply is received for a background
- * request, the object references are released, and the request is
- * removed from the list. If the filesystem is unmounted while there
- * are still background requests, the list is walked and references
- * are released as if a reply was received.
- *
- * There's one more use for a background request. The RELEASE message is
- * always sent as background, since it doesn't return an error or
- * data.
- */
-static void background_request(struct fuse_conn *fc, struct fuse_req *req)
-{
- req->background = 1;
- list_add(&req->bg_entry, &fc->background);
- fc->num_background++;
- if (fc->num_background == FUSE_MAX_BACKGROUND)
- fc->blocked = 1;
- if (req->inode)
- req->inode = igrab(req->inode);
- if (req->inode2)
- req->inode2 = igrab(req->inode2);
+ spin_unlock(&fc->lock);
+ dput(req->dentry);
+ mntput(req->vfsmount);
if (req->file)
- get_file(req->file);
+ fput(req->file);
+ wake_up(&req->waitq);
+ if (end)
+ end(fc, req);
+ else
+ fuse_put_request(fc, req);
}
/* Called with fc->lock held. Releases, and then reacquires it. */
@@ -244,9 +161,14 @@ static void request_wait_answer(struct f
sigset_t oldset;
spin_unlock(&fc->lock);
- block_sigs(&oldset);
- wait_event_interruptible(req->waitq, req->state == FUSE_REQ_FINISHED);
- restore_sigs(&oldset);
+ if (req->force)
+ wait_event(req->waitq, req->state == FUSE_REQ_FINISHED);
+ else {
+ block_sigs(&oldset);
+ wait_event_interruptible(req->waitq,
+ req->state == FUSE_REQ_FINISHED);
+ restore_sigs(&oldset);
+ }
spin_lock(&fc->lock);
if (req->state == FUSE_REQ_FINISHED && !req->interrupted)
return;
@@ -268,8 +190,11 @@ static void request_wait_answer(struct f
if (req->state == FUSE_REQ_PENDING) {
list_del(&req->list);
__fuse_put_request(req);
- } else if (req->state == FUSE_REQ_SENT)
- background_request(fc, req);
+ } else if (req->state == FUSE_REQ_SENT) {
+ spin_unlock(&fc->lock);
+ wait_event(req->waitq, req->state == FUSE_REQ_FINISHED);
+ spin_lock(&fc->lock);
+ }
}
static unsigned len_args(unsigned numargs, struct fuse_arg *args)
@@ -327,8 +252,12 @@ void request_send(struct fuse_conn *fc,
static void request_send_nowait(struct fuse_conn *fc, struct fuse_req *req)
{
spin_lock(&fc->lock);
- background_request(fc, req);
if (fc->connected) {
+ req->background = 1;
+ fc->num_background++;
+ if (fc->num_background == FUSE_MAX_BACKGROUND)
+ fc->blocked = 1;
+
queue_request(fc, req);
spin_unlock(&fc->lock);
} else {
@@ -883,10 +812,12 @@ void fuse_abort_conn(struct fuse_conn *f
spin_lock(&fc->lock);
if (fc->connected) {
fc->connected = 0;
+ fc->blocked = 0;
end_io_requests(fc);
end_requests(fc, &fc->pending);
end_requests(fc, &fc->processing);
wake_up_all(&fc->waitq);
+ wake_up_all(&fc->blocked_waitq);
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
}
spin_unlock(&fc->lock);
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h 2006-06-12 14:09:21.000000000 +0200
+++ linux/fs/fuse/fuse_i.h 2006-06-12 14:09:56.000000000 +0200
@@ -8,12 +8,12 @@
#include <linux/fuse.h>
#include <linux/fs.h>
+#include <linux/mount.h>
#include <linux/wait.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/mm.h>
#include <linux/backing-dev.h>
-#include <asm/semaphore.h>
/** Max number of pages that can be used in a single read request */
#define FUSE_MAX_PAGES_PER_REQ 32
@@ -135,9 +135,6 @@ struct fuse_req {
fuse_conn */
struct list_head list;
- /** Entry on the background list */
- struct list_head bg_entry;
-
/** refcount */
atomic_t count;
@@ -150,6 +147,9 @@ struct fuse_req {
/** True if the request has reply */
unsigned isreply:1;
+ /** Force sending of the request even if interrupted */
+ unsigned force:1;
+
/** The request was interrupted */
unsigned interrupted:1;
@@ -192,15 +192,15 @@ struct fuse_req {
/** offset of data on first page */
unsigned page_offset;
- /** Inode used in the request */
- struct inode *inode;
-
- /** Second inode used in the request (or NULL) */
- struct inode *inode2;
-
/** File used in the request (or NULL) */
struct file *file;
+ /** vfsmount used in release */
+ struct vfsmount *vfsmount;
+
+ /** dentry used in release */
+ struct dentry *dentry;
+
/** Request completion callback */
void (*end)(struct fuse_conn *, struct fuse_req *);
};
@@ -243,10 +243,6 @@ struct fuse_conn {
/** The list of requests under I/O */
struct list_head io;
- /** Requests put in the background (RELEASE or any other
- interrupted request) */
- struct list_head background;
-
/** Number of requests currently in the background */
unsigned num_background;
@@ -258,15 +254,9 @@ struct fuse_conn {
/** waitq for blocked connection */
wait_queue_head_t blocked_waitq;
- /** RW semaphore for exclusion with fuse_put_super() */
- struct rw_semaphore sbput_sem;
-
/** The next unique request id */
u64 reqctr;
- /** Mount is active */
- unsigned mounted;
-
/** Connection established, cleared on umount, connection
abort and device release */
unsigned connected;
@@ -383,12 +373,9 @@ void fuse_file_free(struct fuse_file *ff
void fuse_finish_open(struct inode *inode, struct file *file,
struct fuse_file *ff, struct fuse_open_out *outarg);
-/**
- * Send a RELEASE request
- */
-void fuse_send_release(struct fuse_conn *fc, struct fuse_file *ff,
- u64 nodeid, struct inode *inode, int flags, int isdir);
-
+/** */
+struct fuse_req *fuse_release_fill(struct fuse_file *ff, u64 nodeid, int flags,
+ int opcode);
/**
* Send RELEASE or RELEASEDIR request
*/
@@ -446,11 +433,6 @@ struct fuse_req *fuse_request_alloc(void
void fuse_request_free(struct fuse_req *req);
/**
- * Reinitialize a request, the preallocated flag is left unmodified
- */
-void fuse_reset_request(struct fuse_req *req);
-
-/**
* Reserve a preallocated request
*/
struct fuse_req *fuse_get_req(struct fuse_conn *fc);
@@ -476,11 +458,6 @@ void request_send_noreply(struct fuse_co
*/
void request_send_background(struct fuse_conn *fc, struct fuse_req *req);
-/**
- * Release inodes and file associated with background request
- */
-void fuse_release_background(struct fuse_conn *fc, struct fuse_req *req);
-
/* Abort all requests */
void fuse_abort_conn(struct fuse_conn *fc);
Index: linux/fs/fuse/dir.c
===================================================================
--- linux.orig/fs/fuse/dir.c 2006-06-12 14:09:21.000000000 +0200
+++ linux/fs/fuse/dir.c 2006-06-12 14:09:56.000000000 +0200
@@ -1,6 +1,6 @@
/*
FUSE: Filesystem in Userspace
- Copyright (C) 2001-2005 Miklos Szeredi <miklos@szeredi.hu>
+ Copyright (C) 2001-2006 Miklos Szeredi <miklos@szeredi.hu>
This program can be distributed under the terms of the GNU GPL.
See the file COPYING.
@@ -79,7 +79,6 @@ static void fuse_lookup_init(struct fuse
{
req->in.h.opcode = FUSE_LOOKUP;
req->in.h.nodeid = get_node_id(dir);
- req->inode = dir;
req->in.numargs = 1;
req->in.args[0].size = entry->d_name.len + 1;
req->in.args[0].value = entry->d_name.name;
@@ -225,6 +224,20 @@ static struct dentry *fuse_lookup(struct
}
/*
+ * Synchronous release for the case when something goes wrong in CREATE_OPEN
+ */
+static void fuse_sync_release(struct fuse_conn *fc, struct fuse_file *ff,
+ u64 nodeid, int flags)
+{
+ struct fuse_req *req;
+
+ req = fuse_release_fill(ff, nodeid, flags, FUSE_RELEASE);
+ req->force = 1;
+ request_send(fc, req);
+ fuse_put_request(fc, req);
+}
+
+/*
* Atomic create+open operation
*
* If the filesystem doesn't support this, then fall back to separate
@@ -237,6 +250,7 @@ static int fuse_create_open(struct inode
struct inode *inode;
struct fuse_conn *fc = get_fuse_conn(dir);
struct fuse_req *req;
+ struct fuse_req *forget_req;
struct fuse_open_in inarg;
struct fuse_open_out outopen;
struct fuse_entry_out outentry;
@@ -247,9 +261,14 @@ static int fuse_create_open(struct inode
if (fc->no_create)
return -ENOSYS;
+ forget_req = fuse_get_req(fc);
+ if (IS_ERR(forget_req))
+ return PTR_ERR(forget_req);
+
req = fuse_get_req(fc);
+ err = PTR_ERR(req);
if (IS_ERR(req))
- return PTR_ERR(req);
+ goto out_put_forget_req;
err = -ENOMEM;
ff = fuse_file_alloc();
@@ -262,7 +281,6 @@ static int fuse_create_open(struct inode
inarg.mode = mode;
req->in.h.opcode = FUSE_CREATE;
req->in.h.nodeid = get_node_id(dir);
- req->inode = dir;
req->in.numargs = 2;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -285,25 +303,23 @@ static int fuse_create_open(struct inode
if (!S_ISREG(outentry.attr.mode) || invalid_nodeid(outentry.nodeid))
goto out_free_ff;
+ fuse_put_request(fc, req);
inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
&outentry.attr);
- err = -ENOMEM;
if (!inode) {
flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
ff->fh = outopen.fh;
- /* Special release, with inode = NULL, this will
- trigger a 'forget' request when the release is
- complete */
- fuse_send_release(fc, ff, outentry.nodeid, NULL, flags, 0);
- goto out_put_request;
+ fuse_sync_release(fc, ff, outentry.nodeid, flags);
+ fuse_send_forget(fc, forget_req, outentry.nodeid, 1);
+ return -ENOMEM;
}
- fuse_put_request(fc, req);
+ fuse_put_request(fc, forget_req);
d_instantiate(entry, inode);
fuse_change_timeout(entry, &outentry);
file = lookup_instantiate_filp(nd, entry, generic_file_open);
if (IS_ERR(file)) {
ff->fh = outopen.fh;
- fuse_send_release(fc, ff, outentry.nodeid, inode, flags, 0);
+ fuse_sync_release(fc, ff, outentry.nodeid, flags);
return PTR_ERR(file);
}
fuse_finish_open(inode, file, ff, &outopen);
@@ -313,6 +329,8 @@ static int fuse_create_open(struct inode
fuse_file_free(ff);
out_put_request:
fuse_put_request(fc, req);
+ out_put_forget_req:
+ fuse_put_request(fc, forget_req);
return err;
}
@@ -328,7 +346,6 @@ static int create_new_entry(struct fuse_
int err;
req->in.h.nodeid = get_node_id(dir);
- req->inode = dir;
req->out.numargs = 1;
req->out.args[0].size = sizeof(outarg);
req->out.args[0].value = &outarg;
@@ -448,7 +465,6 @@ static int fuse_unlink(struct inode *dir
req->in.h.opcode = FUSE_UNLINK;
req->in.h.nodeid = get_node_id(dir);
- req->inode = dir;
req->in.numargs = 1;
req->in.args[0].size = entry->d_name.len + 1;
req->in.args[0].value = entry->d_name.name;
@@ -480,7 +496,6 @@ static int fuse_rmdir(struct inode *dir,
req->in.h.opcode = FUSE_RMDIR;
req->in.h.nodeid = get_node_id(dir);
- req->inode = dir;
req->in.numargs = 1;
req->in.args[0].size = entry->d_name.len + 1;
req->in.args[0].value = entry->d_name.name;
@@ -510,8 +525,6 @@ static int fuse_rename(struct inode *old
inarg.newdir = get_node_id(newdir);
req->in.h.opcode = FUSE_RENAME;
req->in.h.nodeid = get_node_id(olddir);
- req->inode = olddir;
- req->inode2 = newdir;
req->in.numargs = 3;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -558,7 +571,6 @@ static int fuse_link(struct dentry *entr
memset(&inarg, 0, sizeof(inarg));
inarg.oldnodeid = get_node_id(inode);
req->in.h.opcode = FUSE_LINK;
- req->inode2 = inode;
req->in.numargs = 2;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -587,7 +599,6 @@ int fuse_do_getattr(struct inode *inode)
req->in.h.opcode = FUSE_GETATTR;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->out.numargs = 1;
req->out.args[0].size = sizeof(arg);
req->out.args[0].value = &arg;
@@ -679,7 +690,6 @@ static int fuse_access(struct inode *ino
inarg.mask = mask;
req->in.h.opcode = FUSE_ACCESS;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -820,7 +830,6 @@ static char *read_link(struct dentry *de
}
req->in.h.opcode = FUSE_READLINK;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->out.argvar = 1;
req->out.numargs = 1;
req->out.args[0].size = PAGE_SIZE - 1;
@@ -939,7 +948,6 @@ static int fuse_setattr(struct dentry *e
iattr_to_fattr(attr, &inarg);
req->in.h.opcode = FUSE_SETATTR;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -1002,7 +1010,6 @@ static int fuse_setxattr(struct dentry *
inarg.flags = flags;
req->in.h.opcode = FUSE_SETXATTR;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 3;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -1041,7 +1048,6 @@ static ssize_t fuse_getxattr(struct dent
inarg.size = size;
req->in.h.opcode = FUSE_GETXATTR;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 2;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -1091,7 +1097,6 @@ static ssize_t fuse_listxattr(struct den
inarg.size = size;
req->in.h.opcode = FUSE_LISTXATTR;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -1135,7 +1140,6 @@ static int fuse_removexattr(struct dentr
req->in.h.opcode = FUSE_REMOVEXATTR;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 1;
req->in.args[0].size = strlen(name) + 1;
req->in.args[0].value = name;
Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c 2006-06-12 14:09:22.000000000 +0200
+++ linux/fs/fuse/file.c 2006-06-12 14:09:56.000000000 +0200
@@ -30,7 +30,6 @@ static int fuse_send_open(struct inode *
inarg.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
req->in.h.opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -113,37 +112,22 @@ int fuse_open_common(struct inode *inode
return err;
}
-/* Special case for failed iget in CREATE */
-static void fuse_release_end(struct fuse_conn *fc, struct fuse_req *req)
+struct fuse_req *fuse_release_fill(struct fuse_file *ff, u64 nodeid, int flags,
+ int opcode)
{
- /* If called from end_io_requests(), req has more than one
- reference and fuse_reset_request() cannot work */
- if (fc->connected) {
- u64 nodeid = req->in.h.nodeid;
- fuse_reset_request(req);
- fuse_send_forget(fc, req, nodeid, 1);
- } else
- fuse_put_request(fc, req);
-}
-
-void fuse_send_release(struct fuse_conn *fc, struct fuse_file *ff,
- u64 nodeid, struct inode *inode, int flags, int isdir)
-{
- struct fuse_req * req = ff->release_req;
+ struct fuse_req *req = ff->release_req;
struct fuse_release_in *inarg = &req->misc.release_in;
inarg->fh = ff->fh;
inarg->flags = flags;
- req->in.h.opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE;
+ req->in.h.opcode = opcode;
req->in.h.nodeid = nodeid;
- req->inode = inode;
req->in.numargs = 1;
req->in.args[0].size = sizeof(struct fuse_release_in);
req->in.args[0].value = inarg;
- request_send_background(fc, req);
- if (!inode)
- req->end = fuse_release_end;
kfree(ff);
+
+ return req;
}
int fuse_release_common(struct inode *inode, struct file *file, int isdir)
@@ -151,8 +135,15 @@ int fuse_release_common(struct inode *in
struct fuse_file *ff = file->private_data;
if (ff) {
struct fuse_conn *fc = get_fuse_conn(inode);
- u64 nodeid = get_node_id(inode);
- fuse_send_release(fc, ff, nodeid, inode, file->f_flags, isdir);
+ struct fuse_req *req;
+
+ req = fuse_release_fill(ff, get_node_id(inode), file->f_flags,
+ isdir ? FUSE_RELEASEDIR : FUSE_RELEASE);
+
+ /* Hold vfsmount and dentry until release is finished */
+ req->vfsmount = mntget(file->f_vfsmnt);
+ req->dentry = dget(file->f_dentry);
+ request_send_background(fc, req);
}
/* Return value is ignored by VFS */
@@ -192,8 +183,6 @@ static int fuse_flush(struct file *file,
inarg.fh = ff->fh;
req->in.h.opcode = FUSE_FLUSH;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
- req->file = file;
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -232,8 +221,6 @@ int fuse_fsync_common(struct file *file,
inarg.fsync_flags = datasync ? 1 : 0;
req->in.h.opcode = isdir ? FUSE_FSYNCDIR : FUSE_FSYNC;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
- req->file = file;
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
@@ -266,8 +253,6 @@ void fuse_read_fill(struct fuse_req *req
inarg->size = count;
req->in.h.opcode = opcode;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
- req->file = file;
req->in.numargs = 1;
req->in.args[0].size = sizeof(struct fuse_read_in);
req->in.args[0].value = inarg;
@@ -342,6 +327,8 @@ static void fuse_send_readpages(struct f
req->out.page_zeroing = 1;
fuse_read_fill(req, file, inode, pos, count, FUSE_READ);
if (fc->async_read) {
+ get_file(file);
+ req->file = file;
req->end = fuse_readpages_end;
request_send_background(fc, req);
} else {
@@ -420,8 +407,6 @@ static size_t fuse_send_write(struct fus
inarg.size = count;
req->in.h.opcode = FUSE_WRITE;
req->in.h.nodeid = get_node_id(inode);
- req->inode = inode;
- req->file = file;
req->in.argpages = 1;
req->in.numargs = 2;
req->in.args[0].size = sizeof(struct fuse_write_in);
Index: linux/Documentation/filesystems/fuse.txt
===================================================================
--- linux.orig/Documentation/filesystems/fuse.txt 2006-06-12 14:09:22.000000000 +0200
+++ linux/Documentation/filesystems/fuse.txt 2006-06-12 14:09:56.000000000 +0200
@@ -304,25 +304,7 @@ Scenario 1 - Simple deadlock
| | for "file"]
| | *DEADLOCK*
-The solution for this is to allow requests to be interrupted while
-they are in userspace:
-
- | [interrupted by signal] |
- | <fuse_unlink() |
- | [release semaphore] | [semaphore acquired]
- | <sys_unlink() |
- | | >fuse_unlink()
- | | [queue req on fc->pending]
- | | [wake up fc->waitq]
- | | [sleep on req->waitq]
-
-If the filesystem daemon was single threaded, this will stop here,
-since there's no other thread to dequeue and execute the request.
-In this case the solution is to kill the FUSE daemon as well. If
-there are multiple serving threads, you just have to kill them as
-long as any remain.
-
-Moral: a filesystem which deadlocks, can soon find itself dead.
+The solution for this is to allow the filesystem to be aborted.
Scenario 2 - Tricky deadlock
----------------------------
@@ -355,24 +337,14 @@ but is caused by a pagefault.
| | [lock page]
| | * DEADLOCK *
-Solution is again to let the the request be interrupted (not
-elaborated further).
+Solution is basically the same as above.
An additional problem is that while the write buffer is being
copied to the request, the request must not be interrupted. This
is because the destination address of the copy may not be valid
after the request is interrupted.
-This is solved with doing the copy atomically, and allowing
-interruption while the page(s) belonging to the write buffer are
-faulted with get_user_pages(). The 'req->locked' flag indicates
-when the copy is taking place, and interruption is delayed until
-this flag is unset.
-
-Scenario 3 - Tricky deadlock with asynchronous read
----------------------------------------------------
-
-The same situation as above, except thread-1 will wait on page lock
-and hence it will be uninterruptible as well. The solution is to
-abort the connection with forced umount (if mount is attached) or
-through the abort attribute in sysfs.
+This is solved with doing the copy atomically, and allowing abort
+while the page(s) belonging to the write buffer are faulted with
+get_user_pages(). The 'req->locked' flag indicates when the copy is
+taking place, and abort is delayed until this flag is unset.
* [PATCH 3/7] fuse: add control filesystem
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
2006-06-12 12:25 ` [PATCH 1/7] fuse: use MISC_MAJOR Miklos Szeredi
2006-06-12 12:27 ` [PATCH 2/7] fuse: no backgrounding on interrupt Miklos Szeredi
@ 2006-06-12 12:28 ` Miklos Szeredi
2006-06-19 6:55 ` Andrew Morton
2006-06-12 12:29 ` [PATCH 4/7] fuse: add POSIX file locking support Miklos Szeredi
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:28 UTC
To: akpm; +Cc: linux-kernel, linux-fsdevel
Add a control filesystem to fuse, replacing the attributes currently
exported through sysfs. An empty directory '/sys/fs/fuse/connections'
is still created in sysfs, and mounting the control filesystem here
provides backward compatibility.
Advantages of the control filesystem over the previous solution:
- allows the object directory and the attributes to be owned by the
filesystem owner, hence letting unprivileged users abort the
filesystem connection
- does not suffer from module unload race
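From userspace, the two per-connection files created by this patch ('waiting' and 'abort') could be used roughly as sketched below. This assumes the control filesystem is mounted on '/sys/fs/fuse/connections', so that the files appear as '/sys/fs/fuse/connections/<id>/waiting' and '/sys/fs/fuse/connections/<id>/abort'. The helper names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read the decimal counter exported by a connection's "waiting" file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, for illustration only.)
 */
int fuse_ctl_read_waiting(const char *path)
{
	FILE *f;
	int value;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Abort a connection by writing to its "abort" file. The write handler
 * in this patch ignores the data, so the content written here is
 * arbitrary. Returns 0 on success, -1 on error.
 */
int fuse_ctl_abort(const char *path)
{
	int fd;
	ssize_t n;

	assert(path != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, "1\n", 2);
	close(fd);
	return n == 2 ? 0 : -1;
}
```

Since the files are owned by the user who mounted the filesystem, with modes 0400 and 0200 respectively, both helpers are meant to work for that user without extra privileges.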
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/fs/fuse/control.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/fs/fuse/control.c 2006-06-12 14:09:57.000000000 +0200
@@ -0,0 +1,204 @@
+/*
+ FUSE: Filesystem in Userspace
+ Copyright (C) 2001-2006 Miklos Szeredi <miklos@szeredi.hu>
+
+ This program can be distributed under the terms of the GNU GPL.
+ See the file COPYING.
+*/
+
+#include "fuse_i.h"
+
+#include <linux/init.h>
+#include <linux/module.h>
+
+#define FUSE_CTL_SUPER_MAGIC 0x65735543
+
+static struct super_block *fuse_control_sb;
+
+static struct fuse_conn *fuse_ctl_file_conn_get(struct file *file)
+{
+ struct fuse_conn *fc;
+ mutex_lock(&fuse_mutex);
+ fc = file->f_dentry->d_inode->u.generic_ip;
+ if (fc)
+ fc = fuse_conn_get(fc);
+ mutex_unlock(&fuse_mutex);
+ return fc;
+}
+
+static ssize_t fuse_conn_abort_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct fuse_conn *fc = fuse_ctl_file_conn_get(file);
+ if (fc) {
+ fuse_abort_conn(fc);
+ fuse_conn_put(fc);
+ }
+ return count;
+}
+
+static ssize_t fuse_conn_waiting_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ char tmp[32];
+ size_t size;
+
+ if (!*ppos) {
+ struct fuse_conn *fc = fuse_ctl_file_conn_get(file);
+ if (!fc)
+ return 0;
+
+ file->private_data = (void *) atomic_read(&fc->num_waiting);
+ fuse_conn_put(fc);
+ }
+ size = sprintf(tmp, "%i\n", (int) file->private_data);
+ return simple_read_from_buffer(buf, len, ppos, tmp, size);
+}
+
+static const struct file_operations fuse_ctl_abort_ops = {
+ .open = nonseekable_open,
+ .write = fuse_conn_abort_write,
+};
+
+static const struct file_operations fuse_ctl_waiting_ops = {
+ .open = nonseekable_open,
+ .read = fuse_conn_waiting_read,
+};
+
+static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
+ struct fuse_conn *fc,
+ const char *name,
+ int mode, int nlink,
+ struct inode_operations *iop,
+ const struct file_operations *fop)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+
+ BUG_ON(fc->ctl_ndents >= FUSE_CTL_NUM_DENTRIES);
+ dentry = d_alloc_name(parent, name);
+ if (!dentry)
+ return NULL;
+
+ fc->ctl_dentry[fc->ctl_ndents++] = dentry;
+ inode = new_inode(fuse_control_sb);
+ if (!inode)
+ return NULL;
+
+ inode->i_mode = mode;
+ inode->i_uid = fc->user_id;
+ inode->i_gid = fc->group_id;
+ inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+ if (iop)
+ inode->i_op = iop;
+ inode->i_fop = fop;
+ inode->i_nlink = nlink;
+ inode->u.generic_ip = fc;
+ d_add(dentry, inode);
+ return dentry;
+}
+
+int fuse_ctl_add_conn(struct fuse_conn *fc)
+{
+ struct dentry *parent;
+ char name[32];
+
+ if (!fuse_control_sb)
+ return 0;
+
+ parent = fuse_control_sb->s_root;
+ parent->d_inode->i_nlink++;
+ sprintf(name, "%llu", (unsigned long long) fc->id);
+ parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2,
+ &simple_dir_inode_operations,
+ &simple_dir_operations);
+ if (!parent)
+ goto err;
+
+ if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1,
+ NULL, &fuse_ctl_waiting_ops) ||
+ !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1,
+ NULL, &fuse_ctl_abort_ops))
+ goto err;
+
+ return 0;
+
+ err:
+ fuse_ctl_remove_conn(fc);
+ return -ENOMEM;
+}
+
+void fuse_ctl_remove_conn(struct fuse_conn *fc)
+{
+ int i;
+
+ if (!fuse_control_sb)
+ return;
+
+ for (i = fc->ctl_ndents - 1; i >= 0; i--) {
+ struct dentry *dentry = fc->ctl_dentry[i];
+ dentry->d_inode->u.generic_ip = NULL;
+ d_drop(dentry);
+ dput(dentry);
+ }
+ fuse_control_sb->s_root->d_inode->i_nlink--;
+}
+
+static int fuse_ctl_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct tree_descr empty_descr = {""};
+ struct fuse_conn *fc;
+ int err;
+
+ err = simple_fill_super(sb, FUSE_CTL_SUPER_MAGIC, &empty_descr);
+ if (err)
+ return err;
+
+ mutex_lock(&fuse_mutex);
+ BUG_ON(fuse_control_sb);
+ fuse_control_sb = sb;
+ list_for_each_entry(fc, &fuse_conn_list, entry) {
+ err = fuse_ctl_add_conn(fc);
+ if (err) {
+ fuse_control_sb = NULL;
+ mutex_unlock(&fuse_mutex);
+ return err;
+ }
+ }
+ mutex_unlock(&fuse_mutex);
+
+ return 0;
+}
+
+static struct super_block *fuse_ctl_get_sb(struct file_system_type *fs_type,
+ int flags, const char *dev_name,
+ void *raw_data)
+{
+ return get_sb_single(fs_type, flags, raw_data, fuse_ctl_fill_super);
+}
+
+static void fuse_ctl_kill_sb(struct super_block *sb)
+{
+ mutex_lock(&fuse_mutex);
+ fuse_control_sb = NULL;
+ mutex_unlock(&fuse_mutex);
+
+ kill_litter_super(sb);
+}
+
+static struct file_system_type fuse_ctl_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "fusectl",
+ .get_sb = fuse_ctl_get_sb,
+ .kill_sb = fuse_ctl_kill_sb,
+};
+
+int __init fuse_ctl_init(void)
+{
+ return register_filesystem(&fuse_ctl_fs_type);
+}
+
+void fuse_ctl_cleanup(void)
+{
+ unregister_filesystem(&fuse_ctl_fs_type);
+}
Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c 2006-06-12 14:09:56.000000000 +0200
+++ linux/fs/fuse/inode.c 2006-06-12 14:09:57.000000000 +0200
@@ -22,13 +22,8 @@ MODULE_DESCRIPTION("Filesystem in Usersp
MODULE_LICENSE("GPL");
static kmem_cache_t *fuse_inode_cachep;
-static struct subsystem connections_subsys;
-
-struct fuse_conn_attr {
- struct attribute attr;
- ssize_t (*show)(struct fuse_conn *, char *);
- ssize_t (*store)(struct fuse_conn *, const char *, size_t);
-};
+struct list_head fuse_conn_list;
+DEFINE_MUTEX(fuse_mutex);
#define FUSE_SUPER_MAGIC 0x65735546
@@ -212,8 +207,11 @@ static void fuse_put_super(struct super_
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
wake_up_all(&fc->waitq);
wake_up_all(&fc->blocked_waitq);
- kobject_del(&fc->kobj);
- kobject_put(&fc->kobj);
+ mutex_lock(&fuse_mutex);
+ list_del(&fc->entry);
+ fuse_ctl_remove_conn(fc);
+ mutex_unlock(&fuse_mutex);
+ fuse_conn_put(fc);
}
static void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr)
@@ -362,11 +360,6 @@ static int fuse_show_options(struct seq_
return 0;
}
-static void fuse_conn_release(struct kobject *kobj)
-{
- kfree(get_fuse_conn_kobj(kobj));
-}
-
static struct fuse_conn *new_conn(void)
{
struct fuse_conn *fc;
@@ -374,13 +367,12 @@ static struct fuse_conn *new_conn(void)
fc = kzalloc(sizeof(*fc), GFP_KERNEL);
if (fc) {
spin_lock_init(&fc->lock);
+ atomic_set(&fc->count, 1);
init_waitqueue_head(&fc->waitq);
init_waitqueue_head(&fc->blocked_waitq);
INIT_LIST_HEAD(&fc->pending);
INIT_LIST_HEAD(&fc->processing);
INIT_LIST_HEAD(&fc->io);
- kobj_set_kset_s(fc, connections_subsys);
- kobject_init(&fc->kobj);
atomic_set(&fc->num_waiting, 0);
fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
fc->bdi.unplug_io_fn = default_unplug_io_fn;
@@ -390,6 +382,18 @@ static struct fuse_conn *new_conn(void)
return fc;
}
+void fuse_conn_put(struct fuse_conn *fc)
+{
+ if (atomic_dec_and_test(&fc->count))
+ kfree(fc);
+}
+
+struct fuse_conn *fuse_conn_get(struct fuse_conn *fc)
+{
+ atomic_inc(&fc->count);
+ return fc;
+}
+
static struct inode *get_root_inode(struct super_block *sb, unsigned mode)
{
struct fuse_attr attr;
@@ -459,10 +463,9 @@ static void fuse_send_init(struct fuse_c
request_send_background(fc, req);
}
-static unsigned long long conn_id(void)
+static u64 conn_id(void)
{
- /* BKL is held for ->get_sb() */
- static unsigned long long ctr = 1;
+ static u64 ctr = 1;
return ctr++;
}
@@ -519,24 +522,21 @@ static int fuse_fill_super(struct super_
if (!init_req)
goto err_put_root;
- err = kobject_set_name(&fc->kobj, "%llu", conn_id());
- if (err)
- goto err_free_req;
-
- err = kobject_add(&fc->kobj);
- if (err)
- goto err_free_req;
-
- /* Setting file->private_data can't race with other mount()
- instances, since BKL is held for ->get_sb() */
+ mutex_lock(&fuse_mutex);
err = -EINVAL;
if (file->private_data)
- goto err_kobject_del;
+ goto err_unlock;
+ fc->id = conn_id();
+ err = fuse_ctl_add_conn(fc);
+ if (err)
+ goto err_unlock;
+
+ list_add_tail(&fc->entry, &fuse_conn_list);
sb->s_root = root_dentry;
fc->connected = 1;
- kobject_get(&fc->kobj);
- file->private_data = fc;
+ file->private_data = fuse_conn_get(fc);
+ mutex_unlock(&fuse_mutex);
/*
* atomic_dec_and_test() in fput() provides the necessary
* memory barrier for file->private_data to be visible on all
@@ -548,15 +548,14 @@ static int fuse_fill_super(struct super_
return 0;
- err_kobject_del:
- kobject_del(&fc->kobj);
- err_free_req:
+ err_unlock:
+ mutex_unlock(&fuse_mutex);
fuse_request_free(init_req);
err_put_root:
dput(root_dentry);
err:
fput(file);
- kobject_put(&fc->kobj);
+ fuse_conn_put(fc);
return err;
}
@@ -574,68 +573,8 @@ static struct file_system_type fuse_fs_t
.kill_sb = kill_anon_super,
};
-static ssize_t fuse_conn_waiting_show(struct fuse_conn *fc, char *page)
-{
- return sprintf(page, "%i\n", atomic_read(&fc->num_waiting));
-}
-
-static ssize_t fuse_conn_abort_store(struct fuse_conn *fc, const char *page,
- size_t count)
-{
- fuse_abort_conn(fc);
- return count;
-}
-
-static struct fuse_conn_attr fuse_conn_waiting =
- __ATTR(waiting, 0400, fuse_conn_waiting_show, NULL);
-static struct fuse_conn_attr fuse_conn_abort =
- __ATTR(abort, 0600, NULL, fuse_conn_abort_store);
-
-static struct attribute *fuse_conn_attrs[] = {
- &fuse_conn_waiting.attr,
- &fuse_conn_abort.attr,
- NULL,
-};
-
-static ssize_t fuse_conn_attr_show(struct kobject *kobj,
- struct attribute *attr,
- char *page)
-{
- struct fuse_conn_attr *fca =
- container_of(attr, struct fuse_conn_attr, attr);
-
- if (fca->show)
- return fca->show(get_fuse_conn_kobj(kobj), page);
- else
- return -EACCES;
-}
-
-static ssize_t fuse_conn_attr_store(struct kobject *kobj,
- struct attribute *attr,
- const char *page, size_t count)
-{
- struct fuse_conn_attr *fca =
- container_of(attr, struct fuse_conn_attr, attr);
-
- if (fca->store)
- return fca->store(get_fuse_conn_kobj(kobj), page, count);
- else
- return -EACCES;
-}
-
-static struct sysfs_ops fuse_conn_sysfs_ops = {
- .show = &fuse_conn_attr_show,
- .store = &fuse_conn_attr_store,
-};
-
-static struct kobj_type ktype_fuse_conn = {
- .release = fuse_conn_release,
- .sysfs_ops = &fuse_conn_sysfs_ops,
- .default_attrs = fuse_conn_attrs,
-};
-
static decl_subsys(fuse, NULL, NULL);
-static decl_subsys(connections, &ktype_fuse_conn, NULL);
+static decl_subsys(connections, NULL, NULL);
static void fuse_inode_init_once(void *foo, kmem_cache_t *cachep,
unsigned long flags)
@@ -709,6 +648,7 @@ static int __init fuse_init(void)
printk("fuse init (API version %i.%i)\n",
FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION);
+ INIT_LIST_HEAD(&fuse_conn_list);
res = fuse_fs_init();
if (res)
goto err;
@@ -721,8 +661,14 @@ static int __init fuse_init(void)
if (res)
goto err_dev_cleanup;
+ res = fuse_ctl_init();
+ if (res)
+ goto err_sysfs_cleanup;
+
return 0;
+ err_sysfs_cleanup:
+ fuse_sysfs_cleanup();
err_dev_cleanup:
fuse_dev_cleanup();
err_fs_cleanup:
@@ -735,6 +681,7 @@ static void __exit fuse_exit(void)
{
printk(KERN_DEBUG "fuse exit\n");
+ fuse_ctl_cleanup();
fuse_sysfs_cleanup();
fuse_fs_cleanup();
fuse_dev_cleanup();
Index: linux/fs/fuse/Makefile
===================================================================
--- linux.orig/fs/fuse/Makefile 2006-06-12 14:09:21.000000000 +0200
+++ linux/fs/fuse/Makefile 2006-06-12 14:09:57.000000000 +0200
@@ -4,4 +4,4 @@
obj-$(CONFIG_FUSE_FS) += fuse.o
-fuse-objs := dev.o dir.o file.o inode.o
+fuse-objs := dev.o dir.o file.o inode.o control.o
Index: linux/fs/fuse/dev.c
===================================================================
--- linux.orig/fs/fuse/dev.c 2006-06-12 14:09:56.000000000 +0200
+++ linux/fs/fuse/dev.c 2006-06-12 14:09:57.000000000 +0200
@@ -833,7 +833,7 @@ static int fuse_dev_release(struct inode
end_requests(fc, &fc->processing);
spin_unlock(&fc->lock);
fasync_helper(-1, file, 0, &fc->fasync);
- kobject_put(&fc->kobj);
+ fuse_conn_put(fc);
}
return 0;
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h 2006-06-12 14:09:56.000000000 +0200
+++ linux/fs/fuse/fuse_i.h 2006-06-12 14:09:57.000000000 +0200
@@ -14,6 +14,7 @@
#include <linux/spinlock.h>
#include <linux/mm.h>
#include <linux/backing-dev.h>
+#include <linux/mutex.h>
/** Max number of pages that can be used in a single read request */
#define FUSE_MAX_PAGES_PER_REQ 32
@@ -24,6 +25,9 @@
/** It could be as large as PATH_MAX, but would that have any uses? */
#define FUSE_NAME_MAX 1024
+/** Number of dentries for each connection in the control filesystem */
+#define FUSE_CTL_NUM_DENTRIES 3
+
/** If the FUSE_DEFAULT_PERMISSIONS flag is given, the filesystem
module will check permissions based on the file mode. Otherwise no
permission checking is done in the kernel */
@@ -33,6 +37,11 @@
doing the mount will be allowed to access the filesystem */
#define FUSE_ALLOW_OTHER (1 << 1)
+/** List of active connections */
+extern struct list_head fuse_conn_list;
+
+/** Global mutex protecting fuse_conn_list and the control filesystem */
+extern struct mutex fuse_mutex;
/** FUSE inode */
struct fuse_inode {
@@ -216,6 +225,9 @@ struct fuse_conn {
/** Lock protecting accessess to members of this structure */
spinlock_t lock;
+ /** Refcount */
+ atomic_t count;
+
/** The user id for this mount */
uid_t user_id;
@@ -310,8 +322,17 @@ struct fuse_conn {
/** Backing dev info */
struct backing_dev_info bdi;
- /** kobject */
- struct kobject kobj;
+ /** Entry on the fuse_conn_list */
+ struct list_head entry;
+
+ /** Unique ID */
+ u64 id;
+
+ /** Dentries in the control filesystem */
+ struct dentry *ctl_dentry[FUSE_CTL_NUM_DENTRIES];
+
+ /** number of dentries used in the above array */
+ int ctl_ndents;
/** O_ASYNC requests */
struct fasync_struct *fasync;
@@ -327,11 +348,6 @@ static inline struct fuse_conn *get_fuse
return get_fuse_conn_super(inode->i_sb);
}
-static inline struct fuse_conn *get_fuse_conn_kobj(struct kobject *obj)
-{
- return container_of(obj, struct fuse_conn, kobj);
-}
-
static inline struct fuse_inode *get_fuse_inode(struct inode *inode)
{
return container_of(inode, struct fuse_inode, inode);
@@ -422,6 +438,9 @@ int fuse_dev_init(void);
*/
void fuse_dev_cleanup(void);
+int fuse_ctl_init(void);
+void fuse_ctl_cleanup(void);
+
/**
* Allocate a request
*/
@@ -470,3 +489,23 @@ int fuse_do_getattr(struct inode *inode)
* Invalidate inode attributes
*/
void fuse_invalidate_attr(struct inode *inode);
+
+/**
+ * Acquire reference to fuse_conn
+ */
+struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
+
+/**
+ * Release reference to fuse_conn
+ */
+void fuse_conn_put(struct fuse_conn *fc);
+
+/**
+ * Add connection to control filesystem
+ */
+int fuse_ctl_add_conn(struct fuse_conn *fc);
+
+/**
+ * Remove connection from control filesystem
+ */
+void fuse_ctl_remove_conn(struct fuse_conn *fc);
Index: linux/Documentation/filesystems/fuse.txt
===================================================================
--- linux.orig/Documentation/filesystems/fuse.txt 2006-06-12 14:09:56.000000000 +0200
+++ linux/Documentation/filesystems/fuse.txt 2006-06-12 14:09:57.000000000 +0200
@@ -18,6 +18,14 @@ Non-privileged mount (or user mount):
user. NOTE: this is not the same as mounts allowed with the "user"
option in /etc/fstab, which is not discussed here.
+Filesystem connection:
+
+ A connection between the filesystem daemon and the kernel. The
+ connection exists until either the daemon dies, or the filesystem is
+ umounted. Note that detaching (or lazy umounting) the filesystem
+ does _not_ break the connection; in this case it will exist until
+ the last reference to the filesystem is released.
+
Mount owner:
The user who does the mounting.
@@ -86,16 +94,20 @@ Mount options
The default is infinite. Note that the size of read requests is
limited anyway to 32 pages (which is 128kbyte on i386).
-Sysfs
-~~~~~
+Control filesystem
+~~~~~~~~~~~~~~~~~~
+
+There's a control filesystem for FUSE, which can be mounted by:
-FUSE sets up the following hierarchy in sysfs:
+ mount -t fusectl none /sys/fs/fuse/connections
- /sys/fs/fuse/connections/N/
+Mounting it under the '/sys/fs/fuse/connections' directory makes it
+backwards compatible with earlier versions.
-where N is an increasing number allocated to each new connection.
+Under the fuse control filesystem each connection has a directory
+named by a unique number.
-For each connection the following attributes are defined:
+For each connection the following files exist within this directory:
'waiting'
@@ -110,7 +122,7 @@ For each connection the following attrib
connection. This means that all waiting requests will be aborted an
error returned for all aborted and new requests.
-Only a privileged user may read or write these attributes.
+Only the owner of the mount may read or write these files.
Aborting a filesystem connection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -139,8 +151,8 @@ the filesystem. There are several ways
- Use forced umount (umount -f). Works in all cases but only if
filesystem is still attached (it hasn't been lazy unmounted)
- - Abort filesystem through the sysfs interface. Most powerful
- method, always works.
+ - Abort filesystem through the FUSE control filesystem. Most
+ powerful method, always works.
How do non-privileged mounts work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/7] fuse: add POSIX file locking support
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
` (2 preceding siblings ...)
2006-06-12 12:28 ` [PATCH 3/7] fuse: add control filesystem Miklos Szeredi
@ 2006-06-12 12:29 ` Miklos Szeredi
2006-06-19 6:58 ` Andrew Morton
2006-06-19 8:21 ` Jesper Juhl
2006-06-12 12:30 ` [PATCH 5/7] fuse: ensure FLUSH reaches userspace Miklos Szeredi
` (2 subsequent siblings)
6 siblings, 2 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:29 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel
This patch adds POSIX file locking support to the fuse interface.
This implementation doesn't keep any locking state in kernel.
Unlocking on close() is handled by the FLUSH message, which now
contains the lock owner id.
Mandatory locking is not supported. The filesystem may enforce
mandatory locking in userspace if needed.
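
For illustration only (not part of the patch): a minimal userspace
sketch of the fcntl() calls an application would issue against a file
on a FUSE mount. With this patch, and assuming the daemon negotiated
FUSE_POSIX_LOCKS, F_SETLK would be forwarded as FUSE_SETLK, F_SETLKW
as FUSE_SETLKW and F_GETLK as FUSE_GETLK. The helper names
try_write_lock(), query_lock() and unlock_range() are hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a write lock on [start, start + len)
 * without blocking.  Returns 0 on success, -1 with errno set if the
 * range is held by another process (EAGAIN or EACCES) or on error.
 */
static int try_write_lock(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	return fcntl(fd, F_SETLK, &fl);
}

/*
 * Hypothetical helper: ask whether a write lock on the range would
 * conflict.  Returns F_UNLCK if the range is free, the type of the
 * conflicting lock otherwise (storing its owner in *pid), or -1 on
 * error.
 */
static int query_lock(int fd, off_t start, off_t len, pid_t *pid)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	if (fl.l_type != F_UNLCK && pid)
		*pid = fl.l_pid;
	return fl.l_type;
}

/* Hypothetical helper: drop any lock this process holds on the range. */
static int unlock_range(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	return fcntl(fd, F_SETLK, &fl);
}
```

Note that F_GETLK issued by the lock holder itself reports F_UNLCK,
since a process's own locks never conflict with it. An application
that simply close()s the descriptor instead of calling unlock_range()
relies on the FLUSH message described above to release its locks.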
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c 2006-06-12 14:09:56.000000000 +0200
+++ linux/fs/fuse/file.c 2006-06-12 14:09:59.000000000 +0200
@@ -160,6 +160,18 @@ static int fuse_release(struct inode *in
return fuse_release_common(inode, file, 0);
}
+/*
+ * It would be nice to scramble the ID space, so that the value of the
+ * files_struct pointer is not exposed to userspace. Symmetric crypto
+ * functions are overkill, since the inverse function doesn't need to
+ * be implemented (though it does have to exist). Is there something
+ * simpler?
+ */
+static inline u64 fuse_lock_owner_id(fl_owner_t id)
+{
+ return (unsigned long) id;
+}
+
static int fuse_flush(struct file *file, fl_owner_t id)
{
struct inode *inode = file->f_dentry->d_inode;
@@ -181,11 +193,13 @@ static int fuse_flush(struct file *file,
memset(&inarg, 0, sizeof(inarg));
inarg.fh = ff->fh;
+ inarg.lock_owner = fuse_lock_owner_id(id);
req->in.h.opcode = FUSE_FLUSH;
req->in.h.nodeid = get_node_id(inode);
req->in.numargs = 1;
req->in.args[0].size = sizeof(inarg);
req->in.args[0].value = &inarg;
+ req->force = 1;
request_send(fc, req);
err = req->out.h.error;
fuse_put_request(fc, req);
@@ -604,6 +618,122 @@ static int fuse_set_page_dirty(struct pa
return 0;
}
+static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
+ struct file_lock *fl)
+{
+ switch (ffl->type) {
+ case F_UNLCK:
+ break;
+
+ case F_RDLCK:
+ case F_WRLCK:
+ if (ffl->start > OFFSET_MAX || ffl->end > OFFSET_MAX ||
+ ffl->end < ffl->start)
+ return -EIO;
+
+ fl->fl_start = ffl->start;
+ fl->fl_end = ffl->end;
+ fl->fl_pid = ffl->pid;
+ break;
+
+ default:
+ return -EIO;
+ }
+ fl->fl_type = ffl->type;
+ return 0;
+}
+
+static void fuse_lk_fill(struct fuse_req *req, struct file *file,
+ const struct file_lock *fl, int opcode, pid_t pid)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct fuse_file *ff = file->private_data;
+ struct fuse_lk_in *arg = &req->misc.lk_in;
+
+ arg->fh = ff->fh;
+ arg->owner = fuse_lock_owner_id(fl->fl_owner);
+ arg->lk.start = fl->fl_start;
+ arg->lk.end = fl->fl_end;
+ arg->lk.type = fl->fl_type;
+ arg->lk.pid = pid;
+ req->in.h.opcode = opcode;
+ req->in.h.nodeid = get_node_id(inode);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(*arg);
+ req->in.args[0].value = arg;
+}
+
+static int fuse_getlk(struct file *file, struct file_lock *fl)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_req *req;
+ struct fuse_lk_out outarg;
+ int err;
+
+ req = fuse_get_req(fc);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ fuse_lk_fill(req, file, fl, FUSE_GETLK, 0);
+ req->out.numargs = 1;
+ req->out.args[0].size = sizeof(outarg);
+ req->out.args[0].value = &outarg;
+ request_send(fc, req);
+ err = req->out.h.error;
+ fuse_put_request(fc, req);
+ if (!err)
+ err = convert_fuse_file_lock(&outarg.lk, fl);
+
+ return err;
+}
+
+static int fuse_setlk(struct file *file, struct file_lock *fl)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_req *req;
+ int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
+ pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
+ int err;
+
+ /* Unlock on close is handled by the flush method */
+ if (fl->fl_flags & FL_CLOSE)
+ return 0;
+
+ req = fuse_get_req(fc);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ fuse_lk_fill(req, file, fl, opcode, pid);
+ request_send(fc, req);
+ err = req->out.h.error;
+ fuse_put_request(fc, req);
+ return err;
+}
+
+static int fuse_file_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ int err;
+
+ if (cmd == F_GETLK) {
+ if (fc->no_lock) {
+ if (!posix_test_lock(file, fl, fl))
+ fl->fl_type = F_UNLCK;
+ err = 0;
+ } else
+ err = fuse_getlk(file, fl);
+ } else {
+ if (fc->no_lock)
+ err = posix_lock_file_wait(file, fl);
+ else
+ err = fuse_setlk(file, fl);
+ }
+ return err;
+}
+
static const struct file_operations fuse_file_operations = {
.llseek = generic_file_llseek,
.read = generic_file_read,
@@ -613,6 +743,7 @@ static const struct file_operations fuse
.flush = fuse_flush,
.release = fuse_release,
.fsync = fuse_fsync,
+ .lock = fuse_file_lock,
.sendfile = generic_file_sendfile,
};
@@ -624,6 +755,7 @@ static const struct file_operations fuse
.flush = fuse_flush,
.release = fuse_release,
.fsync = fuse_fsync,
+ .lock = fuse_file_lock,
/* no mmap and sendfile */
};
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h 2006-06-12 14:09:57.000000000 +0200
+++ linux/fs/fuse/fuse_i.h 2006-06-12 14:09:59.000000000 +0200
@@ -190,6 +190,7 @@ struct fuse_req {
struct fuse_init_in init_in;
struct fuse_init_out init_out;
struct fuse_read_in read_in;
+ struct fuse_lk_in lk_in;
} misc;
/** page vector */
@@ -307,6 +308,9 @@ struct fuse_conn {
/** Is removexattr not implemented by fs? */
unsigned no_removexattr : 1;
+ /** Are file locking primitives not implemented by fs? */
+ unsigned no_lock : 1;
+
/** Is access not implemented by fs? */
unsigned no_access : 1;
Index: linux/include/linux/fuse.h
===================================================================
--- linux.orig/include/linux/fuse.h 2006-06-12 14:09:54.000000000 +0200
+++ linux/include/linux/fuse.h 2006-06-12 14:09:59.000000000 +0200
@@ -1,6 +1,6 @@
/*
FUSE: Filesystem in Userspace
- Copyright (C) 2001-2005 Miklos Szeredi <miklos@szeredi.hu>
+ Copyright (C) 2001-2006 Miklos Szeredi <miklos@szeredi.hu>
This program can be distributed under the terms of the GNU GPL.
See the file COPYING.
@@ -15,7 +15,7 @@
#define FUSE_KERNEL_VERSION 7
/** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 6
+#define FUSE_KERNEL_MINOR_VERSION 7
/** The node ID of the root inode */
#define FUSE_ROOT_ID 1
@@ -59,6 +59,13 @@ struct fuse_kstatfs {
__u32 spare[6];
};
+struct fuse_file_lock {
+ __u64 start;
+ __u64 end;
+ __u32 type;
+ __u32 pid; /* tgid */
+};
+
/**
* Bitmasks for fuse_setattr_in.valid
*/
@@ -83,6 +90,7 @@ struct fuse_kstatfs {
* INIT request/reply flags
*/
#define FUSE_ASYNC_READ (1 << 0)
+#define FUSE_POSIX_LOCKS (1 << 1)
enum fuse_opcode {
FUSE_LOOKUP = 1,
@@ -113,6 +121,9 @@ enum fuse_opcode {
FUSE_READDIR = 28,
FUSE_RELEASEDIR = 29,
FUSE_FSYNCDIR = 30,
+ FUSE_GETLK = 31,
+ FUSE_SETLK = 32,
+ FUSE_SETLKW = 33,
FUSE_ACCESS = 34,
FUSE_CREATE = 35
};
@@ -200,6 +211,7 @@ struct fuse_flush_in {
__u64 fh;
__u32 flush_flags;
__u32 padding;
+ __u64 lock_owner;
};
struct fuse_read_in {
@@ -248,6 +260,16 @@ struct fuse_getxattr_out {
__u32 padding;
};
+struct fuse_lk_in {
+ __u64 fh;
+ __u64 owner;
+ struct fuse_file_lock lk;
+};
+
+struct fuse_lk_out {
+ struct fuse_file_lock lk;
+};
+
struct fuse_access_in {
__u32 mask;
__u32 padding;
Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c 2006-06-12 14:09:57.000000000 +0200
+++ linux/fs/fuse/inode.c 2006-06-12 14:09:59.000000000 +0200
@@ -98,6 +98,14 @@ static void fuse_clear_inode(struct inod
}
}
+static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
+{
+ if (*flags & MS_MANDLOCK)
+ return -EINVAL;
+
+ return 0;
+}
+
void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr)
{
if (S_ISREG(inode->i_mode) && i_size_read(inode) != attr->size)
@@ -409,6 +417,7 @@ static struct super_operations fuse_supe
.destroy_inode = fuse_destroy_inode,
.read_inode = fuse_read_inode,
.clear_inode = fuse_clear_inode,
+ .remount_fs = fuse_remount_fs,
.put_super = fuse_put_super,
.umount_begin = fuse_umount_begin,
.statfs = fuse_statfs,
@@ -428,8 +437,12 @@ static void process_init_reply(struct fu
ra_pages = arg->max_readahead / PAGE_CACHE_SIZE;
if (arg->flags & FUSE_ASYNC_READ)
fc->async_read = 1;
- } else
+ if (!(arg->flags & FUSE_POSIX_LOCKS))
+ fc->no_lock = 1;
+ } else {
ra_pages = fc->max_read / PAGE_CACHE_SIZE;
+ fc->no_lock = 1;
+ }
fc->bdi.ra_pages = min(fc->bdi.ra_pages, ra_pages);
fc->minor = arg->minor;
@@ -447,7 +460,7 @@ static void fuse_send_init(struct fuse_c
arg->major = FUSE_KERNEL_VERSION;
arg->minor = FUSE_KERNEL_MINOR_VERSION;
arg->max_readahead = fc->bdi.ra_pages * PAGE_CACHE_SIZE;
- arg->flags |= FUSE_ASYNC_READ;
+ arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS;
req->in.h.opcode = FUSE_INIT;
req->in.numargs = 1;
req->in.args[0].size = sizeof(*arg);
@@ -479,6 +492,9 @@ static int fuse_fill_super(struct super_
struct fuse_req *init_req;
int err;
+ if (sb->s_flags & MS_MANDLOCK)
+ return -EINVAL;
+
if (!parse_fuse_opt((char *) data, &d))
return -EINVAL;
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 5/7] fuse: ensure FLUSH reaches userspace
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
` (3 preceding siblings ...)
2006-06-12 12:29 ` [PATCH 4/7] fuse: add POSIX file locking support Miklos Szeredi
@ 2006-06-12 12:30 ` Miklos Szeredi
2006-06-12 12:31 ` [PATCH 6/7] fuse: rename the interrupted flag Miklos Szeredi
2006-06-12 12:33 ` [PATCH 7/7] fuse: add request interruption Miklos Szeredi
6 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:30 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel
All POSIX locks owned by the current task are removed on close(). If
the FLUSH request initiated by close() fails to reach
userspace, there might be locks remaining, which cannot be removed.
The only way it can fail is if allocating the request fails. In
this case use the request reserved for RELEASE, or if that is
currently used by another FLUSH, wait for it to become available.
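
For illustration only (not the kernel code): the fallback described
above can be sketched in userspace C with a mutex and a condition
variable standing in for fc->lock and fc->blocked_waitq. All names
here (struct req, struct slot, req_get_nofail(), req_put(), alloc_ok(),
alloc_fail()) are hypothetical, and the allocator is passed in only so
that the failure path can be exercised.

```c
#include <pthread.h>
#include <stdlib.h>

struct req {
	int stolen;		/* set while borrowed from the reserved slot */
};

struct slot {
	pthread_mutex_t lock;
	pthread_cond_t avail;
	struct req *reserved;	/* NULL while lent out */
};

/* Preallocate the reserved request, as is done at open time. */
static int slot_init(struct slot *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->avail, NULL);
	s->reserved = calloc(1, sizeof(*s->reserved));
	return s->reserved ? 0 : -1;
}

/* Two allocators: a normal one and one that simulates OOM. */
static struct req *alloc_ok(void)
{
	return calloc(1, sizeof(struct req));
}

static struct req *alloc_fail(void)
{
	return NULL;
}

/*
 * Never returns NULL: if the allocator fails, borrow the reserved
 * request, sleeping until it is handed back if someone else has it.
 */
static struct req *req_get_nofail(struct slot *s, struct req *(*alloc)(void))
{
	struct req *r = alloc();

	if (r) {
		r->stolen = 0;
		return r;
	}

	pthread_mutex_lock(&s->lock);
	while (!s->reserved)
		pthread_cond_wait(&s->avail, &s->lock);
	r = s->reserved;
	s->reserved = NULL;
	pthread_mutex_unlock(&s->lock);

	r->stolen = 1;
	return r;
}

/* Free an ordinary request, or hand a borrowed one back to the slot. */
static void req_put(struct slot *s, struct req *r)
{
	if (!r->stolen) {
		free(r);
		return;
	}

	pthread_mutex_lock(&s->lock);
	r->stolen = 0;
	s->reserved = r;
	pthread_cond_signal(&s->avail);
	pthread_mutex_unlock(&s->lock);
}
```

The point of the design is that the caller never has to handle an
allocation error: the cost of OOM is a possible wait for the single
reserved request, not a lost FLUSH.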
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/fs/fuse/dev.c
===================================================================
--- linux.orig/fs/fuse/dev.c 2006-06-12 14:09:57.000000000 +0200
+++ linux/fs/fuse/dev.c 2006-06-12 14:10:01.000000000 +0200
@@ -76,6 +76,13 @@ static void __fuse_put_request(struct fu
atomic_dec(&req->count);
}
+static void fuse_req_init_context(struct fuse_req *req)
+{
+ req->in.h.uid = current->fsuid;
+ req->in.h.gid = current->fsgid;
+ req->in.h.pid = current->pid;
+}
+
struct fuse_req *fuse_get_req(struct fuse_conn *fc)
{
struct fuse_req *req;
@@ -100,9 +107,7 @@ struct fuse_req *fuse_get_req(struct fus
if (!req)
goto out;
- req->in.h.uid = current->fsuid;
- req->in.h.gid = current->fsgid;
- req->in.h.pid = current->pid;
+ fuse_req_init_context(req);
req->waiting = 1;
return req;
@@ -111,12 +116,87 @@ struct fuse_req *fuse_get_req(struct fus
return ERR_PTR(err);
}
+/*
+ * Return request in fuse_file->reserved_req. However that may
+ * currently be in use. If that is the case, wait for it to become
+ * available.
+ */
+static struct fuse_req *get_reserved_req(struct fuse_conn *fc,
+ struct file *file)
+{
+ struct fuse_req *req = NULL;
+ struct fuse_file *ff = file->private_data;
+
+ do {
+ wait_event(fc->blocked_waitq, ff->reserved_req);
+ spin_lock(&fc->lock);
+ if (ff->reserved_req) {
+ req = ff->reserved_req;
+ ff->reserved_req = NULL;
+ get_file(file);
+ req->stolen_file = file;
+ }
+ spin_unlock(&fc->lock);
+ } while (!req);
+
+ return req;
+}
+
+/*
+ * Put stolen request back into fuse_file->reserved_req
+ */
+static void put_reserved_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct file *file = req->stolen_file;
+ struct fuse_file *ff = file->private_data;
+
+ spin_lock(&fc->lock);
+ fuse_request_init(req);
+ BUG_ON(ff->reserved_req);
+ ff->reserved_req = req;
+ wake_up(&fc->blocked_waitq);
+ spin_unlock(&fc->lock);
+ fput(file);
+}
+
+/*
+ * Get a request for a file operation, always succeeds
+ *
+ * This is used for sending the FLUSH request, which must get to
+ * userspace, due to POSIX locks which may need to be unlocked.
+ *
+ * If allocation fails due to OOM, use the reserved request in
+ * fuse_file.
+ *
+ * This is very unlikely to deadlock accidentally, since the
+ * filesystem should not have its own file open. If deadlock is
+ * intentional, it can still be broken by "aborting" the filesystem.
+ */
+struct fuse_req *fuse_get_req_nofail(struct fuse_conn *fc, struct file *file)
+{
+ struct fuse_req *req;
+
+ atomic_inc(&fc->num_waiting);
+ wait_event(fc->blocked_waitq, !fc->blocked);
+ req = fuse_request_alloc();
+ if (!req)
+ req = get_reserved_req(fc, file);
+
+ fuse_req_init_context(req);
+ req->waiting = 1;
+ return req;
+}
+
void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req)
{
if (atomic_dec_and_test(&req->count)) {
if (req->waiting)
atomic_dec(&fc->num_waiting);
- fuse_request_free(req);
+
+ if (req->stolen_file)
+ put_reserved_req(fc, req);
+ else
+ fuse_request_free(req);
}
}
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h 2006-06-12 14:09:59.000000000 +0200
+++ linux/fs/fuse/fuse_i.h 2006-06-12 14:10:01.000000000 +0200
@@ -65,7 +65,7 @@ struct fuse_inode {
/** FUSE specific file data */
struct fuse_file {
/** Request reserved for flush and release */
- struct fuse_req *release_req;
+ struct fuse_req *reserved_req;
/** File handle used by userspace */
u64 fh;
@@ -213,6 +213,9 @@ struct fuse_req {
/** Request completion callback */
void (*end)(struct fuse_conn *, struct fuse_req *);
+
+ /** Request is stolen from fuse_file->reserved_req */
+ struct file *stolen_file;
};
/**
@@ -456,11 +459,16 @@ struct fuse_req *fuse_request_alloc(void
void fuse_request_free(struct fuse_req *req);
/**
- * Reserve a preallocated request
+ * Get a request, may fail with -ENOMEM
*/
struct fuse_req *fuse_get_req(struct fuse_conn *fc);
/**
+ * Get a request for a file operation, always succeeds
+ */
+struct fuse_req *fuse_get_req_nofail(struct fuse_conn *fc, struct file *file);
+
+/**
* Decrement reference count of a request. If count goes to zero free
* the request.
*/
Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c 2006-06-12 14:09:59.000000000 +0200
+++ linux/fs/fuse/file.c 2006-06-12 14:10:01.000000000 +0200
@@ -48,8 +48,8 @@ struct fuse_file *fuse_file_alloc(void)
struct fuse_file *ff;
ff = kmalloc(sizeof(struct fuse_file), GFP_KERNEL);
if (ff) {
- ff->release_req = fuse_request_alloc();
- if (!ff->release_req) {
+ ff->reserved_req = fuse_request_alloc();
+ if (!ff->reserved_req) {
kfree(ff);
ff = NULL;
}
@@ -59,7 +59,7 @@ struct fuse_file *fuse_file_alloc(void)
void fuse_file_free(struct fuse_file *ff)
{
- fuse_request_free(ff->release_req);
+ fuse_request_free(ff->reserved_req);
kfree(ff);
}
@@ -115,7 +115,7 @@ int fuse_open_common(struct inode *inode
struct fuse_req *fuse_release_fill(struct fuse_file *ff, u64 nodeid, int flags,
int opcode)
{
- struct fuse_req *req = ff->release_req;
+ struct fuse_req *req = ff->reserved_req;
struct fuse_release_in *inarg = &req->misc.release_in;
inarg->fh = ff->fh;
@@ -187,10 +187,7 @@ static int fuse_flush(struct file *file,
if (fc->no_flush)
return 0;
- req = fuse_get_req(fc);
- if (IS_ERR(req))
- return PTR_ERR(req);
-
+ req = fuse_get_req_nofail(fc, file);
memset(&inarg, 0, sizeof(inarg));
inarg.fh = ff->fh;
inarg.lock_owner = fuse_lock_owner_id(id);
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 6/7] fuse: rename the interrupted flag
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
` (4 preceding siblings ...)
2006-06-12 12:30 ` [PATCH 5/7] fuse: ensure FLUSH reaches userspace Miklos Szeredi
@ 2006-06-12 12:31 ` Miklos Szeredi
2006-06-12 12:33 ` [PATCH 7/7] fuse: add request interruption Miklos Szeredi
6 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:31 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel
Rename the 'interrupted' flag to 'aborted', since it indicates exactly
that, and the next patch will introduce an 'interrupted' flag for a
different purpose.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/fs/fuse/dev.c
===================================================================
--- linux.orig/fs/fuse/dev.c 2006-06-12 14:10:01.000000000 +0200
+++ linux/fs/fuse/dev.c 2006-06-12 14:10:02.000000000 +0200
@@ -202,7 +202,7 @@ void fuse_put_request(struct fuse_conn *
/*
* This function is called when a request is finished. Either a reply
- * has arrived or it was interrupted (and not yet sent) or some error
+ * has arrived or it was aborted (and not yet sent) or some error
* occurred during communication with userspace, or the device file
* was closed. The requester thread is woken up (if still waiting),
* the 'end' callback is called if given, else the reference to the
@@ -250,12 +250,12 @@ static void request_wait_answer(struct f
restore_sigs(&oldset);
}
spin_lock(&fc->lock);
- if (req->state == FUSE_REQ_FINISHED && !req->interrupted)
+ if (req->state == FUSE_REQ_FINISHED && !req->aborted)
return;
- if (!req->interrupted) {
+ if (!req->aborted) {
req->out.h.error = -EINTR;
- req->interrupted = 1;
+ req->aborted = 1;
}
if (req->locked) {
/* This is uninterruptible sleep, because data is
@@ -361,14 +361,14 @@ void request_send_background(struct fuse
/*
* Lock the request. Up to the next unlock_request() there mustn't be
* anything that could cause a page-fault. If the request was already
- * interrupted bail out.
+ * aborted bail out.
*/
static int lock_request(struct fuse_conn *fc, struct fuse_req *req)
{
int err = 0;
if (req) {
spin_lock(&fc->lock);
- if (req->interrupted)
+ if (req->aborted)
err = -ENOENT;
else
req->locked = 1;
@@ -378,7 +378,7 @@ static int lock_request(struct fuse_conn
}
/*
- * Unlock request. If it was interrupted during being locked, the
+ * Unlock request. If it was aborted during being locked, the
* requester thread is currently waiting for it to be unlocked, so
* wake it up.
*/
@@ -387,7 +387,7 @@ static void unlock_request(struct fuse_c
if (req) {
spin_lock(&fc->lock);
req->locked = 0;
- if (req->interrupted)
+ if (req->aborted)
wake_up(&req->waitq);
spin_unlock(&fc->lock);
}
@@ -589,8 +589,8 @@ static void request_wait(struct fuse_con
* Read a single request into the userspace filesystem's buffer. This
* function waits until a request is available, then removes it from
* the pending list and copies request data to userspace buffer. If
- * no reply is needed (FORGET) or request has been interrupted or
- * there was an error during the copying then it's finished by calling
+ * no reply is needed (FORGET) or request has been aborted or there
+ * was an error during the copying then it's finished by calling
* request_end(). Otherwise add it to the processing list, and set
* the 'sent' flag.
*/
@@ -645,10 +645,10 @@ static ssize_t fuse_dev_readv(struct fil
fuse_copy_finish(&cs);
spin_lock(&fc->lock);
req->locked = 0;
- if (!err && req->interrupted)
+ if (!err && req->aborted)
err = -ENOENT;
if (err) {
- if (!req->interrupted)
+ if (!req->aborted)
req->out.h.error = -EIO;
request_end(fc, req);
return err;
@@ -754,7 +754,7 @@ static ssize_t fuse_dev_writev(struct fi
if (!req)
goto err_unlock;
- if (req->interrupted) {
+ if (req->aborted) {
spin_unlock(&fc->lock);
fuse_copy_finish(&cs);
spin_lock(&fc->lock);
@@ -773,9 +773,9 @@ static ssize_t fuse_dev_writev(struct fi
spin_lock(&fc->lock);
req->locked = 0;
if (!err) {
- if (req->interrupted)
+ if (req->aborted)
err = -ENOENT;
- } else if (!req->interrupted)
+ } else if (!req->aborted)
req->out.h.error = -EIO;
request_end(fc, req);
@@ -835,7 +835,7 @@ static void end_requests(struct fuse_con
/*
* Abort requests under I/O
*
- * The requests are set to interrupted and finished, and the request
+ * The requests are set to aborted and finished, and the request
* waiter is woken up. This will make request_wait_answer() wait
* until the request is unlocked and then return.
*
@@ -850,7 +850,7 @@ static void end_io_requests(struct fuse_
list_entry(fc->io.next, struct fuse_req, list);
void (*end) (struct fuse_conn *, struct fuse_req *) = req->end;
- req->interrupted = 1;
+ req->aborted = 1;
req->out.h.error = -ECONNABORTED;
req->state = FUSE_REQ_FINISHED;
list_del_init(&req->list);
@@ -883,9 +883,8 @@ static void end_io_requests(struct fuse_
* onto the pending list is prevented by req->connected being false.
*
* Progression of requests under I/O to the processing list is
- * prevented by the req->interrupted flag being true for these
- * requests. For this reason requests on the io list must be aborted
- * first.
+ * prevented by the req->aborted flag being true for these requests.
+ * For this reason requests on the io list must be aborted first.
*/
void fuse_abort_conn(struct fuse_conn *fc)
{
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h 2006-06-12 14:10:01.000000000 +0200
+++ linux/fs/fuse/fuse_i.h 2006-06-12 14:10:02.000000000 +0200
@@ -159,8 +159,8 @@ struct fuse_req {
/** Force sending of the request even if interrupted */
unsigned force:1;
- /** The request was interrupted */
- unsigned interrupted:1;
+ /** The request was aborted */
+ unsigned aborted:1;
/** Request is sent in the background */
unsigned background:1;
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 7/7] fuse: add request interruption
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
` (5 preceding siblings ...)
2006-06-12 12:31 ` [PATCH 6/7] fuse: rename the interrupted flag Miklos Szeredi
@ 2006-06-12 12:33 ` Miklos Szeredi
6 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-12 12:33 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel
Add synchronous request interruption. This is needed for file locking
operations, which have to be interruptible. However, a filesystem may
also implement interruptibility for other operations (similar to the
NFS 'intr' mount option).
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
---
Index: linux/fs/fuse/dev.c
===================================================================
--- linux.orig/fs/fuse/dev.c 2006-06-12 14:10:02.000000000 +0200
+++ linux/fs/fuse/dev.c 2006-06-12 14:10:03.000000000 +0200
@@ -34,6 +34,7 @@ static void fuse_request_init(struct fus
{
memset(req, 0, sizeof(*req));
INIT_LIST_HEAD(&req->list);
+ INIT_LIST_HEAD(&req->intr_entry);
init_waitqueue_head(&req->waitq);
atomic_set(&req->count, 1);
}
@@ -215,6 +216,7 @@ static void request_end(struct fuse_conn
void (*end) (struct fuse_conn *, struct fuse_req *) = req->end;
req->end = NULL;
list_del(&req->list);
+ list_del(&req->intr_entry);
req->state = FUSE_REQ_FINISHED;
if (req->background) {
if (fc->num_background == FUSE_MAX_BACKGROUND) {
@@ -235,28 +237,63 @@ static void request_end(struct fuse_conn
fuse_put_request(fc, req);
}
+static void wait_answer_interruptible(struct fuse_conn *fc,
+ struct fuse_req *req)
+{
+ if (signal_pending(current))
+ return;
+
+ spin_unlock(&fc->lock);
+ wait_event_interruptible(req->waitq, req->state == FUSE_REQ_FINISHED);
+ spin_lock(&fc->lock);
+}
+
+static void queue_interrupt(struct fuse_conn *fc, struct fuse_req *req)
+{
+ list_add_tail(&req->intr_entry, &fc->interrupts);
+ wake_up(&fc->waitq);
+ kill_fasync(&fc->fasync, SIGIO, POLL_IN);
+}
+
/* Called with fc->lock held. Releases, and then reacquires it. */
static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req)
{
- sigset_t oldset;
+ if (!fc->no_interrupt) {
+ /* Any signal may interrupt this */
+ wait_answer_interruptible(fc, req);
- spin_unlock(&fc->lock);
- if (req->force)
+ if (req->aborted)
+ goto aborted;
+ if (req->state == FUSE_REQ_FINISHED)
+ return;
+
+ req->interrupted = 1;
+ if (req->state == FUSE_REQ_SENT)
+ queue_interrupt(fc, req);
+ }
+
+ if (req->force) {
+ spin_unlock(&fc->lock);
wait_event(req->waitq, req->state == FUSE_REQ_FINISHED);
- else {
+ spin_lock(&fc->lock);
+ } else {
+ sigset_t oldset;
+
+ /* Only fatal signals may interrupt this */
block_sigs(&oldset);
- wait_event_interruptible(req->waitq,
- req->state == FUSE_REQ_FINISHED);
+ wait_answer_interruptible(fc, req);
restore_sigs(&oldset);
}
- spin_lock(&fc->lock);
- if (req->state == FUSE_REQ_FINISHED && !req->aborted)
- return;
- if (!req->aborted) {
- req->out.h.error = -EINTR;
- req->aborted = 1;
- }
+ if (req->aborted)
+ goto aborted;
+ if (req->state == FUSE_REQ_FINISHED)
+ return;
+
+ req->out.h.error = -EINTR;
+ req->aborted = 1;
+
+ aborted:
if (req->locked) {
/* This is uninterruptible sleep, because data is
being copied to/from the buffers of req. During
@@ -288,13 +325,19 @@ static unsigned len_args(unsigned numarg
return nbytes;
}
+static u64 fuse_get_unique(struct fuse_conn *fc)
+ {
+ fc->reqctr++;
+ /* zero is special */
+ if (fc->reqctr == 0)
+ fc->reqctr = 1;
+
+ return fc->reqctr;
+}
+
static void queue_request(struct fuse_conn *fc, struct fuse_req *req)
{
- fc->reqctr++;
- /* zero is special */
- if (fc->reqctr == 0)
- fc->reqctr = 1;
- req->in.h.unique = fc->reqctr;
+ req->in.h.unique = fuse_get_unique(fc);
req->in.h.len = sizeof(struct fuse_in_header) +
len_args(req->in.numargs, (struct fuse_arg *) req->in.args);
list_add_tail(&req->list, &fc->pending);
@@ -307,9 +350,6 @@ static void queue_request(struct fuse_co
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
}
-/*
- * This can only be interrupted by a SIGKILL
- */
void request_send(struct fuse_conn *fc, struct fuse_req *req)
{
req->isreply = 1;
@@ -566,13 +606,18 @@ static int fuse_copy_args(struct fuse_co
return err;
}
+static int request_pending(struct fuse_conn *fc)
+{
+ return !list_empty(&fc->pending) || !list_empty(&fc->interrupts);
+}
+
/* Wait until a request is available on the pending list */
static void request_wait(struct fuse_conn *fc)
{
DECLARE_WAITQUEUE(wait, current);
add_wait_queue_exclusive(&fc->waitq, &wait);
- while (fc->connected && list_empty(&fc->pending)) {
+ while (fc->connected && !request_pending(fc)) {
set_current_state(TASK_INTERRUPTIBLE);
if (signal_pending(current))
break;
@@ -586,6 +631,45 @@ static void request_wait(struct fuse_con
}
/*
+ * Transfer an interrupt request to userspace
+ *
+ * Unlike other requests this is assembled on demand, without a need
+ * to allocate a separate fuse_req structure.
+ *
+ * Called with fc->lock held, releases it
+ */
+static int fuse_read_interrupt(struct fuse_conn *fc, struct fuse_req *req,
+ const struct iovec *iov, unsigned long nr_segs)
+{
+ struct fuse_copy_state cs;
+ struct fuse_in_header ih;
+ struct fuse_interrupt_in arg;
+ unsigned reqsize = sizeof(ih) + sizeof(arg);
+ int err;
+
+ list_del_init(&req->intr_entry);
+ req->intr_unique = fuse_get_unique(fc);
+ memset(&ih, 0, sizeof(ih));
+ memset(&arg, 0, sizeof(arg));
+ ih.len = reqsize;
+ ih.opcode = FUSE_INTERRUPT;
+ ih.unique = req->intr_unique;
+ arg.unique = req->in.h.unique;
+
+ spin_unlock(&fc->lock);
+ if (iov_length(iov, nr_segs) < reqsize)
+ return -EINVAL;
+
+ fuse_copy_init(&cs, fc, 1, NULL, iov, nr_segs);
+ err = fuse_copy_one(&cs, &ih, sizeof(ih));
+ if (!err)
+ err = fuse_copy_one(&cs, &arg, sizeof(arg));
+ fuse_copy_finish(&cs);
+
+ return err ? err : reqsize;
+}
+
+/*
* Read a single request into the userspace filesystem's buffer. This
* function waits until a request is available, then removes it from
* the pending list and copies request data to userspace buffer. If
@@ -610,7 +694,7 @@ static ssize_t fuse_dev_readv(struct fil
spin_lock(&fc->lock);
err = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) && fc->connected &&
- list_empty(&fc->pending))
+ !request_pending(fc))
goto err_unlock;
request_wait(fc);
@@ -618,9 +702,15 @@ static ssize_t fuse_dev_readv(struct fil
if (!fc->connected)
goto err_unlock;
err = -ERESTARTSYS;
- if (list_empty(&fc->pending))
+ if (!request_pending(fc))
goto err_unlock;
+ if (!list_empty(&fc->interrupts)) {
+ req = list_entry(fc->interrupts.next, struct fuse_req,
+ intr_entry);
+ return fuse_read_interrupt(fc, req, iov, nr_segs);
+ }
+
req = list_entry(fc->pending.next, struct fuse_req, list);
req->state = FUSE_REQ_READING;
list_move(&req->list, &fc->io);
@@ -658,6 +748,8 @@ static ssize_t fuse_dev_readv(struct fil
else {
req->state = FUSE_REQ_SENT;
list_move_tail(&req->list, &fc->processing);
+ if (req->interrupted)
+ queue_interrupt(fc, req);
spin_unlock(&fc->lock);
}
return reqsize;
@@ -684,7 +776,7 @@ static struct fuse_req *request_find(str
list_for_each(entry, &fc->processing) {
struct fuse_req *req;
req = list_entry(entry, struct fuse_req, list);
- if (req->in.h.unique == unique)
+ if (req->in.h.unique == unique || req->intr_unique == unique)
return req;
}
return NULL;
@@ -750,7 +842,6 @@ static ssize_t fuse_dev_writev(struct fi
goto err_unlock;
req = request_find(fc, oh.unique);
- err = -EINVAL;
if (!req)
goto err_unlock;
@@ -761,6 +852,23 @@ static ssize_t fuse_dev_writev(struct fi
request_end(fc, req);
return -ENOENT;
}
+ /* Is it an interrupt reply? */
+ if (req->intr_unique == oh.unique) {
+ err = -EINVAL;
+ if (nbytes != sizeof(struct fuse_out_header))
+ goto err_unlock;
+
+ if (oh.error == -ENOSYS)
+ fc->no_interrupt = 1;
+ else if (oh.error == -EAGAIN)
+ queue_interrupt(fc, req);
+
+ spin_unlock(&fc->lock);
+ fuse_copy_finish(&cs);
+ return nbytes;
+ }
+
+ req->state = FUSE_REQ_WRITING;
list_move(&req->list, &fc->io);
req->out.h = oh;
req->locked = 1;
@@ -809,7 +917,7 @@ static unsigned fuse_dev_poll(struct fil
spin_lock(&fc->lock);
if (!fc->connected)
mask = POLLERR;
- else if (!list_empty(&fc->pending))
+ else if (request_pending(fc))
mask |= POLLIN | POLLRDNORM;
spin_unlock(&fc->lock);
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h 2006-06-12 14:10:02.000000000 +0200
+++ linux/fs/fuse/fuse_i.h 2006-06-12 14:10:03.000000000 +0200
@@ -131,6 +131,7 @@ enum fuse_req_state {
FUSE_REQ_PENDING,
FUSE_REQ_READING,
FUSE_REQ_SENT,
+ FUSE_REQ_WRITING,
FUSE_REQ_FINISHED
};
@@ -144,9 +145,15 @@ struct fuse_req {
fuse_conn */
struct list_head list;
+ /** Entry on the interrupts list */
+ struct list_head intr_entry;
+
/** refcount */
atomic_t count;
+ /** Unique ID for the interrupt request */
+ u64 intr_unique;
+
/*
* The following bitfields are either set once before the
* request is queued or setting/clearing them is protected by
@@ -165,6 +172,9 @@ struct fuse_req {
/** Request is sent in the background */
unsigned background:1;
+ /** The request has been interrupted */
+ unsigned interrupted:1;
+
/** Data is being copied to/from the request */
unsigned locked:1;
@@ -262,6 +272,9 @@ struct fuse_conn {
/** Number of requests currently in the background */
unsigned num_background;
+ /** Pending interrupts */
+ struct list_head interrupts;
+
/** Flag indicating if connection is blocked. This will be
the case before the INIT reply is received, and if there
are too many outstading backgrounds requests */
@@ -320,6 +333,9 @@ struct fuse_conn {
/** Is create not implemented by fs? */
unsigned no_create : 1;
+ /** Is interrupt not implemented by fs? */
+ unsigned no_interrupt : 1;
+
/** The number of requests waiting for completion */
atomic_t num_waiting;
Index: linux/include/linux/fuse.h
===================================================================
--- linux.orig/include/linux/fuse.h 2006-06-12 14:09:59.000000000 +0200
+++ linux/include/linux/fuse.h 2006-06-12 14:10:03.000000000 +0200
@@ -125,7 +125,8 @@ enum fuse_opcode {
FUSE_SETLK = 32,
FUSE_SETLKW = 33,
FUSE_ACCESS = 34,
- FUSE_CREATE = 35
+ FUSE_CREATE = 35,
+ FUSE_INTERRUPT = 36,
};
/* The read buffer is required to be at least 8k, but may be much larger */
@@ -291,6 +292,10 @@ struct fuse_init_out {
__u32 max_write;
};
+struct fuse_interrupt_in {
+ __u64 unique;
+};
+
struct fuse_in_header {
__u32 len;
__u32 opcode;
Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c 2006-06-12 14:10:01.000000000 +0200
+++ linux/fs/fuse/file.c 2006-06-12 14:10:03.000000000 +0200
@@ -705,6 +705,9 @@ static int fuse_setlk(struct file *file,
fuse_lk_fill(req, file, fl, opcode, pid);
request_send(fc, req);
err = req->out.h.error;
+ /* locking is restartable */
+ if (err == -EINTR)
+ err = -ERESTARTSYS;
fuse_put_request(fc, req);
return err;
}
Index: linux/Documentation/filesystems/fuse.txt
===================================================================
--- linux.orig/Documentation/filesystems/fuse.txt 2006-06-12 14:09:57.000000000 +0200
+++ linux/Documentation/filesystems/fuse.txt 2006-06-12 14:10:03.000000000 +0200
@@ -124,6 +124,46 @@ For each connection the following files
Only the owner of the mount may read or write these files.
+Interrupting filesystem operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a process issuing a FUSE filesystem request is interrupted, the
+following will happen:
+
+ 1) If the request is not yet sent to userspace AND the signal is
+ fatal (SIGKILL or unhandled fatal signal), then the request is
+ dequeued and returns immediately.
+
+ 2) If the request is not yet sent to userspace AND the signal is not
+ fatal, then an 'interrupted' flag is set for the request. When
+ the request has been successfully transferred to userspace and
+ this flag is set, an INTERRUPT request is queued.
+
+ 3) If the request is already sent to userspace, then an INTERRUPT
+ request is queued.
+
+INTERRUPT requests take precedence over other requests, so the
+userspace filesystem will receive queued INTERRUPTs before any others.
+
+The userspace filesystem may ignore the INTERRUPT requests entirely,
+or may honor them by sending a reply to the _original_ request, with
+the error set to EINTR.
+
+It is also possible that there's a race between processing the
+original request and its INTERRUPT request. There are two possibilities:
+
+ 1) The INTERRUPT request is processed before the original request is
+ processed
+
+ 2) The INTERRUPT request is processed after the original request has
+ been answered
+
+If the filesystem cannot find the original request, it should wait for
+some timeout and/or a number of new requests to arrive, after which it
+should reply to the INTERRUPT request with an EAGAIN error. In case
+1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT
+reply will be ignored.
+
Aborting a filesystem connection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -351,10 +391,10 @@ but is caused by a pagefault.
Solution is basically the same as above.
-An additional problem is that while the write buffer is being
-copied to the request, the request must not be interrupted. This
-is because the destination address of the copy may not be valid
-after the request is interrupted.
+An additional problem is that while the write buffer is being copied
+to the request, the request must not be interrupted/aborted. This is
+because the destination address of the copy may not be valid after the
+request has returned.
This is solved with doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted with
Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c 2006-06-12 14:09:59.000000000 +0200
+++ linux/fs/fuse/inode.c 2006-06-12 14:10:03.000000000 +0200
@@ -381,6 +381,7 @@ static struct fuse_conn *new_conn(void)
INIT_LIST_HEAD(&fc->pending);
INIT_LIST_HEAD(&fc->processing);
INIT_LIST_HEAD(&fc->io);
+ INIT_LIST_HEAD(&fc->interrupts);
atomic_set(&fc->num_waiting, 0);
fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
fc->bdi.unplug_io_fn = default_unplug_io_fn;
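As a rough illustration of the userspace side described in the fuse.txt
hunk above, here is a minimal sketch of how a filesystem daemon might
decide what to answer when it reads an INTERRUPT request. Everything in
it is an assumption made for illustration: the in-flight table, the
names start_request(), find_inflight() and handle_interrupt(), and the
'patience' counter standing in for "some timeout and/or a number of new
requests" are hypothetical and are not part of this patch or of libfuse.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 8			/* hypothetical table size */

struct inflight {			/* hypothetical, not a libfuse type */
	uint64_t unique;		/* 'unique' of the original request */
	int busy;			/* set while the request is unanswered */
};

struct reply {
	uint64_t unique;		/* request the reply is addressed to */
	int error;			/* negative errno for the reply header */
	int send;			/* 0: hold the INTERRUPT, send nothing yet */
};

static struct inflight table[MAX_INFLIGHT];

/* Record a request that has been read from the device. */
static int start_request(uint64_t unique)
{
	size_t i;

	for (i = 0; i < MAX_INFLIGHT; i++) {
		if (!table[i].busy) {
			table[i].unique = unique;
			table[i].busy = 1;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Look up an unanswered request by its 'unique' value. */
static struct inflight *find_inflight(uint64_t unique)
{
	size_t i;

	for (i = 0; i < MAX_INFLIGHT; i++)
		if (table[i].busy && table[i].unique == unique)
			return &table[i];
	return NULL;
}

/*
 * Decide how to answer an INTERRUPT whose own header carries
 * 'intr_unique' and whose argument names the request 'target'.
 * 'patience' is how many more times the caller is willing to retry
 * the lookup before giving up (an assumed policy).
 */
static struct reply handle_interrupt(uint64_t intr_unique, uint64_t target,
				     int patience)
{
	struct reply r = { 0, 0, 0 };
	struct inflight *req = find_inflight(target);

	assert(patience >= 0);
	if (req) {
		/* Honor it: answer the _original_ request with EINTR. */
		req->busy = 0;
		r.unique = target;
		r.error = -EINTR;
		r.send = 1;
	} else if (patience == 0) {
		/* Original not found: answer the INTERRUPT with EAGAIN. */
		r.unique = intr_unique;
		r.error = -EAGAIN;
		r.send = 1;
	}
	return r;
}
```

With this shape, an EAGAIN reply lets the kernel sort out the race: if
the original request is still outstanding the INTERRUPT is requeued,
otherwise the reply is simply ignored. A filesystem that does not
implement interruption at all could instead answer with ENOSYS, which
the dev.c hunk above records in fc->no_interrupt.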
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/7] fuse: add control filesystem
2006-06-12 12:28 ` [PATCH 3/7] fuse: add control filesystem Miklos Szeredi
@ 2006-06-19 6:55 ` Andrew Morton
2006-06-19 8:06 ` Miklos Szeredi
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-06-19 6:55 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel
On Mon, 12 Jun 2006 14:28:32 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:
> Add a control filesystem to fuse, replacing the attributes currently
> exported through sysfs. An empty directory '/sys/fs/fuse/connections'
> is still created in sysfs, and mounting the control filesystem here
> provides backward compatibility.
>
> Advantages of the control filesystem over the previous solution:
>
> - allows the object directory and the attributes to be owned by the
> filesystem owner, hence letting unprivileged users abort the
> filesystem connection
>
> - does not suffer from module unload race
>
Presumably people with currently-working setups will find that whatever
they used to have in /sys/fs/fuse/connections won't be there any more, so
this is a non-back-compatible change. How do we help them with that?
> +static ssize_t fuse_conn_waiting_read(struct file *file, char __user *buf,
> + size_t len, loff_t *ppos)
> +{
> + char tmp[32];
> + size_t size;
> +
> + if (!*ppos) {
> + struct fuse_conn *fc = fuse_ctl_file_conn_get(file);
> + if (!fc)
> + return 0;
> +
> + file->private_data = (void *) atomic_read(&fc->num_waiting);
> + fuse_conn_put(fc);
> + }
> + size = sprintf(tmp, "%i\n", (int) file->private_data);
> + return simple_read_from_buffer(buf, len, ppos, tmp, size);
> +}
What happens if the first read isn't at file offset 0?
> +
> +static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
> + struct fuse_conn *fc,
> + const char *name,
> + int mode, int nlink,
> + struct inode_operations *iop,
> + const struct file_operations *fop)
> +{
> + struct dentry *dentry;
> + struct inode *inode;
> +
> + BUG_ON(fc->ctl_ndents >= FUSE_CTL_NUM_DENTRIES);
> + dentry = d_alloc_name(parent, name);
> + if (!dentry)
> + return NULL;
> +
> + fc->ctl_dentry[fc->ctl_ndents++] = dentry;
What locking protects fc->ctl_ndents?
> + inode = new_inode(fuse_control_sb);
> + if (!inode)
> + return NULL;
> +
> + inode->i_mode = mode;
> + inode->i_uid = fc->user_id;
> + inode->i_gid = fc->group_id;
> + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> + if (iop)
> + inode->i_op = iop;
Is iop ever null?
> + inode->i_fop = fop;
> + inode->i_nlink = nlink;
> + inode->u.generic_ip = fc;
> + d_add(dentry, inode);
> + return dentry;
> +}
> +
> +int fuse_ctl_add_conn(struct fuse_conn *fc)
> +{
> + struct dentry *parent;
> + char name[32];
> +
> + if (!fuse_control_sb)
> + return 0;
Can this happen?
> + parent = fuse_control_sb->s_root;
> + parent->d_inode->i_nlink++;
What locking protects i_nlink?
> + sprintf(name, "%llu", (unsigned long long) fc->id);
> + parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2,
> + &simple_dir_inode_operations,
> + &simple_dir_operations);
> + if (!parent)
> + goto err;
> +
> + if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1,
> + NULL, &fuse_ctl_waiting_ops) ||
> + !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1,
> + NULL, &fuse_ctl_abort_ops))
> + goto err;
> +
> + return 0;
> +
> + err:
> + fuse_ctl_remove_conn(fc);
> + return -ENOMEM;
> +}
> +
> +void fuse_ctl_remove_conn(struct fuse_conn *fc)
> +{
> + int i;
> +
> + if (!fuse_control_sb)
> + return;
> +
> + for (i = fc->ctl_ndents - 1; i >= 0; i--) {
> + struct dentry *dentry = fc->ctl_dentry[i];
> + dentry->d_inode->u.generic_ip = NULL;
> + d_drop(dentry);
> + dput(dentry);
> + }
> + fuse_control_sb->s_root->d_inode->i_nlink--;
Ditto.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/7] fuse: add POSIX file locking support
2006-06-12 12:29 ` [PATCH 4/7] fuse: add POSIX file locking support Miklos Szeredi
@ 2006-06-19 6:58 ` Andrew Morton
2006-06-19 8:12 ` Miklos Szeredi
2006-06-19 8:21 ` Jesper Juhl
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-06-19 6:58 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel
On Mon, 12 Jun 2006 14:29:20 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:
> +/*
> + * It would be nice to scramble the ID space, so that the value of the
> + * files_struct pointer is not exposed to userspace. Symmetric crypto
> + * functions are overkill, since the inverse function doesn't need to
> + * be implemented (though it does have to exist). Is there something
> + * simpler?
> + */
> +static inline u64 fuse_lock_owner_id(fl_owner_t id)
> +{
> + return (unsigned long) id;
> +}
Add a constant, not-known-to-userspace offset to all ids?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/7] fuse: add control filesystem
2006-06-19 6:55 ` Andrew Morton
@ 2006-06-19 8:06 ` Miklos Szeredi
0 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-19 8:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel
> Presumably people with currently-working setups will find that whatever
> they used to have in /sys/fs/fuse/connections won't be there any more, so
> this is a non-back-compatible change. How do we help them with that?
I can't think of any good technical solutions. But the control
filesystem should only ever be used in case of a major cock-up needing
manual intervention anyway, and not during normal operation.
E.g. somebody is trying very hard to deadlock a fuse filesystem.
So documenting it and alerting package maintainers should be enough I
think.
> > +static ssize_t fuse_conn_waiting_read(struct file *file, char __user *buf,
> > + size_t len, loff_t *ppos)
> > +{
> > + char tmp[32];
> > + size_t size;
> > +
> > + if (!*ppos) {
> > + struct fuse_conn *fc = fuse_ctl_file_conn_get(file);
> > + if (!fc)
> > + return 0;
> > +
> > + file->private_data = (void *) atomic_read(&fc->num_waiting);
> > + fuse_conn_put(fc);
> > + }
> > + size = sprintf(tmp, "%i\n", (int) file->private_data);
> > + return simple_read_from_buffer(buf, len, ppos, tmp, size);
> > +}
>
> What happens if the first read isn't at file offset 0?
Can't happen, because it's been opened with nonseekable_open.
> > +
> > +static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
> > + struct fuse_conn *fc,
> > + const char *name,
> > + int mode, int nlink,
> > + struct inode_operations *iop,
> > + const struct file_operations *fop)
> > +{
> > + struct dentry *dentry;
> > + struct inode *inode;
> > +
> > + BUG_ON(fc->ctl_ndents >= FUSE_CTL_NUM_DENTRIES);
> > + dentry = d_alloc_name(parent, name);
> > + if (!dentry)
> > + return NULL;
> > +
> > + fc->ctl_dentry[fc->ctl_ndents++] = dentry;
>
> What locking protects fc->ctl_ndents?
fuse_mutex. It's sort of documented at the declaration of fuse_mutex
in <fuse_i.h>, but I'll also add a header comment to these functions.
> > + inode = new_inode(fuse_control_sb);
> > + if (!inode)
> > + return NULL;
> > +
> > + inode->i_mode = mode;
> > + inode->i_uid = fc->user_id;
> > + inode->i_gid = fc->group_id;
> > + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > + if (iop)
> > + inode->i_op = iop;
>
> Is iop ever null?
Yes, for the non-directory nodes.
> > + inode->i_fop = fop;
> > + inode->i_nlink = nlink;
> > + inode->u.generic_ip = fc;
> > + d_add(dentry, inode);
> > + return dentry;
> > +}
> > +
> > +int fuse_ctl_add_conn(struct fuse_conn *fc)
> > +{
> > + struct dentry *parent;
> > + char name[32];
> > +
> > + if (!fuse_control_sb)
> > + return 0;
>
> Can this happen?
Certainly. fuse_control_sb is non null only while the sb is active
(when it's mounted at least once). A kernel mount could be created at
init time, but then the module unload problem would still remain.
>
> > + parent = fuse_control_sb->s_root;
> > + parent->d_inode->i_nlink++;
>
> What locking protects i_nlink?
fuse_mutex.
>
> > + sprintf(name, "%llu", (unsigned long long) fc->id);
> > + parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2,
> > + &simple_dir_inode_operations,
> > + &simple_dir_operations);
> > + if (!parent)
> > + goto err;
> > +
> > + if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1,
> > + NULL, &fuse_ctl_waiting_ops) ||
> > + !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1,
> > + NULL, &fuse_ctl_abort_ops))
> > + goto err;
> > +
> > + return 0;
> > +
> > + err:
> > + fuse_ctl_remove_conn(fc);
> > + return -ENOMEM;
> > +}
> > +
> > +void fuse_ctl_remove_conn(struct fuse_conn *fc)
> > +{
> > + int i;
> > +
> > + if (!fuse_control_sb)
> > + return;
> > +
> > + for (i = fc->ctl_ndents - 1; i >= 0; i--) {
> > + struct dentry *dentry = fc->ctl_dentry[i];
> > + dentry->d_inode->u.generic_ip = NULL;
> > + d_drop(dentry);
> > + dput(dentry);
> > + }
> > + fuse_control_sb->s_root->d_inode->i_nlink--;
>
> Ditto.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/7] fuse: add POSIX file locking support
2006-06-19 6:58 ` Andrew Morton
@ 2006-06-19 8:12 ` Miklos Szeredi
0 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-19 8:12 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel
> > +/*
> > + * It would be nice to scramble the ID space, so that the value of the
> > + * files_struct pointer is not exposed to userspace. Symmetric crypto
> > + * functions are overkill, since the inverse function doesn't need to
> > + * be implemented (though it does have to exist). Is there something
> > + * simpler?
> > + */
> > +static inline u64 fuse_lock_owner_id(fl_owner_t id)
> > +{
> > + return (unsigned long) id;
> > +}
>
> Add a constant, not-known-to-userspace offset to all ids?
I thought of that, but it seemed cryptographically not quite strong
enough. But maybe it's better than nothing.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/7] fuse: add POSIX file locking support
2006-06-12 12:29 ` [PATCH 4/7] fuse: add POSIX file locking support Miklos Szeredi
2006-06-19 6:58 ` Andrew Morton
@ 2006-06-19 8:21 ` Jesper Juhl
2006-06-19 8:37 ` Miklos Szeredi
1 sibling, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2006-06-19 8:21 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-fsdevel
On 12/06/06, Miklos Szeredi <miklos@szeredi.hu> wrote:
> This patch adds POSIX file locking support to the fuse interface.
>
> +/*
> + * It would be nice to scramble the ID space, so that the value of the
> + * files_struct pointer is not exposed to userspace. Symmetric crypto
> + * functions are overkill, since the inverse function doesn't need to
> + * be implemented (though it does have to exist). Is there something
> + * simpler?
> + */
> +static inline u64 fuse_lock_owner_id(fl_owner_t id)
> +{
> + return (unsigned long) id;
> +}
> +
How about; on fuse startup, pick some semirandom number, store it
somewhere, then do an XOR of the pointer with the saved value to
scramble it, when you need to use it, simply XOR it again with the
stored value... Not especially strong, but better than nothing and
better than just adding a constant that people can find out from the
source (and the scramble value would be different each time fuse
loads, so at a minimum a different scramble key every boot) - also,
XOR is a quite fast operation so overhead should be low.
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/7] fuse: add POSIX file locking support
2006-06-19 8:21 ` Jesper Juhl
@ 2006-06-19 8:37 ` Miklos Szeredi
2006-06-19 9:04 ` Jesper Juhl
0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-19 8:37 UTC (permalink / raw)
To: jesper.juhl; +Cc: akpm, linux-kernel, linux-fsdevel
> How about; on fuse startup, pick some semirandom number, store it
> somewhere, then do an XOR of the pointer with the saved value to
> scramble it, when you need to use it, simply XOR it again with the
> stored value... Not especially strong, but better than nothing and
> better than just adding a constant that people can find out from the
> source
I think Andrew was suggesting a random key for the ADD function.
> (and the scramble value would be different each time fuse loads, so
> at a minimum a different scramble key every boot) - also, XOR is a
> quite fast operation so overhead should be low.
I think XOR might be even weaker than ADD, because from guessing the
difference between two values (easy) you might be able to guess the
bits of the key.
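One way to read this concern, sketched in userspace C with hypothetical helper names (xor_scramble(), xor_of_ids(), recover_key()): the key cancels when two scrambled values are XORed together, and a single correctly guessed pointer gives away the whole key. This is only an illustration of the XOR side; a known pair reveals an additive key by subtraction as well, so the sketch does not settle which of the two is weaker.

```c
#include <stdint.h>

/* Scramble a pointer value with a secret key, as proposed above. */
static uint64_t xor_scramble(uint64_t ptr, uint64_t key)
{
	return ptr ^ key;
}

/* What an observer can compute from two scrambled ids alone:
 * (a ^ k) ^ (b ^ k) == a ^ b, so the key drops out entirely. */
static uint64_t xor_of_ids(uint64_t id_a, uint64_t id_b)
{
	return id_a ^ id_b;
}

/* If the observer guesses one original pointer correctly, the key
 * follows directly from the matching scrambled id. */
static uint64_t recover_key(uint64_t id, uint64_t guessed_ptr)
{
	return id ^ guessed_ptr;
}
```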
I'm actually looking for something stronger than XOR or ADD, but it's
all a bit academic I think, because even if userspace knows these
kernel pointers it can't really use them for any malicious purpose.
Miklos
* Re: [PATCH 4/7] fuse: add POSIX file locking support
2006-06-19 8:37 ` Miklos Szeredi
@ 2006-06-19 9:04 ` Jesper Juhl
2006-06-19 9:10 ` Miklos Szeredi
0 siblings, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2006-06-19 9:04 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-fsdevel
On 19/06/06, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > How about; on fuse startup, pick some semirandom number, store it
> > somewhere, then do an XOR of the pointer with the saved value to
> > scramble it, when you need to use it, simply XOR it again with the
> > stored value... Not especially strong, but better than nothing and
> > better than just adding a constant that people can find out from the
> > source
>
> I think Andrew was suggesting a random key for the ADD function.
>
> > (and the scramble value would be different each time fuse loads, so
> > at a minimum a different scramble key every boot) - also, XOR is a
> > quite fast operation so overhead should be low.
>
> I think XOR might be even weaker than ADD, because from guessing the
> difference between two values (easy) you might be able to guess the
> bits of the key.
>
> I'm actually looking for something stronger than XOR or ADD, but it's
How about using TEA (Tiny Encryption Algorithm), XTEA or XXTEA then?
They are quite simple algorithms, easy to implement and reasonably fast
(with TEA being the simplest, but also weakest).
A hell of a lot better than just a simple XOR or ADD and probably more
than sufficient for this purpose.
http://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
http://www.simonshepherd.supanet.com/tea.htm
http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
* Re: [PATCH 4/7] fuse: add POSIX file locking support
2006-06-19 9:04 ` Jesper Juhl
@ 2006-06-19 9:10 ` Miklos Szeredi
0 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2006-06-19 9:10 UTC (permalink / raw)
To: jesper.juhl; +Cc: akpm, linux-kernel, linux-fsdevel
> How about using TEA (Tiny Encryption Algorithm), XTEA or XXTEA then?
> They are quite simple algorithms, easy to implement and reasonably fast
> (with TEA being the simplest, but also weakest).
> A hell of a lot better than just a simple XOR or ADD and probably more
> than sufficient for this purpose.
>
> http://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
> http://www.simonshepherd.supanet.com/tea.htm
> http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html
Cool. I'll add this.
It's not even worth using the crypto framework, since setting it up
would be more code than including the algorithm inline.
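For illustration, a rough sketch of what an inline version could look like, written as userspace C and not taken from any posted patch. tea_encrypt() follows the published TEA round function with 32 rounds; scramble_owner_id() and the way the 64-bit id is split into two 32-bit halves are assumptions made for this example, as is the key being a 128-bit value chosen at load time. Only the forward direction is implemented: each half-round adds a function of the other half, so the mapping is a permutation and the inverse exists without having to be written.

```c
#include <stdint.h>

#define TEA_DELTA  0x9E3779B9u
#define TEA_ROUNDS 32u

/* Encrypt one 64-bit block, held as two 32-bit halves, with a
 * 128-bit key. This is the standard TEA round structure. */
static void tea_encrypt(uint32_t v[2], const uint32_t k[4])
{
	uint32_t v0 = v[0], v1 = v[1], sum = 0;
	unsigned int i;

	for (i = 0; i < TEA_ROUNDS; i++) {
		sum += TEA_DELTA;
		v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
		v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
	}
	v[0] = v0;
	v[1] = v1;
}

/* Hypothetical helper: scramble a 64-bit owner value. The low and
 * high halves of the id form the two words of a single TEA block. */
static uint64_t scramble_owner_id(uint64_t id, const uint32_t key[4])
{
	uint32_t v[2] = { (uint32_t)id, (uint32_t)(id >> 32) };

	tea_encrypt(v, key);
	return ((uint64_t)v[1] << 32) | v[0];
}
```

Because the block size of TEA is 64 bits, a pointer-sized owner value fits in one block, and distinct owners always map to distinct ids under the same key.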
Thanks,
Miklos
end of thread, other threads:[~2006-06-19 9:11 UTC | newest]
Thread overview: 16+ messages
2006-06-12 12:21 [PATCH 0/7] fuse: file locking + misc Miklos Szeredi
2006-06-12 12:25 ` [PATCH 1/7] fuse: use MISC_MAJOR Miklos Szeredi
2006-06-12 12:27 ` [PATCH 2/7] fuse: no backgrounding on interrupt Miklos Szeredi
2006-06-12 12:28 ` [PATCH 3/7] fuse: add control filesystem Miklos Szeredi
2006-06-19 6:55 ` Andrew Morton
2006-06-19 8:06 ` Miklos Szeredi
2006-06-12 12:29 ` [PATCH 4/7] fuse: add POSIX file locking support Miklos Szeredi
2006-06-19 6:58 ` Andrew Morton
2006-06-19 8:12 ` Miklos Szeredi
2006-06-19 8:21 ` Jesper Juhl
2006-06-19 8:37 ` Miklos Szeredi
2006-06-19 9:04 ` Jesper Juhl
2006-06-19 9:10 ` Miklos Szeredi
2006-06-12 12:30 ` [PATCH 5/7] fuse: ensure FLUSH reaches userspace Miklos Szeredi
2006-06-12 12:31 ` [PATCH 6/7] fuse: rename the interrupted flag Miklos Szeredi
2006-06-12 12:33 ` [PATCH 7/7] fuse: add request interruption Miklos Szeredi