linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -V5 0/8] Generic name to handle and open by handle syscalls
@ 2010-04-26 17:33 Aneesh Kumar K.V
  2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:33 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb; +Cc: linux-fsdevel, sfrench

Hi,

The below set of patches implement open by handle support using exportfs
operations. This allows user space application to map a file name to file 
handle and later open the file using handle. This should be usable
for userspace NFS [1] and 9P server [2]. XFS already support this with the ioctls
XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.

[1] http://nfs-ganesha.sourceforge.net/
[2] http://lists.gnu.org/archive/html/qemu-devel/2010-03/msg01087.html

TODO:
I guess we would need to optimize how we get the vfsmount for the filesystem
uuid specified. Searching the file system list and task name space may be a big
overhead for each open by handle call.

Chages from V4:
a) Changed the syscal arguments so that we don't need compat syscalls
   as suggested by Christoph
c) Added two new syscall sys_lname_to_handle and sys_freadlink to work with
   symlinks
d) Changed open_by_handle to work with all file types
e) Add ext3 support

Changes from V3:
a) Code cleanup suggested by Andreas
b) x86_64 syscall support
c) add compat syscall

Chages from V2:
a) Support system wide unique handle.

Changes from v1:
a) handle size is now specified in bytes
b) returns -EOVERFLOW if the handle size is small
c) dropped open_handle syscall and added open_by_handle_at syscall
   open_by_handle_at takes mount_fd as the directory fd of the mount point
   containing the file
e) handle will only be unique in a given file system. So for an NFS server
   exporting multiple file system, NFS server will have to internally track the
   mount point to which a file handle belongs to. We should be able to do it much
   easily than expecting kernel to give a system wide unique file handle. System
   wide unique file handle would need much larger changes to the exportfs or VFS
   interface and I was not sure whether we really need to do that in the kernel or
   in the user space
f) open_handle_at now only check for DAC_OVERRIDE capability


Example program: (x86_32). (x86_64 would need a different syscall number)
----------------
#include <stdio.h>
#include <stdlib.h>

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

struct uuid {
	char uuid[16];
};
struct file_handle {
        int handle_size;
        int handle_type;
	struct uuid fsid;
        char handle[0];
};


static int name_to_handle(const char *name, struct file_handle  *fh)
{
	return syscall(338, name, fh);
}

static int lname_to_handle(const char *name, struct file_handle  *fh)
{
	return syscall(339, name, fh);
}

static int open_by_handle(struct file_handle *fh,  int flags)
{
	return syscall(340, fh, flags);
}

static int freadlink(int fd, char *buf, size_t bufsiz)
{
	return syscall(341, fd, buf, bufsiz);
}

#define BUFSZ 100
int main(int argc, char *argv[])
{
        int ret;
	int handle_sz;
	struct stat bufstat;
        int fd, dirfd;
        char buf[BUFSZ];
        struct file_handle *fh = NULL;;
again:
	if (fh && fh->handle_size) {
		handle_sz = fh->handle_size;
		free(fh);
        	fh = malloc(sizeof(struct file_handle) + handle_sz);
		fh->handle_size = handle_sz;
	} else {
		fh = malloc(sizeof(struct file_handle));
		fh->handle_size = 0;
	}
        errno  = 0;
        ret = lname_to_handle(argv[1], fh);
        if (ret) {
                perror("Error:");
		printf("Found the handle size needed to be %d\n", fh->handle_size);
		printf("Trying again..\n");
		goto again;
                exit(1);
        }
        fd = open_by_handle(fh, O_RDONLY);
        if (fd <= 0 ) {
                perror("Error:");
                exit(1);
        }
	fstat(fd, &bufstat);
	ret = S_ISLNK(bufstat.st_mode);
	if (ret) {
        	memset(buf, 0 , BUFSZ);
		freadlink(fd, buf, BUFSZ);
		printf("%s is a symlink pointing to %s\n", argv[1], buf);
	}
        memset(buf, 0 , BUFSZ);
	while (1) {
		ret = read(fd, buf, BUFSZ -1);
		if (ret <= 0)
			break;
		buf[ret] = '\0';
                printf("%s", buf);
                memset(buf, 0 , BUFSZ);
        }
        return 0;
}

-aneesh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH -V5 1/8] exportfs: Return the minimum required handle size
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
@ 2010-04-26 17:33 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:33 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/btrfs/export.c             |    8 ++++++--
 fs/exportfs/expfs.c           |    9 +++++++--
 fs/fat/inode.c                |    4 +++-
 fs/fuse/inode.c               |    4 +++-
 fs/gfs2/export.c              |    8 ++++++--
 fs/isofs/export.c             |    8 ++++++--
 fs/ocfs2/export.c             |    8 +++++++-
 fs/reiserfs/inode.c           |    7 ++++++-
 fs/udf/namei.c                |    7 ++++++-
 fs/xfs/linux-2.6/xfs_export.c |    4 +++-
 mm/shmem.c                    |    4 +++-
 11 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 951ef09..5f8ee5a 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -21,9 +21,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	int len = *max_len;
 	int type;
 
-	if ((len < BTRFS_FID_SIZE_NON_CONNECTABLE) ||
-	    (connectable && len < BTRFS_FID_SIZE_CONNECTABLE))
+	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
+		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
 		return 255;
+	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
+		*max_len = BTRFS_FID_SIZE_NON_CONNECTABLE;
+		return 255;
+	}
 
 	len  = BTRFS_FID_SIZE_NON_CONNECTABLE;
 	type = FILEID_BTRFS_WITHOUT_PARENT;
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index e9e1759..cfee0f0 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -319,9 +319,14 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
 	struct inode * inode = dentry->d_inode;
 	int len = *max_len;
 	int type = FILEID_INO32_GEN;
-	
-	if (len < 2 || (connectable && len < 4))
+
+	if (connectable && (len < 4)) {
+		*max_len = 4;
+		return 255;
+	} else if (len < 2) {
+		*max_len = 2;
 		return 255;
+	}
 
 	len = 2;
 	fid->i32.ino = inode->i_ino;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 0ce143b..6f83bc7 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -738,8 +738,10 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
 	struct inode *inode =  de->d_inode;
 	u32 ipos_h, ipos_m, ipos_l;
 
-	if (len < 5)
+	if (len < 5) {
+		*lenp = 5;
 		return 255; /* no room */
+	}
 
 	ipos_h = MSDOS_I(inode)->i_pos >> 8;
 	ipos_m = (MSDOS_I(inode)->i_pos & 0xf0) << 24;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ec14d19..beaea69 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -638,8 +638,10 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	u64 nodeid;
 	u32 generation;
 
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len;
 		return  255;
+	}
 
 	nodeid = get_fuse_inode(inode)->nodeid;
 	generation = inode->i_generation;
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index c22c211..d022236 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -36,9 +36,13 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
 	struct super_block *sb = inode->i_sb;
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	if (*len < GFS2_SMALL_FH_SIZE ||
-	    (connectable && *len < GFS2_LARGE_FH_SIZE))
+	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
+		*len = GFS2_LARGE_FH_SIZE;
 		return 255;
+	} else if (*len < GFS2_SMALL_FH_SIZE) {
+		*len = GFS2_SMALL_FH_SIZE;
+		return 255;
+	}
 
 	fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32);
 	fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index ed752cb..dd4687f 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -124,9 +124,13 @@ isofs_export_encode_fh(struct dentry *dentry,
 	 * offset of the inode and the upper 16 bits of fh32[1] to
 	 * hold the offset of the parent.
 	 */
-
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*max_len = 5;
+		return 255;
+	} else if (len < 3) {
+		*max_len = 3;
 		return 255;
+	}
 
 	len = 3;
 	fh32[0] = ei->i_iget5_block;
diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 19ad145..250a347 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -201,8 +201,14 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
 		   dentry->d_name.len, dentry->d_name.name,
 		   fh, len, connectable);
 
-	if (len < 3 || (connectable && len < 6)) {
+	if (connectable && (len < 6)) {
 		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+		*max_len = 6;
+		type = 255;
+		goto bail;
+	} else if (len < 3) {
+		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+		*max_len = 3;
 		type = 255;
 		goto bail;
 	}
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index dc2c65e..5fff1e2 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1588,8 +1588,13 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
 	struct inode *inode = dentry->d_inode;
 	int maxlen = *lenp;
 
-	if (maxlen < 3)
+	if (need_parent && (maxlen < 5)) {
+		*lenp = 5;
 		return 255;
+	} else if (maxlen < 3) {
+		*lenp = 3;
+		return 255;
+	}
 
 	data[0] = inode->i_ino;
 	data[1] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 7581602..37ce713 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1360,8 +1360,13 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
 	struct fid *fid = (struct fid *)fh;
 	int type = FILEID_UDF_WITHOUT_PARENT;
 
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*lenp = 5;
+		return 255;
+	} else if (len < 3) {
+		*lenp = 3;
 		return 255;
+	}
 
 	*lenp = 3;
 	fid->udf.block = location.logicalBlockNum;
diff --git a/fs/xfs/linux-2.6/xfs_export.c b/fs/xfs/linux-2.6/xfs_export.c
index 846b75a..82c0553 100644
--- a/fs/xfs/linux-2.6/xfs_export.c
+++ b/fs/xfs/linux-2.6/xfs_export.c
@@ -81,8 +81,10 @@ xfs_fs_encode_fh(
 	 * seven combinations work.  The real answer is "don't use v2".
 	 */
 	len = xfs_fileid_length(fileid_type);
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len
 		return 255;
+	}
 	*max_len = len;
 
 	switch (fileid_type) {
diff --git a/mm/shmem.c b/mm/shmem.c
index eef4ebe..bbeda1c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2125,8 +2125,10 @@ static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (*len < 3)
+	if (*len < 3) {
+		*len = 3;
 		return 255;
+	}
 
 	if (hlist_unhashed(&inode->i_hash)) {
 		/* Unfortunately insert_inode_hash is not idempotent,
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 2/8] vfs: Add name to file handle conversion support
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-27  1:19   ` Neil Brown
  2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/open.c          |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |   17 +++++++++
 2 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 74e5cd9..e9aae5c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -30,6 +30,8 @@
 #include <linux/falloc.h>
 #include <linux/fs_struct.h>
 #include <linux/ima.h>
+#include <linux/exportfs.h>
+#include <linux/mnt_namespace.h>
 
 #include "internal.h"
 
@@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
 }
 
 EXPORT_SYMBOL(nonseekable_open);
+
+/* limit the handle size to some value */
+#define MAX_HANDLE_SZ 4096
+static long do_sys_name_to_handle(struct path *path,
+			struct file_handle __user *ufh)
+{
+	int retval;
+	int handle_size;
+	struct super_block *sb;
+	struct uuid *this_fs_id;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
+		return -EFAULT;
+
+	if (f_handle.handle_size > MAX_HANDLE_SZ)
+		return -EINVAL;
+
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+			GFP_KERNEL);
+	if (!handle) {
+		retval = -ENOMEM;
+		goto err_out;
+	}
+	handle_size = f_handle.handle_size;
+
+	/* we ask for a non connected handle */
+	retval = exportfs_encode_fh(path->dentry,
+				(struct fid *)handle->f_handle,
+				&handle_size,  0);
+	/* convert handle size to bytes */
+	handle_size *= sizeof(u32);
+	handle->handle_type = retval;
+	handle->handle_size = handle_size;
+	if (handle_size <= f_handle.handle_size) {
+		/* get the uuid */
+		sb = path->mnt->mnt_sb;
+		if (sb->s_op->get_fsid) {
+			this_fs_id = sb->s_op->get_fsid(sb);
+			memcpy(handle->fsid.uuid, this_fs_id->uuid,
+				sizeof(handle->fsid.uuid));
+		} else
+			memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
+		retval = 0;
+	} else {
+		/*
+		 * set the handle_size to zero so we copy only
+		 * non variable part of the file_handle
+		 */
+		handle_size = 0;
+		retval = -EOVERFLOW;
+	}
+	if (copy_to_user(ufh, handle,
+				sizeof(struct file_handle) + handle_size))
+		retval = -EFAULT;
+
+	kfree(handle);
+err_out:
+	return retval;
+}
+
+SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
+		struct file_handle __user *, handle)
+{
+	long ret;
+	struct path path;
+
+	/* Follow links */
+	ret = user_path(name, &path);
+	if (ret)
+		return ret;
+
+	ret = do_sys_name_to_handle(&path, handle);
+	path_put(&path);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, name, handle);
+	return ret;
+}
+
+SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
+		struct file_handle __user *, handle)
+{
+	long ret;
+	struct path path;
+
+	ret = user_lpath(name, &path);
+	if (ret)
+		return ret;
+
+	ret = do_sys_name_to_handle(&path, handle);
+	path_put(&path);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, name, handle);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 39d57bc..da28b80 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -948,6 +948,20 @@ struct file {
 	unsigned long f_mnt_write_state;
 #endif
 };
+
+struct uuid {
+	char uuid[16];
+};
+
+struct file_handle {
+	int handle_size;
+	int handle_type;
+	/* File system identifier */
+	struct uuid fsid;
+	/* file identifier */
+	char f_handle[0];
+};
+
 extern spinlock_t files_lock;
 #define file_list_lock() spin_lock(&files_lock);
 #define file_list_unlock() spin_unlock(&files_lock);
@@ -1580,6 +1594,7 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	struct uuid *(*get_fsid)(struct super_block *);
 };
 
 /*
@@ -2328,6 +2343,8 @@ extern struct super_block *get_super(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern struct super_block *user_get_super(dev_t);
 extern void drop_super(struct super_block *sb);
+extern struct super_block *fs_get_sb(struct uuid *fsid);
+
 
 extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 3/8] vfs: Add open by file handle support
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-27  1:05   ` Neil Brown
  2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/filesystems.c              |   32 +++++++++-
 fs/namei.c                    |   24 -------
 fs/namespace.c                |   38 +++++++++++
 fs/open.c                     |  148 +++++++++++++++++++++++++++++++++++++++++
 fs/pnode.c                    |    2 +-
 include/linux/mnt_namespace.h |    2 +
 include/linux/namei.h         |   24 +++++++
 7 files changed, 244 insertions(+), 26 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 68ba492..743d36e 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
 	}
 	return fs;
 }
-
 EXPORT_SYMBOL(get_fs_type);
+
+struct super_block *fs_get_sb(struct uuid *fsid)
+{
+	struct uuid *this_fsid;
+	struct file_system_type *fs_type;
+	struct super_block *sb, *found_sb = NULL;
+
+	read_lock(&file_systems_lock);
+	fs_type = file_systems;
+	while (fs_type) {
+		spin_lock(&sb_lock);
+		list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
+			if (!sb->s_op->get_fsid)
+				continue;
+			this_fsid = sb->s_op->get_fsid(sb);
+			if (!memcmp(fsid->uuid, this_fsid->uuid,
+					sizeof(this_fsid->uuid))) {
+				/* found the matching super_block */
+				atomic_inc(&sb->s_active);
+				found_sb = sb;
+				spin_unlock(&sb_lock);
+				goto out;
+			}
+		}
+		spin_unlock(&sb_lock);
+		fs_type = fs_type->next;
+	}
+out:
+	read_unlock(&file_systems_lock);
+	return found_sb;
+}
diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..a18711e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1521,30 +1521,6 @@ out_unlock:
 	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
 }
 
-/*
- * Note that while the flag value (low two bits) for sys_open means:
- *	00 - read-only
- *	01 - write-only
- *	10 - read-write
- *	11 - special
- * it is changed into
- *	00 - no permissions needed
- *	01 - read-permission
- *	10 - write-permission
- *	11 - read-write
- * for the internal routines (ie open_namei()/follow_link() etc)
- * This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
- *
-*/
-static inline int open_to_namei_flags(int flag)
-{
-	if ((flag+1) & O_ACCMODE)
-		flag++;
-	return flag;
-}
-
 static int open_will_truncate(int flag, struct inode *inode)
 {
 	/*
diff --git a/fs/namespace.c b/fs/namespace.c
index 8174c8a..6168526 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
 	kfree(ns);
 }
 EXPORT_SYMBOL(put_mnt_ns);
+
+/*
+ * Get any vfsmount mapping the superblock in the
+ * task namespace
+ */
+struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+				struct super_block *sb)
+{
+	struct nsproxy *nsp;
+	struct list_head *mount_list;
+	struct mnt_namespace *ns = NULL;
+	struct vfsmount *mnt, *sb_mnt = NULL;
+
+	rcu_read_lock();
+	nsp = task_nsproxy(task);
+	if (nsp) {
+		ns = nsp->mnt_ns;
+		if (ns)
+			get_mnt_ns(ns);
+	}
+	rcu_read_unlock();
+	if (!ns)
+		return NULL;
+	down_read(&namespace_sem);
+	list_for_each(mount_list, &ns->list) {
+		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
+		if (mnt->mnt_sb == sb) {
+			/* found the matching super block */
+			sb_mnt = mnt;
+			mntget(sb_mnt);
+			break;
+		}
+	}
+	up_read(&namespace_sem);
+
+	put_mnt_ns(ns);
+	return sb_mnt;
+}
diff --git a/fs/open.c b/fs/open.c
index e9aae5c..b87fa32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
 	asmlinkage_protect(2, ret, name, handle);
 	return ret;
 }
+
+static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
+{
+	return 1;
+}
+
+static struct dentry *handle_to_dentry(struct vfsmount *mnt,
+				struct file_handle *handle)
+{
+	int handle_size;
+	struct dentry *dentry;
+
+	/* change the handle size to multiple of sizeof(u32) */
+	handle_size = handle->handle_size >> 2;
+	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
+					handle_size, handle->handle_type,
+					vfs_dentry_acceptable, NULL);
+	return dentry;
+}
+
+long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
+{
+	int fd;
+	int retval = 0;
+	int d_flags  = flags;
+	struct file *filp;
+	struct vfsmount *mnt;
+	struct inode *inode;
+	struct dentry *dentry;
+	struct super_block *sb;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
+		return  -EFAULT;
+
+	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
+		(f_handle.handle_size <= 0))
+		return -EINVAL;
+
+	if (!capable(CAP_DAC_OVERRIDE))
+		return -EPERM;
+
+	sb = fs_get_sb(&f_handle.fsid);
+	if (!sb)
+		return -ESTALE;
+	/*
+	 * Find the vfsmount for this superblock in the
+	 * current namespace
+	 */
+	mnt = fs_get_vfsmount(current, sb);
+	if (!mnt) {
+		retval = -ESTALE;
+		goto out_sb;
+	}
+
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+			GFP_KERNEL);
+	if (!handle) {
+		retval =  -ENOMEM;
+		goto out_mnt;
+	}
+	/* copy the full handle */
+	if (copy_from_user(handle, ufh,
+				sizeof(struct file_handle) +
+				f_handle.handle_size)) {
+		retval = -EFAULT;
+		goto out_mnt;
+	}
+
+	dentry = handle_to_dentry(mnt, handle);
+	if (IS_ERR(dentry)) {
+		retval = PTR_ERR(dentry);
+		goto out_mnt;
+	}
+
+	inode = dentry->d_inode;
+
+	flags  = open_to_namei_flags(flags);
+	/* O_TRUNC implies we need access checks for write permissions */
+	if (flags & O_TRUNC)
+		flags |= MAY_WRITE;
+
+	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
+		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
+		retval = -EPERM;
+		goto out_err;
+	}
+
+	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+		retval = -EACCES;
+		goto out_err;
+	}
+
+	/* Can't write directories. */
+	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
+		retval = -EISDIR;
+		goto out_err;
+	}
+
+	fd = get_unused_fd();
+	if (fd < 0) {
+		retval = fd;
+		goto out_err;
+	}
+
+	filp = dentry_open(dentry, mntget(mnt),
+			d_flags, current_cred());
+	if (IS_ERR(filp)) {
+		put_unused_fd(fd);
+		retval =  PTR_ERR(filp);
+		goto out_err;
+	}
+
+	if (inode->i_mode & S_IFREG) {
+		filp->f_flags |= O_NOATIME;
+		filp->f_mode |= FMODE_NOCMTIME;
+	}
+	fsnotify_open(filp->f_path.dentry);
+	fd_install(fd, filp);
+	retval = fd;
+	goto out_mnt;
+
+out_err:
+	dput(dentry);
+out_mnt:
+	kfree(handle);
+	mntput(mnt);
+out_sb:
+	deactivate_super(sb);
+
+	return retval;
+}
+
+SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
+		int, flags)
+{
+	long ret;
+
+	if (force_o_largefile())
+		flags |= O_LARGEFILE;
+
+	ret = do_sys_open_by_handle(handle, flags);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, handle, flags);
+	return ret;
+}
diff --git a/fs/pnode.c b/fs/pnode.c
index 5cc564a..9f6d12d 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -6,9 +6,9 @@
  *	Author : Ram Pai (linuxram@us.ibm.com)
  *
  */
+#include <linux/fs.h>
 #include <linux/mnt_namespace.h>
 #include <linux/mount.h>
-#include <linux/fs.h>
 #include "internal.h"
 #include "pnode.h"
 
diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
index 0b89efc..d363ecc 100644
--- a/include/linux/mnt_namespace.h
+++ b/include/linux/mnt_namespace.h
@@ -36,6 +36,8 @@ extern const struct seq_operations mounts_op;
 extern const struct seq_operations mountinfo_op;
 extern const struct seq_operations mountstats_op;
 extern int mnt_had_events(struct proc_mounts *);
+extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+					struct super_block *sb);
 
 #endif
 #endif
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 05b441d..a853aa0 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -4,6 +4,7 @@
 #include <linux/dcache.h>
 #include <linux/linkage.h>
 #include <linux/path.h>
+#include <asm-generic/fcntl.h>
 
 struct vfsmount;
 
@@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 	((char *) name)[min(len, maxlen)] = '\0';
 }
 
+/*
+ * Note that while the flag value (low two bits) for sys_open means:
+ *	00 - read-only
+ *	01 - write-only
+ *	10 - read-write
+ *	11 - special
+ * it is changed into
+ *	00 - no permissions needed
+ *	01 - read-permission
+ *	10 - write-permission
+ *	11 - read-write
+ * for the internal routines (ie open_namei()/follow_link() etc)
+ * This is more logical, and also allows the 00 "no perm needed"
+ * to be used for symlinks (where the permissions are checked
+ * later).
+ *
+*/
+static inline int open_to_namei_flags(int flag)
+{
+	if ((flag+1) & O_ACCMODE)
+		flag++;
+	return flag;
+}
 #endif /* _LINUX_NAMEI_H */
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 4/8] vfs: Add freadlink syscall
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

This enables to use open-by-handle and then get the link target
details of a symlink using the fd returned by handle

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/stat.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index c4ecd52..0fbab00 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -314,6 +314,33 @@ SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf,
 	return sys_readlinkat(AT_FDCWD, path, buf, bufsiz);
 }
 
+SYSCALL_DEFINE3(freadlink, int, fd, char __user *, buf,
+		int, bufsiz)
+{
+	int fput_needed;
+	struct file *file;
+	int error = -EBADF;
+	struct inode *inode;
+
+	file = fget_light(fd, &fput_needed);
+	if (!file)
+		return error;
+
+	inode = file->f_path.dentry->d_inode;
+	error = -EINVAL;
+	if (inode->i_op->readlink) {
+		error = security_inode_readlink(file->f_path.dentry);
+		if (!error) {
+			touch_atime(file->f_path.mnt, file->f_path.dentry);
+			error = inode->i_op->readlink(file->f_path.dentry,
+						buf, bufsiz);
+		}
+	}
+	fput_light(file, fput_needed);
+	return error;
+}
+
+
 
 /* ---------- LFS-64 ----------- */
 #ifdef __ARCH_WANT_STAT64
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 5/8] ext4: Add get_fsid callback
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/super.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e14d22c..b485f0a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1049,6 +1049,14 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static struct uuid *ext4_get_fsid(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+
+	return (struct uuid *)es->s_uuid;
+}
+
 #ifdef CONFIG_QUOTA
 #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
 #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -1109,6 +1117,7 @@ static const struct super_operations ext4_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext4_get_fsid,
 };
 
 static const struct super_operations ext4_nojournal_sops = {
@@ -1128,6 +1137,7 @@ static const struct super_operations ext4_nojournal_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext4_get_fsid,
 };
 
 static const struct export_operations ext4_export_ops = {
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 6/8] x86: Add new syscalls for x86_32
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

This patch adds sys_name_to_handle, sys_lname_to_handle, sys_open_by_handle
and sys_freadlink syscalls to x86_32

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/unistd_32.h   |    6 +++++-
 arch/x86/kernel/syscall_table_32.S |    4 ++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index beb9b5f..7ea7101 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -343,10 +343,14 @@
 #define __NR_rt_tgsigqueueinfo	335
 #define __NR_perf_event_open	336
 #define __NR_recvmmsg		337
+#define __NR_name_to_handle	338
+#define __NR_lname_to_handle    339
+#define __NR_open_by_handle     340
+#define __NR_freadlink          341
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 338
+#define NR_syscalls 342
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 8b37293..caa3f0c 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -337,3 +337,7 @@ ENTRY(sys_call_table)
 	.long sys_rt_tgsigqueueinfo	/* 335 */
 	.long sys_perf_event_open
 	.long sys_recvmmsg
+	.long sys_name_to_handle
+	.long sys_lname_to_handle
+	.long sys_open_by_handle
+	.long sys_freadlink		/* 341 */
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 7/8] x86: Add new syscalls for x86_64
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Add sys_name_to_handle, sys_lname_to_handle, sys_open_by_handle syscall
and sys_freadlink for x86_64

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/ia32/ia32entry.S        |    4 ++++
 arch/x86/include/asm/unistd_64.h |    8 ++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 59b4556..927cc8b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -842,4 +842,8 @@ ia32_sys_call_table:
 	.quad compat_sys_rt_tgsigqueueinfo	/* 335 */
 	.quad sys_perf_event_open
 	.quad compat_sys_recvmmsg
+	.quad sys_name_to_handle
+	.quad sys_lname_to_handle
+	.quad sys_open_by_handle
+	.quad sys_freadlink			/* 341 */
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index ff4307b..a495455 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -663,6 +663,14 @@ __SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
 __SYSCALL(__NR_perf_event_open, sys_perf_event_open)
 #define __NR_recvmmsg				299
 __SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_name_to_handle			300
+__SYSCALL(__NR_name_to_handle, sys_name_to_handle)
+#define __NR_lname_to_handle			301
+__SYSCALL(__NR_lname_to_handle, sys_lname_to_handle)
+#define __NR_open_by_handle			302
+__SYSCALL(__NR_open_by_handle, sys_open_by_handle)
+#define __NR_freadlink				303
+__SYSCALL(__NR_freadlink, sys_freadlink)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 8/8] ext3: Add get_fsid callback
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext3/super.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 1bee604..dfe50a4 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -734,6 +734,14 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static struct uuid *ext3_get_fsid(struct super_block *sb)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	struct ext3_super_block *es = sbi->s_es;
+
+	return (struct uuid *)es->s_uuid;
+}
+
 #ifdef CONFIG_QUOTA
 #define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group")
 #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -791,6 +799,7 @@ static const struct super_operations ext3_sops = {
 	.quota_write	= ext3_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext3_get_fsid,
 };
 
 static const struct export_operations ext3_export_ops = {
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 3/8] vfs: Add open by file handle support
  2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-04-27  1:05   ` Neil Brown
  2010-04-27  6:10     ` Aneesh Kumar K. V
  0 siblings, 1 reply; 13+ messages in thread
From: Neil Brown @ 2010-04-27  1:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Mon, 26 Apr 2010 23:04:01 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/filesystems.c              |   32 +++++++++-
>  fs/namei.c                    |   24 -------
>  fs/namespace.c                |   38 +++++++++++
>  fs/open.c                     |  148 +++++++++++++++++++++++++++++++++++++++++
>  fs/pnode.c                    |    2 +-
>  include/linux/mnt_namespace.h |    2 +
>  include/linux/namei.h         |   24 +++++++
>  7 files changed, 244 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 68ba492..743d36e 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
>  	}
>  	return fs;
>  }
> -
>  EXPORT_SYMBOL(get_fs_type);
> +
> +struct super_block *fs_get_sb(struct uuid *fsid)
> +{
> +	struct uuid *this_fsid;
> +	struct file_system_type *fs_type;
> +	struct super_block *sb, *found_sb = NULL;
> +
> +	read_lock(&file_systems_lock);
> +	fs_type = file_systems;
> +	while (fs_type) {

You've open-coded a for-loop here...
Why not:
        for (fs_type = file_systems ; fs_type ; fs_type = fs_type->next) {
??


> +		spin_lock(&sb_lock);
> +		list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
> +			if (!sb->s_op->get_fsid)
> +				continue;
> +			this_fsid = sb->s_op->get_fsid(sb);
> +			if (!memcmp(fsid->uuid, this_fsid->uuid,
> +					sizeof(this_fsid->uuid))) {
> +				/* found the matching super_block */
> +				atomic_inc(&sb->s_active);
> +				found_sb = sb;
> +				spin_unlock(&sb_lock);
> +				goto out;
> +			}
> +		}
> +		spin_unlock(&sb_lock);
> +		fs_type = fs_type->next;
> +	}
> +out:
> +	read_unlock(&file_systems_lock);
> +	return found_sb;
> +}
> diff --git a/fs/namei.c b/fs/namei.c
> index a7dce91..a18711e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1521,30 +1521,6 @@ out_unlock:
>  	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
>  }
>  
> -/*
> - * Note that while the flag value (low two bits) for sys_open means:
> - *	00 - read-only
> - *	01 - write-only
> - *	10 - read-write
> - *	11 - special
> - * it is changed into
> - *	00 - no permissions needed
> - *	01 - read-permission
> - *	10 - write-permission
> - *	11 - read-write
> - * for the internal routines (ie open_namei()/follow_link() etc)
> - * This is more logical, and also allows the 00 "no perm needed"
> - * to be used for symlinks (where the permissions are checked
> - * later).
> - *
> -*/
> -static inline int open_to_namei_flags(int flag)
> -{
> -	if ((flag+1) & O_ACCMODE)
> -		flag++;
> -	return flag;
> -}
> -
>  static int open_will_truncate(int flag, struct inode *inode)
>  {
>  	/*
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 8174c8a..6168526 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
>  	kfree(ns);
>  }
>  EXPORT_SYMBOL(put_mnt_ns);
> +
> +/*
> + * Get any vfsmount mapping the superblock in the
> + * task namespace
> + */
> +struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> +				struct super_block *sb)
> +{
> +	struct nsproxy *nsp;
> +	struct list_head *mount_list;
> +	struct mnt_namespace *ns = NULL;
> +	struct vfsmount *mnt, *sb_mnt = NULL;
> +
> +	rcu_read_lock();
> +	nsp = task_nsproxy(task);
> +	if (nsp) {
> +		ns = nsp->mnt_ns;
> +		if (ns)
> +			get_mnt_ns(ns);
> +	}
> +	rcu_read_unlock();
> +	if (!ns)
> +		return NULL;
> +	down_read(&namespace_sem);
> +	list_for_each(mount_list, &ns->list) {
> +		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
> +		if (mnt->mnt_sb == sb) {
> +			/* found the matching super block */
> +			sb_mnt = mnt;
> +			mntget(sb_mnt);
> +			break;
> +		}
> +	}
> +	up_read(&namespace_sem);
> +
> +	put_mnt_ns(ns);
> +	return sb_mnt;
> +}

If a task has the same filesystem mounted several times in it's namespace
(via "mount --bind") you just return any arbitrary one (the first).

Suppose that the filesystem does appear twice, and that in one appearance a
particular directory is mounted on, while in another appearance that
directory is not mounted on.  Suppose that the mounted-on appearance is
listed first in ns->list

Now we use a series of "name_to_handle" / "open_by_handle" calls to descend
through the second appearance to something beneath that directory which (in
the first appearance) is mounted-on.
This will not work as you will actually be descending the first appearance of
the filesystem.
I really think you should leave the choice of filesystem/mountpoint to
user-space.


> diff --git a/fs/open.c b/fs/open.c
> index e9aae5c..b87fa32 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
>  	asmlinkage_protect(2, ret, name, handle);
>  	return ret;
>  }
> +
> +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> +{
> +	return 1;
> +}
> +
> +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> +				struct file_handle *handle)
> +{
> +	int handle_size;
> +	struct dentry *dentry;
> +
> +	/* change the handle size to multiple of sizeof(u32) */
> +	handle_size = handle->handle_size >> 2;
> +	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> +					handle_size, handle->handle_type,
> +					vfs_dentry_acceptable, NULL);
> +	return dentry;
> +}
> +
> +long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> +{
> +	int fd;
> +	int retval = 0;
> +	int d_flags  = flags;
> +	struct file *filp;
> +	struct vfsmount *mnt;
> +	struct inode *inode;
> +	struct dentry *dentry;
> +	struct super_block *sb;
> +	struct file_handle f_handle;
> +	struct file_handle *handle = NULL;
> +
> +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> +		return  -EFAULT;
> +
> +	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> +		(f_handle.handle_size <= 0))
> +		return -EINVAL;
> +
> +	if (!capable(CAP_DAC_OVERRIDE))
> +		return -EPERM;
> +
> +	sb = fs_get_sb(&f_handle.fsid);
> +	if (!sb)
> +		return -ESTALE;
> +	/*
> +	 * Find the vfsmount for this superblock in the
> +	 * current namespace
> +	 */
> +	mnt = fs_get_vfsmount(current, sb);
> +	if (!mnt) {
> +		retval = -ESTALE;
> +		goto out_sb;
> +	}
> +
> +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> +			GFP_KERNEL);
> +	if (!handle) {
> +		retval =  -ENOMEM;
> +		goto out_mnt;
> +	}
> +	/* copy the full handle */
> +	if (copy_from_user(handle, ufh,
> +				sizeof(struct file_handle) +
> +				f_handle.handle_size)) {
> +		retval = -EFAULT;
> +		goto out_mnt;
> +	}
> +
> +	dentry = handle_to_dentry(mnt, handle);
> +	if (IS_ERR(dentry)) {
> +		retval = PTR_ERR(dentry);
> +		goto out_mnt;
> +	}
> +
> +	inode = dentry->d_inode;
> +
> +	flags  = open_to_namei_flags(flags);
> +	/* O_TRUNC implies we need access checks for write permissions */
> +	if (flags & O_TRUNC)
> +		flags |= MAY_WRITE;
> +
> +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> +		retval = -EPERM;
> +		goto out_err;
> +	}
> +
> +	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> +		retval = -EACCES;
> +		goto out_err;
> +	}
> +
> +	/* Can't write directories. */
> +	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> +		retval = -EISDIR;
> +		goto out_err;
> +	}
> +
> +	fd = get_unused_fd();

You want to use alloc_fd() (or get_unused_fd_flags) rather than get_unused_fd
so that the flags can be passed though and, in particular, so O_CLOEXEC will
be honoured.


> +	if (fd < 0) {
> +		retval = fd;
> +		goto out_err;
> +	}
> +
> +	filp = dentry_open(dentry, mntget(mnt),
> +			d_flags, current_cred());

dentry_open consumed the reference to dentry, so ...

> +	if (IS_ERR(filp)) {
> +		put_unused_fd(fd);
> +		retval =  PTR_ERR(filp);
> +		goto out_err;

The dput at 'out_err' will put it one time too many.
I think the best fix would be to pass dget(dentry) to dentry_open, then ....

> +	}
> +
> +	if (inode->i_mode & S_IFREG) {
> +		filp->f_flags |= O_NOATIME;
> +		filp->f_mode |= FMODE_NOCMTIME;
> +	}
> +	fsnotify_open(filp->f_path.dentry);
> +	fd_install(fd, filp);
> +	retval = fd;
> +	goto out_mnt;

... this goto (Which is rather ugly to me) would not be needed.

NeilBrown


> +
> +out_err:
> +	dput(dentry);
> +out_mnt:
> +	kfree(handle);
> +	mntput(mnt);
> +out_sb:
> +	deactivate_super(sb);
> +
> +	return retval;
> +}
> +
> +SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
> +		int, flags)
> +{
> +	long ret;
> +
> +	if (force_o_largefile())
> +		flags |= O_LARGEFILE;
> +
> +	ret = do_sys_open_by_handle(handle, flags);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, handle, flags);
> +	return ret;
> +}
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5cc564a..9f6d12d 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -6,9 +6,9 @@
>   *	Author : Ram Pai (linuxram@us.ibm.com)
>   *
>   */
> +#include <linux/fs.h>
>  #include <linux/mnt_namespace.h>
>  #include <linux/mount.h>
> -#include <linux/fs.h>
>  #include "internal.h"
>  #include "pnode.h"
>  
> diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
> index 0b89efc..d363ecc 100644
> --- a/include/linux/mnt_namespace.h
> +++ b/include/linux/mnt_namespace.h
> @@ -36,6 +36,8 @@ extern const struct seq_operations mounts_op;
>  extern const struct seq_operations mountinfo_op;
>  extern const struct seq_operations mountstats_op;
>  extern int mnt_had_events(struct proc_mounts *);
> +extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> +					struct super_block *sb);
>  
>  #endif
>  #endif
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 05b441d..a853aa0 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -4,6 +4,7 @@
>  #include <linux/dcache.h>
>  #include <linux/linkage.h>
>  #include <linux/path.h>
> +#include <asm-generic/fcntl.h>
>  
>  struct vfsmount;
>  
> @@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>  	((char *) name)[min(len, maxlen)] = '\0';
>  }
>  
> +/*
> + * Note that while the flag value (low two bits) for sys_open means:
> + *	00 - read-only
> + *	01 - write-only
> + *	10 - read-write
> + *	11 - special
> + * it is changed into
> + *	00 - no permissions needed
> + *	01 - read-permission
> + *	10 - write-permission
> + *	11 - read-write
> + * for the internal routines (ie open_namei()/follow_link() etc)
> + * This is more logical, and also allows the 00 "no perm needed"
> + * to be used for symlinks (where the permissions are checked
> + * later).
> + *
> +*/
> +static inline int open_to_namei_flags(int flag)
> +{
> +	if ((flag+1) & O_ACCMODE)
> +		flag++;
> +	return flag;
> +}
>  #endif /* _LINUX_NAMEI_H */


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 2/8] vfs: Add name to file handle conversion support
  2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-04-27  1:19   ` Neil Brown
  2010-04-27  6:08     ` Aneesh Kumar K. V
  0 siblings, 1 reply; 13+ messages in thread
From: Neil Brown @ 2010-04-27  1:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Mon, 26 Apr 2010 23:04:00 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/open.c          |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |   17 +++++++++
>  2 files changed, 117 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 74e5cd9..e9aae5c 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -30,6 +30,8 @@
>  #include <linux/falloc.h>
>  #include <linux/fs_struct.h>
>  #include <linux/ima.h>
> +#include <linux/exportfs.h>
> +#include <linux/mnt_namespace.h>
>  
>  #include "internal.h"
>  
> @@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
>  }
>  
>  EXPORT_SYMBOL(nonseekable_open);
> +
> +/* limit the handle size to some value */
> +#define MAX_HANDLE_SZ 4096
> +static long do_sys_name_to_handle(struct path *path,
> +			struct file_handle __user *ufh)
> +{
> +	int retval;
> +	int handle_size;
> +	struct super_block *sb;
> +	struct uuid *this_fs_id;
> +	struct file_handle f_handle;
> +	struct file_handle *handle = NULL;
> +
> +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> +		return -EFAULT;
> +
> +	if (f_handle.handle_size > MAX_HANDLE_SZ)
> +		return -EINVAL;
> +
> +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> +			GFP_KERNEL);
> +	if (!handle) {
> +		retval = -ENOMEM;
> +		goto err_out;
> +	}

You are mixing your patterns here.
In some cases you do:
   if (something fails)
           return -ERROR
other times you do

   if (something fails) {
            retval = -ERROR;
            goto err_out;
   }

which has exactly the same fix.  Being consistent is best and I would
recommend using the goto option - it is more maintainable.
I would actually prefer:

    retval = -ERROR;
    if (something fails)
          goto err_out;

but leave the choice to you.  Consistency is the important thing.



> +	handle_size = f_handle.handle_size;
> +
> +	/* we ask for a non connected handle */
> +	retval = exportfs_encode_fh(path->dentry,
> +				(struct fid *)handle->f_handle,
> +				&handle_size,  0);
> +	/* convert handle size to bytes */
> +	handle_size *= sizeof(u32);
> +	handle->handle_type = retval;
> +	handle->handle_size = handle_size;
> +	if (handle_size <= f_handle.handle_size) {
> +		/* get the uuid */
> +		sb = path->mnt->mnt_sb;
> +		if (sb->s_op->get_fsid) {
> +			this_fs_id = sb->s_op->get_fsid(sb);
> +			memcpy(handle->fsid.uuid, this_fs_id->uuid,
> +				sizeof(handle->fsid.uuid));
> +		} else
> +			memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
> +		retval = 0;
> +	} else {
> +		/*
> +		 * set the handle_size to zero so we copy only
> +		 * non variable part of the file_handle
> +		 */
> +		handle_size = 0;
> +		retval = -EOVERFLOW;
> +	}
> +	if (copy_to_user(ufh, handle,
> +				sizeof(struct file_handle) + handle_size))
> +		retval = -EFAULT;
> +
> +	kfree(handle);
> +err_out:
> +	return retval;
> +}
> +
> +SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
> +		struct file_handle __user *, handle)
> +{
> +	long ret;
> +	struct path path;

*Surely* you want name_to_handle_at, or name_at_to_handle here.
i.e. include an int 'dfd' to specify the starting directory.

Then you can make use of AT_SYMLINK_NOFOLLOW and so avoid the need for
2 syscalls.


> +
> +	/* Follow links */
> +	ret = user_path(name, &path);
> +	if (ret)
> +		return ret;
> +
> +	ret = do_sys_name_to_handle(&path, handle);
> +	path_put(&path);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, name, handle);
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> +		struct file_handle __user *, handle)
> +{
> +	long ret;
> +	struct path path;
> +
> +	ret = user_lpath(name, &path);
> +	if (ret)
> +		return ret;
> +
> +	ret = do_sys_name_to_handle(&path, handle);
> +	path_put(&path);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, name, handle);
> +	return ret;
> +}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 39d57bc..da28b80 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -948,6 +948,20 @@ struct file {
>  	unsigned long f_mnt_write_state;
>  #endif
>  };
> +
> +struct uuid {
> +	char uuid[16];

unsigned char??? just a thought.


> +};
> +
> +struct file_handle {
> +	int handle_size;
> +	int handle_type;
> +	/* File system identifier */
> +	struct uuid fsid;
> +	/* file identifier */
> +	char f_handle[0];
> +};
> +
>  extern spinlock_t files_lock;
>  #define file_list_lock() spin_lock(&files_lock);
>  #define file_list_unlock() spin_unlock(&files_lock);
> @@ -1580,6 +1594,7 @@ struct super_operations {
>  	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
>  #endif
>  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> +	struct uuid *(*get_fsid)(struct super_block *);

Assuming this was actually a good idea (which I am not yet sold on),
I think you would do better to pass in the address of a uuid to be filled in:

       int (*get_fsid)(struct super_block *, struct uuid *)

just incase the filesystem stored something in a slightly different format.

And as XFS has a 'nouuid' mount option, it might be best to allow an error
return to say "there is no uuid" - maybe expect -ENOENT ??

NeilBrown

>  };
>  
>  /*
> @@ -2328,6 +2343,8 @@ extern struct super_block *get_super(struct block_device *);
>  extern struct super_block *get_active_super(struct block_device *bdev);
>  extern struct super_block *user_get_super(dev_t);
>  extern void drop_super(struct super_block *sb);
> +extern struct super_block *fs_get_sb(struct uuid *fsid);
> +
>  
>  extern int dcache_dir_open(struct inode *, struct file *);
>  extern int dcache_dir_close(struct inode *, struct file *);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 2/8] vfs: Add name to file handle conversion support
  2010-04-27  1:19   ` Neil Brown
@ 2010-04-27  6:08     ` Aneesh Kumar K. V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K. V @ 2010-04-27  6:08 UTC (permalink / raw)
  To: Neil Brown; +Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Tue, 27 Apr 2010 11:19:49 +1000, Neil Brown <neilb@suse.de> wrote:
> On Mon, 26 Apr 2010 23:04:00 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/open.c          |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h |   17 +++++++++
> >  2 files changed, 117 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/open.c b/fs/open.c
> > index 74e5cd9..e9aae5c 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -30,6 +30,8 @@
> >  #include <linux/falloc.h>
> >  #include <linux/fs_struct.h>
> >  #include <linux/ima.h>
> > +#include <linux/exportfs.h>
> > +#include <linux/mnt_namespace.h>
> >  
> >  #include "internal.h"
> >  
> > @@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
> >  }
> >  
> >  EXPORT_SYMBOL(nonseekable_open);
> > +
> > +/* limit the handle size to some value */
> > +#define MAX_HANDLE_SZ 4096
> > +static long do_sys_name_to_handle(struct path *path,
> > +			struct file_handle __user *ufh)
> > +{
> > +	int retval;
> > +	int handle_size;
> > +	struct super_block *sb;
> > +	struct uuid *this_fs_id;
> > +	struct file_handle f_handle;
> > +	struct file_handle *handle = NULL;
> > +
> > +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> > +		return -EFAULT;
> > +
> > +	if (f_handle.handle_size > MAX_HANDLE_SZ)
> > +		return -EINVAL;
> > +
> > +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> > +			GFP_KERNEL);
> > +	if (!handle) {
> > +		retval = -ENOMEM;
> > +		goto err_out;
> > +	}
> 
> You are mixing your patterns here.
> In some cases you do:
>    if (something fails)
>            return -ERROR
> other times you do
> 
>    if (something fails) {
>             retval = -ERROR;
>             goto err_out;
>    }
> 
> which has exactly the same fix.  Being consistent is best and I would
> recommend using the goto option - it is more maintainable.
> I would actually prefer:
> 
>     retval = -ERROR;
>     if (something fails)
>           goto err_out;
> 
> but leave the choice to you.  Consistency is the important thing.


Fixed as per you suggestion.

> 
> 
> 
> > +	handle_size = f_handle.handle_size;
> > +
> > +	/* we ask for a non connected handle */
> > +	retval = exportfs_encode_fh(path->dentry,
> > +				(struct fid *)handle->f_handle,
> > +				&handle_size,  0);
> > +	/* convert handle size to bytes */
> > +	handle_size *= sizeof(u32);
> > +	handle->handle_type = retval;
> > +	handle->handle_size = handle_size;
> > +	if (handle_size <= f_handle.handle_size) {
> > +		/* get the uuid */
> > +		sb = path->mnt->mnt_sb;
> > +		if (sb->s_op->get_fsid) {
> > +			this_fs_id = sb->s_op->get_fsid(sb);
> > +			memcpy(handle->fsid.uuid, this_fs_id->uuid,
> > +				sizeof(handle->fsid.uuid));
> > +		} else
> > +			memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
> > +		retval = 0;
> > +	} else {
> > +		/*
> > +		 * set the handle_size to zero so we copy only
> > +		 * non variable part of the file_handle
> > +		 */
> > +		handle_size = 0;
> > +		retval = -EOVERFLOW;
> > +	}
> > +	if (copy_to_user(ufh, handle,
> > +				sizeof(struct file_handle) + handle_size))
> > +		retval = -EFAULT;
> > +
> > +	kfree(handle);
> > +err_out:
> > +	return retval;
> > +}
> > +
> > +SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
> > +		struct file_handle __user *, handle)
> > +{
> > +	long ret;
> > +	struct path path;
> 
> *Surely* you want name_to_handle_at, or name_at_to_handle here.
> i.e. include an int 'dfd' to specify the starting directory.
> 
> Then you can make use of AT_SYMLINK_NOFOLLOW and so avoid the need for
> 2 syscalls.
> 


Done. Droped the lname_to_handle syscall


> 
> > +
> > +	/* Follow links */
> > +	ret = user_path(name, &path);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = do_sys_name_to_handle(&path, handle);
> > +	path_put(&path);
> > +
> > +	/* avoid REGPARM breakage on x86: */
> > +	asmlinkage_protect(2, ret, name, handle);
> > +	return ret;
> > +}
> > +
> > +SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> > +		struct file_handle __user *, handle)
> > +{
> > +	long ret;
> > +	struct path path;
> > +
> > +	ret = user_lpath(name, &path);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = do_sys_name_to_handle(&path, handle);
> > +	path_put(&path);
> > +
> > +	/* avoid REGPARM breakage on x86: */
> > +	asmlinkage_protect(2, ret, name, handle);
> > +	return ret;
> > +}
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 39d57bc..da28b80 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -948,6 +948,20 @@ struct file {
> >  	unsigned long f_mnt_write_state;
> >  #endif
> >  };
> > +
> > +struct uuid {
> > +	char uuid[16];
> 
> unsigned char??? just a thought.
> 

Fixed

> 
> > +};
> > +
> > +struct file_handle {
> > +	int handle_size;
> > +	int handle_type;
> > +	/* File system identifier */
> > +	struct uuid fsid;
> > +	/* file identifier */
> > +	char f_handle[0];


I also changed this to unsigned char f_handle[0];


> > +};
> > +
> >  extern spinlock_t files_lock;
> >  #define file_list_lock() spin_lock(&files_lock);
> >  #define file_list_unlock() spin_unlock(&files_lock);
> > @@ -1580,6 +1594,7 @@ struct super_operations {
> >  	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
> >  #endif
> >  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> > +	struct uuid *(*get_fsid)(struct super_block *);
> 
> Assuming this was actually a good idea (which I am not yet sold on),
> I think you would do better to pass in the address of a uuid to be filled in:
> 
>        int (*get_fsid)(struct super_block *, struct uuid *)
> 
> just incase the filesystem stored something in a slightly different format.
> 
> And as XFS has a 'nouuid' mount option, it might be best to allow an error
> return to say "there is no uuid" - maybe expect -ENOENT ??

Fixed

Thanks for the review
-aneesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 3/8] vfs: Add open by file handle support
  2010-04-27  1:05   ` Neil Brown
@ 2010-04-27  6:10     ` Aneesh Kumar K. V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K. V @ 2010-04-27  6:10 UTC (permalink / raw)
  To: Neil Brown; +Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Tue, 27 Apr 2010 11:05:45 +1000, Neil Brown <neilb@suse.de> wrote:
> On Mon, 26 Apr 2010 23:04:01 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/filesystems.c              |   32 +++++++++-
> >  fs/namei.c                    |   24 -------
> >  fs/namespace.c                |   38 +++++++++++
> >  fs/open.c                     |  148 +++++++++++++++++++++++++++++++++++++++++
> >  fs/pnode.c                    |    2 +-
> >  include/linux/mnt_namespace.h |    2 +
> >  include/linux/namei.h         |   24 +++++++
> >  7 files changed, 244 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 68ba492..743d36e 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
> >  	}
> >  	return fs;
> >  }
> > -
> >  EXPORT_SYMBOL(get_fs_type);
> > +
> > +struct super_block *fs_get_sb(struct uuid *fsid)
> > +{
> > +	struct uuid *this_fsid;
> > +	struct file_system_type *fs_type;
> > +	struct super_block *sb, *found_sb = NULL;
> > +
> > +	read_lock(&file_systems_lock);
> > +	fs_type = file_systems;
> > +	while (fs_type) {
> 
> You've open-coded a for-loop here...
> Why not:
>         for (fs_type = file_systems ; fs_type ; fs_type = fs_type->next) {
> ??

Fixed

> 
> 
> > +		spin_lock(&sb_lock);
> > +		list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
> > +			if (!sb->s_op->get_fsid)
> > +				continue;
> > +			this_fsid = sb->s_op->get_fsid(sb);
> > +			if (!memcmp(fsid->uuid, this_fsid->uuid,
> > +					sizeof(this_fsid->uuid))) {
> > +				/* found the matching super_block */
> > +				atomic_inc(&sb->s_active);
> > +				found_sb = sb;
> > +				spin_unlock(&sb_lock);
> > +				goto out;
> > +			}
> > +		}
> > +		spin_unlock(&sb_lock);
> > +		fs_type = fs_type->next;
> > +	}
> > +out:
> > +	read_unlock(&file_systems_lock);
> > +	return found_sb;
> > +}
> > diff --git a/fs/namei.c b/fs/namei.c
> > index a7dce91..a18711e 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1521,30 +1521,6 @@ out_unlock:
> >  	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
> >  }
> >  
> > -/*
> > - * Note that while the flag value (low two bits) for sys_open means:
> > - *	00 - read-only
> > - *	01 - write-only
> > - *	10 - read-write
> > - *	11 - special
> > - * it is changed into
> > - *	00 - no permissions needed
> > - *	01 - read-permission
> > - *	10 - write-permission
> > - *	11 - read-write
> > - * for the internal routines (ie open_namei()/follow_link() etc)
> > - * This is more logical, and also allows the 00 "no perm needed"
> > - * to be used for symlinks (where the permissions are checked
> > - * later).
> > - *
> > -*/
> > -static inline int open_to_namei_flags(int flag)
> > -{
> > -	if ((flag+1) & O_ACCMODE)
> > -		flag++;
> > -	return flag;
> > -}
> > -
> >  static int open_will_truncate(int flag, struct inode *inode)
> >  {
> >  	/*
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 8174c8a..6168526 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
> >  	kfree(ns);
> >  }
> >  EXPORT_SYMBOL(put_mnt_ns);
> > +
> > +/*
> > + * Get any vfsmount mapping the superblock in the
> > + * task namespace
> > + */
> > +struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> > +				struct super_block *sb)
> > +{
> > +	struct nsproxy *nsp;
> > +	struct list_head *mount_list;
> > +	struct mnt_namespace *ns = NULL;
> > +	struct vfsmount *mnt, *sb_mnt = NULL;
> > +
> > +	rcu_read_lock();
> > +	nsp = task_nsproxy(task);
> > +	if (nsp) {
> > +		ns = nsp->mnt_ns;
> > +		if (ns)
> > +			get_mnt_ns(ns);
> > +	}
> > +	rcu_read_unlock();
> > +	if (!ns)
> > +		return NULL;
> > +	down_read(&namespace_sem);
> > +	list_for_each(mount_list, &ns->list) {
> > +		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
> > +		if (mnt->mnt_sb == sb) {
> > +			/* found the matching super block */
> > +			sb_mnt = mnt;
> > +			mntget(sb_mnt);
> > +			break;
> > +		}
> > +	}
> > +	up_read(&namespace_sem);
> > +
> > +	put_mnt_ns(ns);
> > +	return sb_mnt;
> > +}
> 
> If a task has the same filesystem mounted several times in it's namespace
> (via "mount --bind") you just return any arbitrary one (the first).
> 
> Suppose that the filesystem does appear twice, and that in one appearance a
> particular directory is mounted on, while in another appearance that
> directory is not mounted on.  Suppose that the mounted-on appearance is
> listed first in ns->list
> 
> Now we use a series of "name_to_handle" / "open_by_handle" calls to descend
> through the second appearance to something beneath that directory which (in
> the first appearance) is mounted-on.
> This will not work as you will actually be descending the first appearance of
> the filesystem.
> I really think you should leave the choice of filesystem/mountpoint to
> user-space.
> 

Will comment on this on a separate thread.

> 
> > diff --git a/fs/open.c b/fs/open.c
> > index e9aae5c..b87fa32 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> >  	asmlinkage_protect(2, ret, name, handle);
> >  	return ret;
> >  }
> > +
> > +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> > +{
> > +	return 1;
> > +}
> > +
> > +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> > +				struct file_handle *handle)
> > +{
> > +	int handle_size;
> > +	struct dentry *dentry;
> > +
> > +	/* change the handle size to multiple of sizeof(u32) */
> > +	handle_size = handle->handle_size >> 2;
> > +	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> > +					handle_size, handle->handle_type,
> > +					vfs_dentry_acceptable, NULL);
> > +	return dentry;
> > +}
> > +
> > +long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> > +{
> > +	int fd;
> > +	int retval = 0;
> > +	int d_flags  = flags;
> > +	struct file *filp;
> > +	struct vfsmount *mnt;
> > +	struct inode *inode;
> > +	struct dentry *dentry;
> > +	struct super_block *sb;
> > +	struct file_handle f_handle;
> > +	struct file_handle *handle = NULL;
> > +
> > +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> > +		return  -EFAULT;
> > +
> > +	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> > +		(f_handle.handle_size <= 0))
> > +		return -EINVAL;
> > +
> > +	if (!capable(CAP_DAC_OVERRIDE))
> > +		return -EPERM;
> > +
> > +	sb = fs_get_sb(&f_handle.fsid);
> > +	if (!sb)
> > +		return -ESTALE;
> > +	/*
> > +	 * Find the vfsmount for this superblock in the
> > +	 * current namespace
> > +	 */
> > +	mnt = fs_get_vfsmount(current, sb);
> > +	if (!mnt) {
> > +		retval = -ESTALE;
> > +		goto out_sb;
> > +	}
> > +
> > +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> > +			GFP_KERNEL);
> > +	if (!handle) {
> > +		retval =  -ENOMEM;
> > +		goto out_mnt;
> > +	}
> > +	/* copy the full handle */
> > +	if (copy_from_user(handle, ufh,
> > +				sizeof(struct file_handle) +
> > +				f_handle.handle_size)) {
> > +		retval = -EFAULT;
> > +		goto out_mnt;
> > +	}
> > +
> > +	dentry = handle_to_dentry(mnt, handle);
> > +	if (IS_ERR(dentry)) {
> > +		retval = PTR_ERR(dentry);
> > +		goto out_mnt;
> > +	}
> > +
> > +	inode = dentry->d_inode;
> > +
> > +	flags  = open_to_namei_flags(flags);
> > +	/* O_TRUNC implies we need access checks for write permissions */
> > +	if (flags & O_TRUNC)
> > +		flags |= MAY_WRITE;
> > +
> > +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> > +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> > +		retval = -EPERM;
> > +		goto out_err;
> > +	}
> > +
> > +	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> > +		retval = -EACCES;
> > +		goto out_err;
> > +	}
> > +
> > +	/* Can't write directories. */
> > +	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> > +		retval = -EISDIR;
> > +		goto out_err;
> > +	}
> > +
> > +	fd = get_unused_fd();
> 
> You want to use alloc_fd() (or get_unused_fd_flags) rather than get_unused_fd
> so that the flags can be passed though and, in particular, so O_CLOEXEC will
> be honoured.


Fixed by saying fd = get_unused_fd_flags(d_flags);

> 
> 
> > +	if (fd < 0) {
> > +		retval = fd;
> > +		goto out_err;
> > +	}
> > +
> > +	filp = dentry_open(dentry, mntget(mnt),
> > +			d_flags, current_cred());
> 
> dentry_open consumed the reference to dentry, so ...
> 
> > +	if (IS_ERR(filp)) {
> > +		put_unused_fd(fd);
> > +		retval =  PTR_ERR(filp);
> > +		goto out_err;
> 
> The dput at 'out_err' will put it one time too many.
> I think the best fix would be to pass dget(dentry) to dentry_open, then ....
> 
> > +	}
> > +
> > +	if (inode->i_mode & S_IFREG) {
> > +		filp->f_flags |= O_NOATIME;
> > +		filp->f_mode |= FMODE_NOCMTIME;
> > +	}
> > +	fsnotify_open(filp->f_path.dentry);
> > +	fd_install(fd, filp);
> > +	retval = fd;
> > +	goto out_mnt;
> 
> ... this goto (Which is rather ugly to me) would not be needed.
> 


Fixed

Thanks for the review
-aneesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-04-27  6:10 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
2010-04-27  1:19   ` Neil Brown
2010-04-27  6:08     ` Aneesh Kumar K. V
2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
2010-04-27  1:05   ` Neil Brown
2010-04-27  6:10     ` Aneesh Kumar K. V
2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).