[PATCH -V7 0/8] Generic name to handle and open by handle syscalls

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH -V7 0/8] Generic name to handle and open by handle syscalls
@ 2010-05-12 15:50 Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 1/9] exportfs: Return the minimum required handle size Aneesh Kumar K.V
                   ` (8 more replies)
  0 siblings, 9 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel

Hi,

The below set of patches implement open by handle support using exportfs
operations. This allows user space application to map a file name to file 
handle and later open the file using handle. This should be usable
for userspace NFS [1] and 9P server [2]. XFS already support this with the ioctls
XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.

[1] http://nfs-ganesha.sourceforge.net/
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/68992

I guess v7 address all the review comments. If I have missing some please let me
know. I am also looking at getting xfsprogs libhandle.so on top of these syscalls.
Let me know if you have any objection towards merging this patchset upstream.

Changes from V6:
a) Add uuid to vfsmount lookup and drop uuid to superblock lookup
b) Return -EOPNOTSUPP in sys_name_to_handle if the file system returned uuid
   doesn't give the same vfsmount on lookup. This ensure that we fail
   sys_name_to_handle when we have multiple file system returning same UUID.

Changes from V5:
a) added sys_name_to_handle_at syscall which takes AT_SYMLINK_NOFOLLOW flag 
   instead of two syscalls sys_name_to_handle and sys_lname_to_handle.
b) addressed review comments from Niel Brown
c) rebased to b91ce4d14a21fc04d165be30319541e0f9204f15
d) Add compat_sys_open_by_handle

Chages from V4:
a) Changed the syscal arguments so that we don't need compat syscalls
   as suggested by Christoph
c) Added two new syscall sys_lname_to_handle and sys_freadlink to work with
   symlinks
d) Changed open_by_handle to work with all file types
e) Add ext3 support

Changes from V3:
a) Code cleanup suggested by Andreas
b) x86_64 syscall support
c) add compat syscall

Chages from V2:
a) Support system wide unique handle.

Changes from v1:
a) handle size is now specified in bytes
b) returns -EOVERFLOW if the handle size is small
c) dropped open_handle syscall and added open_by_handle_at syscall
   open_by_handle_at takes mount_fd as the directory fd of the mount point
   containing the file
e) handle will only be unique in a given file system. So for an NFS server
   exporting multiple file system, NFS server will have to internally track the
   mount point to which a file handle belongs to. We should be able to do it much
   easily than expecting kernel to give a system wide unique file handle. System
   wide unique file handle would need much larger changes to the exportfs or VFS
   interface and I was not sure whether we really need to do that in the kernel or
   in the user space
f) open_handle_at now only check for DAC_OVERRIDE capability

Example program: (x86_32). (x86_64 would need a different syscall number)
----------------
#include <stdio.h>
#include <stdlib.h>

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

struct uuid {
	unsigned char uuid[16];
};
struct file_handle {
        int handle_size;
        int handle_type;
	struct uuid fsid;
        unsigned char handle[0];
};

#define AT_FDCWD		-100
#define AT_SYMLINK_NOFOLLOW	0x100

static int name_to_handle(const char *name, struct file_handle  *fh)
{
	return syscall(338, AT_FDCWD, name, fh, 0);
}

static int lname_to_handle(const char *name, struct file_handle  *fh)
{
	return syscall(338, AT_FDCWD, name, fh, AT_SYMLINK_NOFOLLOW);
}

static int open_by_handle(struct file_handle *fh,  int flags)
{
	return syscall(339, fh, flags);
}

static int freadlink(int fd, char *buf, size_t bufsiz)
{
	return syscall(340, fd, buf, bufsiz);
}

#define BUFSZ 100
int main(int argc, char *argv[])
{
        int ret;
	int handle_sz;
	struct stat bufstat;
        int fd, dirfd;
        char buf[BUFSZ];
        struct file_handle *fh = NULL;;
again:
	if (fh && fh->handle_size) {
		handle_sz = fh->handle_size;
		free(fh);
		fh = malloc(sizeof(struct file_handle) + handle_sz);
		fh->handle_size = handle_sz;
	} else {
		fh = malloc(sizeof(struct file_handle));
		fh->handle_size = 0;
	}
        errno  = 0;
        ret = lname_to_handle(argv[1], fh);
        if (ret && errno == EOVERFLOW) {
                perror("Error:");
		printf("Found the handle size needed to be %d\n", fh->handle_size);
		printf("Trying again..\n");
		goto again;
        } else if (ret) {
                perror("Error:");
		exit(1);
	}
        fd = open_by_handle(fh, O_RDONLY);
        if (fd <= 0 ) {
                perror("Error:");
                exit(1);
        }
	fstat(fd, &bufstat);
	ret = S_ISLNK(bufstat.st_mode);
	if (ret) {
	        memset(buf, 0 , BUFSZ);
		freadlink(fd, buf, BUFSZ);
		printf("%s is a symlink pointing to %s\n", argv[1], buf);
	}
        memset(buf, 0 , BUFSZ);
	while (1) {
		ret = read(fd, buf, BUFSZ -1);
		if (ret <= 0)
			break;
		buf[ret] = '\0';
                printf("%s", buf);
                memset(buf, 0 , BUFSZ);
        }
        return 0;
}

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH -V7 1/9] exportfs: Return the minimum required handle size
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 2/9] vfs: Add uuid based vfsmount lookup Aneesh Kumar K.V
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/btrfs/export.c             |    8 ++++++--
 fs/exportfs/expfs.c           |    9 +++++++--
 fs/fat/inode.c                |    4 +++-
 fs/fuse/inode.c               |    4 +++-
 fs/gfs2/export.c              |    8 ++++++--
 fs/isofs/export.c             |    8 ++++++--
 fs/ocfs2/export.c             |    8 +++++++-
 fs/reiserfs/inode.c           |    7 ++++++-
 fs/udf/namei.c                |    7 ++++++-
 fs/xfs/linux-2.6/xfs_export.c |    4 +++-
 mm/shmem.c                    |    4 +++-
 11 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 951ef09..5f8ee5a 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -21,9 +21,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	int len = *max_len;
 	int type;
 
-	if ((len < BTRFS_FID_SIZE_NON_CONNECTABLE) ||
-	    (connectable && len < BTRFS_FID_SIZE_CONNECTABLE))
+	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
+		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
 		return 255;
+	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
+		*max_len = BTRFS_FID_SIZE_NON_CONNECTABLE;
+		return 255;
+	}
 
 	len  = BTRFS_FID_SIZE_NON_CONNECTABLE;
 	type = FILEID_BTRFS_WITHOUT_PARENT;
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index e9e1759..cfee0f0 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -319,9 +319,14 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
 	struct inode * inode = dentry->d_inode;
 	int len = *max_len;
 	int type = FILEID_INO32_GEN;
-	
-	if (len < 2 || (connectable && len < 4))
+
+	if (connectable && (len < 4)) {
+		*max_len = 4;
+		return 255;
+	} else if (len < 2) {
+		*max_len = 2;
 		return 255;
+	}
 
 	len = 2;
 	fid->i32.ino = inode->i_ino;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 0ce143b..6f83bc7 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -738,8 +738,10 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
 	struct inode *inode =  de->d_inode;
 	u32 ipos_h, ipos_m, ipos_l;
 
-	if (len < 5)
+	if (len < 5) {
+		*lenp = 5;
 		return 255; /* no room */
+	}
 
 	ipos_h = MSDOS_I(inode)->i_pos >> 8;
 	ipos_m = (MSDOS_I(inode)->i_pos & 0xf0) << 24;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ec14d19..beaea69 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -638,8 +638,10 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	u64 nodeid;
 	u32 generation;
 
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len;
 		return  255;
+	}
 
 	nodeid = get_fuse_inode(inode)->nodeid;
 	generation = inode->i_generation;
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index c22c211..d022236 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -36,9 +36,13 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
 	struct super_block *sb = inode->i_sb;
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	if (*len < GFS2_SMALL_FH_SIZE ||
-	    (connectable && *len < GFS2_LARGE_FH_SIZE))
+	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
+		*len = GFS2_LARGE_FH_SIZE;
 		return 255;
+	} else if (*len < GFS2_SMALL_FH_SIZE) {
+		*len = GFS2_SMALL_FH_SIZE;
+		return 255;
+	}
 
 	fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32);
 	fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index ed752cb..dd4687f 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -124,9 +124,13 @@ isofs_export_encode_fh(struct dentry *dentry,
 	 * offset of the inode and the upper 16 bits of fh32[1] to
 	 * hold the offset of the parent.
 	 */
-
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*max_len = 5;
+		return 255;
+	} else if (len < 3) {
+		*max_len = 3;
 		return 255;
+	}
 
 	len = 3;
 	fh32[0] = ei->i_iget5_block;
diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 19ad145..250a347 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -201,8 +201,14 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
 		   dentry->d_name.len, dentry->d_name.name,
 		   fh, len, connectable);
 
-	if (len < 3 || (connectable && len < 6)) {
+	if (connectable && (len < 6)) {
 		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+		*max_len = 6;
+		type = 255;
+		goto bail;
+	} else if (len < 3) {
+		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+		*max_len = 3;
 		type = 255;
 		goto bail;
 	}
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index dc2c65e..5fff1e2 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1588,8 +1588,13 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
 	struct inode *inode = dentry->d_inode;
 	int maxlen = *lenp;
 
-	if (maxlen < 3)
+	if (need_parent && (maxlen < 5)) {
+		*lenp = 5;
 		return 255;
+	} else if (maxlen < 3) {
+		*lenp = 3;
+		return 255;
+	}
 
 	data[0] = inode->i_ino;
 	data[1] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 7581602..37ce713 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1360,8 +1360,13 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
 	struct fid *fid = (struct fid *)fh;
 	int type = FILEID_UDF_WITHOUT_PARENT;
 
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*lenp = 5;
+		return 255;
+	} else if (len < 3) {
+		*lenp = 3;
 		return 255;
+	}
 
 	*lenp = 3;
 	fid->udf.block = location.logicalBlockNum;
diff --git a/fs/xfs/linux-2.6/xfs_export.c b/fs/xfs/linux-2.6/xfs_export.c
index 846b75a..82c0553 100644
--- a/fs/xfs/linux-2.6/xfs_export.c
+++ b/fs/xfs/linux-2.6/xfs_export.c
@@ -81,8 +81,10 @@ xfs_fs_encode_fh(
 	 * seven combinations work.  The real answer is "don't use v2".
 	 */
 	len = xfs_fileid_length(fileid_type);
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len
 		return 255;
+	}
 	*max_len = len;
 
 	switch (fileid_type) {
diff --git a/mm/shmem.c b/mm/shmem.c
index eef4ebe..bbeda1c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2125,8 +2125,10 @@ static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (*len < 3)
+	if (*len < 3) {
+		*len = 3;
 		return 255;
+	}
 
 	if (hlist_unhashed(&inode->i_hash)) {
 		/* Unfortunately insert_inode_hash is not idempotent,
-- 
1.7.1.78.g212f0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 2/9] vfs: Add uuid based vfsmount lookup
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 1/9] exportfs: Return the minimum required handle size Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 3/9] vfs: Add name to file handle conversion support Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

This patch add a new superblock operations get_fsid that returns the
UUID mapping for the file system. The UUID returned is used to
identify the file system apart of file_handle

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/namespace.c                |   47 +++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h            |   15 +++++++++++++
 include/linux/mnt_namespace.h |    3 ++
 3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 8174c8a..c19e4fa 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2364,3 +2364,50 @@ void put_mnt_ns(struct mnt_namespace *ns)
 	kfree(ns);
 }
 EXPORT_SYMBOL(put_mnt_ns);
+
+/*
+ * Get any vfsmount mapping the uuid in the
+ * task namespace
+ */
+struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+				struct uuid *fsid)
+{
+	int error;
+	struct nsproxy *nsp;
+	struct uuid this_fsid;
+	struct super_block *sb;
+	struct list_head *mount_list;
+	struct mnt_namespace *ns = NULL;
+	struct vfsmount *mnt, *sb_mnt = NULL;
+
+	rcu_read_lock();
+	nsp = task_nsproxy(task);
+	if (nsp) {
+		ns = nsp->mnt_ns;
+		if (ns)
+			get_mnt_ns(ns);
+	}
+	rcu_read_unlock();
+	if (!ns)
+		return NULL;
+	down_read(&namespace_sem);
+	list_for_each(mount_list, &ns->list) {
+		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
+		sb  = mnt->mnt_sb;
+		if (!sb->s_op->get_fsid)
+			continue;
+		error = sb->s_op->get_fsid(sb, &this_fsid);
+		if (error)
+			continue;
+		if (!memcmp(fsid->uuid, this_fsid.uuid,
+				sizeof(this_fsid.uuid))) {
+			sb_mnt = mnt;
+			mntget(sb_mnt);
+			break;
+		}
+	}
+	up_read(&namespace_sem);
+
+	put_mnt_ns(ns);
+	return sb_mnt;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44f35ae..055734c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -948,6 +948,20 @@ struct file {
 	unsigned long f_mnt_write_state;
 #endif
 };
+
+struct uuid {
+	unsigned char uuid[16];
+};
+
+struct file_handle {
+	int handle_size;
+	int handle_type;
+	/* File system identifier */
+	struct uuid fsid;
+	/* file identifier */
+	unsigned char f_handle[0];
+};
+
 extern spinlock_t files_lock;
 #define file_list_lock() spin_lock(&files_lock);
 #define file_list_unlock() spin_unlock(&files_lock);
@@ -1580,6 +1594,7 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	int (*get_fsid)(struct super_block *, struct uuid *);
 };
 
 /*
diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
index 0b89efc..76c62b1 100644
--- a/include/linux/mnt_namespace.h
+++ b/include/linux/mnt_namespace.h
@@ -22,6 +22,7 @@ struct proc_mounts {
 };
 
 struct fs_struct;
+struct uuid;
 
 extern struct mnt_namespace *create_mnt_ns(struct vfsmount *mnt);
 extern struct mnt_namespace *copy_mnt_ns(unsigned long, struct mnt_namespace *,
@@ -36,6 +37,8 @@ extern const struct seq_operations mounts_op;
 extern const struct seq_operations mountinfo_op;
 extern const struct seq_operations mountstats_op;
 extern int mnt_had_events(struct proc_mounts *);
+extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+					struct uuid *fsid);
 
 #endif
 #endif
-- 
1.7.1.78.g212f0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 1/9] exportfs: Return the minimum required handle size Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 2/9] vfs: Add uuid based vfsmount lookup Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-12 21:49   ` Andreas Dilger
  2010-05-12 15:50 ` [PATCH -V7 4/9] vfs: Add open by file handle support Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/open.c |  109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 74e5cd9..9a34b81 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -30,6 +30,8 @@
 #include <linux/falloc.h>
 #include <linux/fs_struct.h>
 #include <linux/ima.h>
+#include <linux/exportfs.h>
+#include <linux/mnt_namespace.h>
 
 #include "internal.h"
 
@@ -1206,3 +1208,110 @@ int nonseekable_open(struct inode *inode, struct file *filp)
 }
 
 EXPORT_SYMBOL(nonseekable_open);
+
+/* limit the handle size to some value */
+#define MAX_HANDLE_SZ 4096
+static long do_sys_name_to_handle(struct path *path,
+			struct file_handle __user *ufh)
+{
+	int retval;
+	int handle_size;
+	struct vfsmount *mnt;
+	struct super_block *sb;
+	struct uuid this_fs_id;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	sb = path->mnt->mnt_sb;
+	if (!sb->s_op->get_fsid) {
+		retval = -EOPNOTSUPP;
+		goto err_out;
+	}
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
+		retval = -EFAULT;
+		goto err_out;
+	}
+	if (f_handle.handle_size > MAX_HANDLE_SZ) {
+		retval = -EINVAL;
+		goto err_out;
+	}
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+			GFP_KERNEL);
+	if (!handle) {
+		retval = -ENOMEM;
+		goto err_out;
+	}
+	handle_size = f_handle.handle_size;
+
+	/* we ask for a non connected handle */
+	retval = exportfs_encode_fh(path->dentry,
+				(struct fid *)handle->f_handle,
+				&handle_size,  0);
+	/* convert handle size to bytes */
+	handle_size *= sizeof(u32);
+	handle->handle_type = retval;
+	handle->handle_size = handle_size;
+	if (handle_size <= f_handle.handle_size) {
+		/* get the uuid */
+		retval = sb->s_op->get_fsid(sb, &this_fs_id);
+		if (!retval) {
+			/*
+			 * Now verify whether we get the same vfsmount
+			 * if we lookup with uuid. In case we end up having
+			 * same uuid for the multiple file systems. When doing
+			 * uuid based lookup we would return the first one.So
+			 * with name_to_handle if we don't find the same
+			 * vfsmount with lookup return EOPNOTSUPP
+			 */
+			mnt = fs_get_vfsmount(current, &this_fs_id);
+			if (mnt != path->mnt) {
+				retval = -EOPNOTSUPP;
+				mntput(mnt);
+				goto err_free_out;
+			}
+			mntput(mnt);
+			memcpy(handle->fsid.uuid,
+				this_fs_id.uuid,
+				sizeof(handle->fsid.uuid));
+		} else {
+			retval = -EOPNOTSUPP;
+			goto err_free_out;
+		}
+	} else {
+		/*
+		 * set the handle_size to zero so we copy only
+		 * non variable part of the file_handle
+		 */
+		handle_size = 0;
+		retval = -EOVERFLOW;
+	}
+	if (copy_to_user(ufh, handle,
+				sizeof(struct file_handle) + handle_size))
+		retval = -EFAULT;
+err_free_out:
+	kfree(handle);
+err_out:
+	return retval;
+}
+
+SYSCALL_DEFINE4(name_to_handle_at, int, dfd, const char __user *, name,
+		struct file_handle __user *, handle, int, flag)
+{
+	int follow;
+	long ret = -EINVAL;
+	struct path path;
+
+	if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
+		goto err_out;
+
+	follow = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
+	ret = user_path_at(dfd, name, follow, &path);
+	if (ret)
+		goto err_out;
+	ret = do_sys_name_to_handle(&path, handle);
+	path_put(&path);
+err_out:
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(4, ret, dfd, name, handle, flag);
+	return ret;
+}
-- 
1.7.1.78.g212f0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 4/9] vfs: Add open by file handle support
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2010-05-12 15:50 ` [PATCH -V7 3/9] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-12 23:44   ` Neil Brown
  2010-05-12 15:50 ` [PATCH -V7 5/9] vfs: Add freadlink syscall Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/namei.c            |   24 ---------
 fs/open.c             |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/namei.h |   24 +++++++++
 3 files changed, 160 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..a18711e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1521,30 +1521,6 @@ out_unlock:
 	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
 }
 
-/*
- * Note that while the flag value (low two bits) for sys_open means:
- *	00 - read-only
- *	01 - write-only
- *	10 - read-write
- *	11 - special
- * it is changed into
- *	00 - no permissions needed
- *	01 - read-permission
- *	10 - write-permission
- *	11 - read-write
- * for the internal routines (ie open_namei()/follow_link() etc)
- * This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
- *
-*/
-static inline int open_to_namei_flags(int flag)
-{
-	if ((flag+1) & O_ACCMODE)
-		flag++;
-	return flag;
-}
-
 static int open_will_truncate(int flag, struct inode *inode)
 {
 	/*
diff --git a/fs/open.c b/fs/open.c
index 9a34b81..348a1b9 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1315,3 +1315,139 @@ err_out:
 	asmlinkage_protect(4, ret, dfd, name, handle, flag);
 	return ret;
 }
+
+static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
+{
+	return 1;
+}
+
+static struct dentry *handle_to_dentry(struct vfsmount *mnt,
+				struct file_handle *handle)
+{
+	int handle_size;
+	struct dentry *dentry;
+
+	/* change the handle size to multiple of sizeof(u32) */
+	handle_size = handle->handle_size >> 2;
+	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
+					handle_size, handle->handle_type,
+					vfs_dentry_acceptable, NULL);
+	return dentry;
+}
+
+static long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
+{
+	int fd;
+	int retval = 0;
+	int d_flags  = flags;
+	struct file *filp;
+	struct vfsmount *mnt;
+	struct inode *inode;
+	struct dentry *dentry;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
+		retval = -EFAULT;
+		goto out_err;
+	}
+	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
+		(f_handle.handle_size <= 0)) {
+		retval =  -EINVAL;
+		goto out_err;
+	}
+	if (!capable(CAP_DAC_OVERRIDE)) {
+		retval = -EPERM;
+		goto out_err;
+	}
+	/*
+	 * Find the vfsmount for this uuid in the
+	 * current namespace
+	 */
+	mnt = fs_get_vfsmount(current, &f_handle.fsid);
+	if (!mnt) {
+		retval = -ESTALE;
+		goto out_err;
+	}
+
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+			GFP_KERNEL);
+	if (!handle) {
+		retval =  -ENOMEM;
+		goto out_mnt;
+	}
+	/* copy the full handle */
+	if (copy_from_user(handle, ufh,
+				sizeof(struct file_handle) +
+				f_handle.handle_size)) {
+		retval = -EFAULT;
+		goto out_mnt;
+	}
+	dentry = handle_to_dentry(mnt, handle);
+	if (IS_ERR(dentry)) {
+		retval = PTR_ERR(dentry);
+		goto out_mnt;
+	}
+	inode = dentry->d_inode;
+	flags  = open_to_namei_flags(flags);
+	/* O_TRUNC implies we need access checks for write permissions */
+	if (flags & O_TRUNC)
+		flags |= MAY_WRITE;
+
+	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
+		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
+		retval = -EPERM;
+		goto out_dentry;
+	}
+	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+		retval = -EACCES;
+		goto out_dentry;
+	}
+	/* Can't write directories. */
+	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
+		retval = -EISDIR;
+		goto out_dentry;
+	}
+	fd = get_unused_fd_flags(d_flags);
+	if (fd < 0) {
+		retval = fd;
+		goto out_dentry;
+	}
+	filp = dentry_open(dget(dentry), mntget(mnt),
+			d_flags, current_cred());
+	if (IS_ERR(filp)) {
+		put_unused_fd(fd);
+		retval =  PTR_ERR(filp);
+		goto out_dentry;
+	}
+	if (inode->i_mode & S_IFREG) {
+		filp->f_flags |= O_NOATIME;
+		filp->f_mode |= FMODE_NOCMTIME;
+	}
+	fsnotify_open(filp->f_path.dentry);
+	fd_install(fd, filp);
+	retval = fd;
+
+out_dentry:
+	dput(dentry);
+out_mnt:
+	kfree(handle);
+	mntput(mnt);
+out_err:
+	return retval;
+}
+
+SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
+		int, flags)
+{
+	long ret;
+
+	if (force_o_largefile())
+		flags |= O_LARGEFILE;
+
+	ret = do_sys_open_by_handle(handle, flags);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, handle, flags);
+	return ret;
+}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 05b441d..a853aa0 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -4,6 +4,7 @@
 #include <linux/dcache.h>
 #include <linux/linkage.h>
 #include <linux/path.h>
+#include <asm-generic/fcntl.h>
 
 struct vfsmount;
 
@@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 	((char *) name)[min(len, maxlen)] = '\0';
 }
 
+/*
+ * Note that while the flag value (low two bits) for sys_open means:
+ *	00 - read-only
+ *	01 - write-only
+ *	10 - read-write
+ *	11 - special
+ * it is changed into
+ *	00 - no permissions needed
+ *	01 - read-permission
+ *	10 - write-permission
+ *	11 - read-write
+ * for the internal routines (ie open_namei()/follow_link() etc)
+ * This is more logical, and also allows the 00 "no perm needed"
+ * to be used for symlinks (where the permissions are checked
+ * later).
+ *
+*/
+static inline int open_to_namei_flags(int flag)
+{
+	if ((flag+1) & O_ACCMODE)
+		flag++;
+	return flag;
+}
 #endif /* _LINUX_NAMEI_H */
-- 
1.7.1.78.g212f0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2010-05-12 15:50 ` [PATCH -V7 4/9] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-13  1:43   ` Neil Brown
  2010-05-12 15:50 ` [PATCH -V7 6/9] ext4: Add get_fsid callback Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

This enables to use open-by-handle and then get the link target
details of a symlink using the fd returned by handle

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/stat.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index c4ecd52..0fbab00 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -314,6 +314,33 @@ SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf,
 	return sys_readlinkat(AT_FDCWD, path, buf, bufsiz);
 }
 
+SYSCALL_DEFINE3(freadlink, int, fd, char __user *, buf,
+		int, bufsiz)
+{
+	int fput_needed;
+	struct file *file;
+	int error = -EBADF;
+	struct inode *inode;
+
+	file = fget_light(fd, &fput_needed);
+	if (!file)
+		return error;
+
+	inode = file->f_path.dentry->d_inode;
+	error = -EINVAL;
+	if (inode->i_op->readlink) {
+		error = security_inode_readlink(file->f_path.dentry);
+		if (!error) {
+			touch_atime(file->f_path.mnt, file->f_path.dentry);
+			error = inode->i_op->readlink(file->f_path.dentry,
+						buf, bufsiz);
+		}
+	}
+	fput_light(file, fput_needed);
+	return error;
+}
+
+
 
 /* ---------- LFS-64 ----------- */
 #ifdef __ARCH_WANT_STAT64
-- 
1.7.1.78.g212f0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2010-05-12 15:50 ` [PATCH -V7 5/9] vfs: Add freadlink syscall Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-13  3:11   ` Dave Chinner
  2010-05-14 17:32   ` Coly Li
  2010-05-12 15:50 ` [PATCH -V7 7/9] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/super.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e14d22c..fc7d464 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+
+	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
+	/*
+	 * We may want to make sure we return error if the s_uuid is not
+	 * exactly unique
+	 */
+	return 0;
+}
+
 #ifdef CONFIG_QUOTA
 #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
 #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext4_get_fsid,
 };
 
 static const struct super_operations ext4_nojournal_sops = {
@@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext4_get_fsid,
 };
 
 static const struct export_operations ext4_export_ops = {
-- 
1.7.1.78.g212f0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 7/9] x86: Add new syscalls for x86_32
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2010-05-12 15:50 ` [PATCH -V7 6/9] ext4: Add get_fsid callback Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 8/9] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 9/9] ext3: Add get_fsid callback Aneesh Kumar K.V
  8 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

This patch adds sys_name_to_handle_at, sys_open_by_handle
and sys_freadlink syscalls to x86_32

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/unistd_32.h   |    5 ++++-
 arch/x86/kernel/syscall_table_32.S |    3 +++
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index beb9b5f..a78505c 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -343,10 +343,13 @@
 #define __NR_rt_tgsigqueueinfo	335
 #define __NR_perf_event_open	336
 #define __NR_recvmmsg		337
+#define __NR_name_to_handle_at	338
+#define __NR_open_by_handle     339
+#define __NR_freadlink          340
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 338
+#define NR_syscalls 341
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 8b37293..64b8d5d 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -337,3 +337,6 @@ ENTRY(sys_call_table)
 	.long sys_rt_tgsigqueueinfo	/* 335 */
 	.long sys_perf_event_open
 	.long sys_recvmmsg
+	.long sys_name_to_handle_at
+	.long sys_open_by_handle
+	.long sys_freadlink		/* 340 */
-- 
1.7.1.78.g212f0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 8/9] x86: Add new syscalls for x86_64
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2010-05-12 15:50 ` [PATCH -V7 7/9] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  2010-05-12 15:50 ` [PATCH -V7 9/9] ext3: Add get_fsid callback Aneesh Kumar K.V
  8 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

Add sys_name_to_handle_at, sys_open_by_handle syscall
and sys_freadlink for x86_64

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/ia32/ia32entry.S        |    3 +++
 arch/x86/include/asm/unistd_64.h |    6 ++++++
 fs/compat.c                      |   10 ++++++++++
 fs/open.c                        |    2 +-
 include/linux/fs.h               |    1 +
 5 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index e790bc1..2f4c979 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -842,4 +842,7 @@ ia32_sys_call_table:
 	.quad compat_sys_rt_tgsigqueueinfo	/* 335 */
 	.quad sys_perf_event_open
 	.quad compat_sys_recvmmsg
+	.quad sys_name_to_handle_at
+	.quad compat_sys_open_by_handle
+	.quad sys_freadlink			/* 340 */
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index ff4307b..c32428d 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -663,6 +663,12 @@ __SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
 __SYSCALL(__NR_perf_event_open, sys_perf_event_open)
 #define __NR_recvmmsg				299
 __SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_name_to_handle_at			300
+__SYSCALL(__NR_name_to_handle, sys_name_to_handle)
+#define __NR_open_by_handle			301
+__SYSCALL(__NR_open_by_handle, sys_open_by_handle)
+#define __NR_freadlink				302
+__SYSCALL(__NR_freadlink, sys_freadlink)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
diff --git a/fs/compat.c b/fs/compat.c
index 4b6ed03..5c71c98 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -2310,3 +2310,13 @@ asmlinkage long compat_sys_timerfd_gettime(int ufd,
 }
 
 #endif /* CONFIG_TIMERFD */
+
+/*
+ * Exactly like fs/open.c:sys_open_by_handle(), except that it doesn't set the
+ * O_LARGEFILE flag.
+ */
+asmlinkage long
+compat_sys_open_by_handle(struct file_handle __user *handle, int flags)
+{
+	return do_sys_open_by_handle(handle, flags);
+}
diff --git a/fs/open.c b/fs/open.c
index 348a1b9..b8924e1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1335,7 +1335,7 @@ static struct dentry *handle_to_dentry(struct vfsmount *mnt,
 	return dentry;
 }
 
-static long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
+long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
 {
 	int fd;
 	int retval = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 055734c..6dcf696 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1933,6 +1933,7 @@ extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
 				 const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
 extern char * getname(const char __user *);
+extern long do_sys_open_by_handle(struct file_handle __user *, int);
 
 /* fs/ioctl.c */
 
-- 
1.7.1.78.g212f0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH -V7 9/9] ext3: Add get_fsid callback
  2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (7 preceding siblings ...)
  2010-05-12 15:50 ` [PATCH -V7 8/9] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
  8 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
	Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext3/super.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 1bee604..63c322e 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -734,6 +734,15 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static int ext3_get_fsid(struct super_block *sb, struct uuid *fsid)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	struct ext3_super_block *es = sbi->s_es;
+
+	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
+	return 0;
+}
+
 #ifdef CONFIG_QUOTA
 #define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group")
 #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -791,6 +800,7 @@ static const struct super_operations ext3_sops = {
 	.quota_write	= ext3_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext3_get_fsid,
 };
 
 static const struct export_operations ext3_export_ops = {
-- 
1.7.1.78.g212f0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 15:50 ` [PATCH -V7 3/9] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-05-12 21:49   ` Andreas Dilger
  2010-05-12 22:43     ` Neil Brown
                       ` (3 more replies)
  0 siblings, 4 replies; 49+ messages in thread
From: Andreas Dilger @ 2010-05-12 21:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> +static long do_sys_name_to_handle(struct path *path,
> +			struct file_handle __user *ufh)
> +{
> +	if (handle_size <= f_handle.handle_size) {
> +		/* get the uuid */
> +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> +		if (!retval) {
> +			/*
> +			 * Now verify whether we get the same vfsmount
> +			 * if we lookup with uuid. In case we end up having
> +			 * same uuid for the multiple file systems. When doing
> +			 * uuid based lookup we would return the first one.So
> +			 * with name_to_handle if we don't find the same
> +			 * vfsmount with lookup return EOPNOTSUPP
> +			 */
> +			mnt = fs_get_vfsmount(current, &this_fs_id);
> +			if (mnt != path->mnt) {
> +				retval = -EOPNOTSUPP;
> +				mntput(mnt);
> +				goto err_free_out;
> +			}

I don't see that this does anything for us except add overhead.  This is no protection against mounting a second filesystem with the same UUID after the handle is returned, since there is no expiration for file handles.

At best I think we could start by changing the list-based UUID lookup with a hash-based one, and when adding a duplicate UUID at mount time start by printing out an error message to the console in case of duplicated UUIDs, and maybe at some point in the future this might cause the mount to fail (though I don't think we can make that decision lightly or quickly).

That moves the overhead to mount time instead of for each name_to_handle() call (which would be brutal for a system with many filesystems mounted).

> +SYSCALL_DEFINE4(name_to_handle_at, int, dfd, const char __user *, name,
> +		struct file_handle __user *, handle, int, flag)
> +{
> +	int follow;
> +	long ret = -EINVAL;
> +	struct path path;
> +
> +	if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> +		goto err_out;

Won't this break the use of AT_FDCWD to get a relative file handle?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 21:49   ` Andreas Dilger
@ 2010-05-12 22:43     ` Neil Brown
  2010-05-13  6:17       ` Aneesh Kumar K. V
  2010-05-13  0:20     ` Dave Chinner
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-05-12 22:43 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Aneesh Kumar K.V, hch, viro, adilger, corbet, serue,
	linux-fsdevel, sfrench, philippe.deniel, linux-kernel

On Wed, 12 May 2010 15:49:49 -0600
Andreas Dilger <andreas.dilger@oracle.com> wrote:

> On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > +static long do_sys_name_to_handle(struct path *path,
> > +			struct file_handle __user *ufh)
> > +{
> > +	if (handle_size <= f_handle.handle_size) {
> > +		/* get the uuid */
> > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > +		if (!retval) {
> > +			/*
> > +			 * Now verify whether we get the same vfsmount
> > +			 * if we lookup with uuid. In case we end up having
> > +			 * same uuid for the multiple file systems. When doing
> > +			 * uuid based lookup we would return the first one.So
> > +			 * with name_to_handle if we don't find the same
> > +			 * vfsmount with lookup return EOPNOTSUPP
> > +			 */
> > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > +			if (mnt != path->mnt) {
> > +				retval = -EOPNOTSUPP;
> > +				mntput(mnt);
> > +				goto err_free_out;
> > +			}
> 
> I don't see that this does anything for us except add overhead.  This is no protection against mounting a second filesystem with the same UUID after the handle is returned, since there is no expiration for file handles.

I very much agree.
Further I have not yet seen a convincing argument that the UUID should be
part of the kernel interface for filehandle lookup.  That fact that including
the UUID suggests to you (Aneesh) that this sort of thing might be a good
idea seems to me to be a strong argument against including the UUID in the
lookup.

> 
> At best I think we could start by changing the list-based UUID lookup with a hash-based one, and when adding a duplicate UUID at mount time start by printing out an error message to the console in case of duplicated UUIDs, and maybe at some point in the future this might cause the mount to fail (though I don't think we can make that decision lightly or quickly).
> 
> That moves the overhead to mount time instead of for each name_to_handle() call (which would be brutal for a system with many filesystems mounted).
> 
> > +SYSCALL_DEFINE4(name_to_handle_at, int, dfd, const char __user *, name,
> > +		struct file_handle __user *, handle, int, flag)
> > +{
> > +	int follow;
> > +	long ret = -EINVAL;
> > +	struct path path;
> > +
> > +	if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> > +		goto err_out;
> 
> Won't this break the use of AT_FDCWD to get a relative file handle?

No, AT_FDCWD is a value passed in 'dfd', not a flag.

I wonder though if the default should be to not follow symlinks, and that
AT_SYMLINK_FOLLOW should be required if you really want to follow symlinks.
I suspect that in most cases you wouldn't want to follow symlinks.
It isn't a really important point though.

NeilBrown



> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
  2010-05-12 15:50 ` [PATCH -V7 4/9] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-05-12 23:44   ` Neil Brown
  2010-05-13  6:09     ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-05-12 23:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Wed, 12 May 2010 21:20:39 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/namei.c            |   24 ---------
>  fs/open.c             |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/namei.h |   24 +++++++++
>  3 files changed, 160 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index a7dce91..a18711e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1521,30 +1521,6 @@ out_unlock:
>  	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
>  }
>  
> -/*
> - * Note that while the flag value (low two bits) for sys_open means:
> - *	00 - read-only
> - *	01 - write-only
> - *	10 - read-write
> - *	11 - special
> - * it is changed into
> - *	00 - no permissions needed
> - *	01 - read-permission
> - *	10 - write-permission
> - *	11 - read-write
> - * for the internal routines (ie open_namei()/follow_link() etc)
> - * This is more logical, and also allows the 00 "no perm needed"
> - * to be used for symlinks (where the permissions are checked
> - * later).
> - *
> -*/
> -static inline int open_to_namei_flags(int flag)
> -{
> -	if ((flag+1) & O_ACCMODE)
> -		flag++;
> -	return flag;
> -}
> -
>  static int open_will_truncate(int flag, struct inode *inode)
>  {
>  	/*
> diff --git a/fs/open.c b/fs/open.c
> index 9a34b81..348a1b9 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1315,3 +1315,139 @@ err_out:
>  	asmlinkage_protect(4, ret, dfd, name, handle, flag);
>  	return ret;
>  }
> +
> +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> +{
> +	return 1;
> +}
> +
> +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> +				struct file_handle *handle)
> +{
> +	int handle_size;
> +	struct dentry *dentry;
> +
> +	/* change the handle size to multiple of sizeof(u32) */
> +	handle_size = handle->handle_size >> 2;
> +	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> +					handle_size, handle->handle_type,
> +					vfs_dentry_acceptable, NULL);
> +	return dentry;
> +}
> +
> +static long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> +{
> +	int fd;
> +	int retval = 0;
> +	int d_flags  = flags;
> +	struct file *filp;
> +	struct vfsmount *mnt;
> +	struct inode *inode;
> +	struct dentry *dentry;
> +	struct file_handle f_handle;
> +	struct file_handle *handle = NULL;
> +
> +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
> +		retval = -EFAULT;
> +		goto out_err;
> +	}
> +	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> +		(f_handle.handle_size <= 0)) {
> +		retval =  -EINVAL;
> +		goto out_err;
> +	}
> +	if (!capable(CAP_DAC_OVERRIDE)) {
> +		retval = -EPERM;
> +		goto out_err;
> +	}
> +	/*
> +	 * Find the vfsmount for this uuid in the
> +	 * current namespace
> +	 */
> +	mnt = fs_get_vfsmount(current, &f_handle.fsid);
> +	if (!mnt) {
> +		retval = -ESTALE;
> +		goto out_err;
> +	}
> +
> +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> +			GFP_KERNEL);
> +	if (!handle) {
> +		retval =  -ENOMEM;
> +		goto out_mnt;
> +	}
> +	/* copy the full handle */
> +	if (copy_from_user(handle, ufh,
> +				sizeof(struct file_handle) +
> +				f_handle.handle_size)) {
> +		retval = -EFAULT;
> +		goto out_mnt;
> +	}
> +	dentry = handle_to_dentry(mnt, handle);
> +	if (IS_ERR(dentry)) {
> +		retval = PTR_ERR(dentry);
> +		goto out_mnt;
> +	}
> +	inode = dentry->d_inode;
> +	flags  = open_to_namei_flags(flags);
> +	/* O_TRUNC implies we need access checks for write permissions */
> +	if (flags & O_TRUNC)
> +		flags |= MAY_WRITE;
> +
> +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> +		retval = -EPERM;
> +		goto out_dentry;
> +	}
> +	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> +		retval = -EACCES;
> +		goto out_dentry;
> +	}
> +	/* Can't write directories. */
> +	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> +		retval = -EISDIR;
> +		goto out_dentry;
> +	}

Including all these checks inline here seems error prone.  Can you not just
use finish_open ??  It might do more than you need, but it would be more
obvious that you didn't forget anything..


> +	fd = get_unused_fd_flags(d_flags);
> +	if (fd < 0) {
> +		retval = fd;
> +		goto out_dentry;
> +	}
> +	filp = dentry_open(dget(dentry), mntget(mnt),
> +			d_flags, current_cred());
> +	if (IS_ERR(filp)) {
> +		put_unused_fd(fd);
> +		retval =  PTR_ERR(filp);
> +		goto out_dentry;
> +	}
> +	if (inode->i_mode & S_IFREG) {

I suspect this is not the test you want.  It tests for IFREG or IFLNK or
IFSOCK.

> +		filp->f_flags |= O_NOATIME;
> +		filp->f_mode |= FMODE_NOCMTIME;
> +	}

I think you need a comment here explaining the rational for these setting.
Why is O_NOATIME important IFREG but not for IFDIR?
Why is it not sufficient to honour O_NOATIME that is passed in.
How can you ever justify setting FMODE_NOCMTIME ?
I guess you are just copying from xfs code, but it still needs justification.


NeilBrown



> +	fsnotify_open(filp->f_path.dentry);
> +	fd_install(fd, filp);
> +	retval = fd;
> +
> +out_dentry:
> +	dput(dentry);
> +out_mnt:
> +	kfree(handle);
> +	mntput(mnt);
> +out_err:
> +	return retval;
> +}
> +
> +SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
> +		int, flags)
> +{
> +	long ret;
> +
> +	if (force_o_largefile())
> +		flags |= O_LARGEFILE;
> +
> +	ret = do_sys_open_by_handle(handle, flags);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, handle, flags);
> +	return ret;
> +}
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 05b441d..a853aa0 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -4,6 +4,7 @@
>  #include <linux/dcache.h>
>  #include <linux/linkage.h>
>  #include <linux/path.h>
> +#include <asm-generic/fcntl.h>
>  
>  struct vfsmount;
>  
> @@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>  	((char *) name)[min(len, maxlen)] = '\0';
>  }
>  
> +/*
> + * Note that while the flag value (low two bits) for sys_open means:
> + *	00 - read-only
> + *	01 - write-only
> + *	10 - read-write
> + *	11 - special
> + * it is changed into
> + *	00 - no permissions needed
> + *	01 - read-permission
> + *	10 - write-permission
> + *	11 - read-write
> + * for the internal routines (ie open_namei()/follow_link() etc)
> + * This is more logical, and also allows the 00 "no perm needed"
> + * to be used for symlinks (where the permissions are checked
> + * later).
> + *
> +*/
> +static inline int open_to_namei_flags(int flag)
> +{
> +	if ((flag+1) & O_ACCMODE)
> +		flag++;
> +	return flag;
> +}
>  #endif /* _LINUX_NAMEI_H */


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 21:49   ` Andreas Dilger
  2010-05-12 22:43     ` Neil Brown
@ 2010-05-13  0:20     ` Dave Chinner
  2010-05-13  6:23       ` Aneesh Kumar K. V
  2010-05-13  5:56     ` Aneesh Kumar K. V
  2010-05-13 14:24     ` Aneesh Kumar K. V
  3 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2010-05-13  0:20 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Aneesh Kumar K.V, hch, viro, adilger, corbet, serue, neilb,
	linux-fsdevel, sfrench, philippe.deniel, linux-kernel

On Wed, May 12, 2010 at 03:49:49PM -0600, Andreas Dilger wrote:
> On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > +static long do_sys_name_to_handle(struct path *path,
> > +			struct file_handle __user *ufh)
> > +{
> > +	if (handle_size <= f_handle.handle_size) {
> > +		/* get the uuid */
> > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > +		if (!retval) {
> > +			/*
> > +			 * Now verify whether we get the same vfsmount
> > +			 * if we lookup with uuid. In case we end up having
> > +			 * same uuid for the multiple file systems. When doing
> > +			 * uuid based lookup we would return the first one.So
> > +			 * with name_to_handle if we don't find the same
> > +			 * vfsmount with lookup return EOPNOTSUPP
> > +			 */
> > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > +			if (mnt != path->mnt) {
> > +				retval = -EOPNOTSUPP;
> > +				mntput(mnt);
> > +				goto err_free_out;
> > +			}
> 
> I don't see that this does anything for us except add overhead.
> This is no protection against mounting a second filesystem with
> the same UUID after the handle is returned, since there is no
> expiration for file handles.
> 
> At best I think we could start by changing the list-based UUID
> lookup with a hash-based one, and when adding a duplicate UUID at
> mount time start by printing out an error message to the console
> in case of duplicated UUIDs, and maybe at some point in the future
> this might cause the mount to fail (though I don't think we can
> make that decision lightly or quickly).
>
> That moves the overhead to mount time instead of for each
> name_to_handle() call (which would be brutal for a system with
> many filesystems mounted).

That will pretty much match exactly what XFS already does.  Can we
start by moving the XFS functionality (xfs_uuid_mount(), "nouuid"
mount option, etc) to the VFS level and then optimise from there?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-12 15:50 ` [PATCH -V7 5/9] vfs: Add freadlink syscall Aneesh Kumar K.V
@ 2010-05-13  1:43   ` Neil Brown
  2010-05-13  6:25     ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-05-13  1:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Wed, 12 May 2010 21:20:40 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> This enables to use open-by-handle and then get the link target
> details of a symlink using the fd returned by handle
> 

I find it very frustrating that a new syscall seems to be needed here.
We have 'readlinkat', and it should be enough.
How:  the 'dfd' has to be a 'directory', and the path name as to be non-empty.

The following patch allows 'path' to be NULL and in that case 'dfd' to be a
non-directory.  This allows readlinkat and faccessat (and probably others)
to be used on an fd with not following path name.

What do people think of this alternative?

NeilBrown

From: NeilBrown <neilb@suse.de>

Allow '*at' syscalls to have a NULL path name, thus using the 'fd' directly.

Move the 'is a directory' test from path_init to the start of link_path_walk
(which is always called after a successful path_init).
Remove the file_permission(MAY_EXEC) test from path_init as such a test
already exists in link_path_walk as exec_permission()
Allow user_path_at to use a NULL resulting in the dfd being the target object.

This effectively provides an 'f*' version for all syscalls with a '*at'
version. In particular: link, access, chmod, chown, stat, readlink, utimes

Some of those already have f* versions, but link, access, readlink all
benefit directly, and futimes code can be tidied up.

NULL is only allowed if 'dfd' is not ATFD_CWD to reduce possible
backward-compatibility issues.

openat is not effected by this change, even in non-O_CREAT instances.  Maybe
it should.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/fs/namei.c b/fs/namei.c
index 48e60a1..b9b091e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -824,6 +824,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		goto return_reval;

 	inode = nd->path.dentry->d_inode;
+
+	err = -ENOTDIR;
+	if (!S_ISDIR(inode->i_mode))
+		goto return_err;
+
 	if (nd->depth)
 		lookup_flags = LOOKUP_FOLLOW | (nd->flags & LOOKUP_CONTINUE);

@@ -1030,14 +1035,6 @@ static int path_init(int dfd, const char *name, unsigned int flags, struct namei

 		dentry = file->f_path.dentry;

-		retval = -ENOTDIR;
-		if (!S_ISDIR(dentry->d_inode->i_mode))
-			goto fput_fail;
-
-		retval = file_permission(file, MAY_EXEC);
-		if (retval)
-			goto fput_fail;
-
 		nd->path = file->f_path;
 		path_get(&file->f_path);

@@ -1241,17 +1238,22 @@ int user_path_at(int dfd, const char __user *name, unsigned flags,
 		 struct path *path)
 {
 	struct nameidata nd;
-	char *tmp = getname(name);
-	int err = PTR_ERR(tmp);
-	if (!IS_ERR(tmp)) {
-
-		BUG_ON(flags & LOOKUP_PARENT);
+	char *tmp;
+	int err;

-		err = do_path_lookup(dfd, tmp, flags, &nd);
-		putname(tmp);
-		if (!err)
-			*path = nd.path;
+	BUG_ON(flags & LOOKUP_PARENT);
+	if (name == NULL && dfd >= 0)
+		err = do_path_lookup(dfd, "", flags, &nd);
+	else {
+		tmp = getname(name);
+		err = PTR_ERR(tmp);
+		if (!IS_ERR(tmp)) {
+			err = do_path_lookup(dfd, tmp, flags, &nd);
+			putname(tmp);
+		}
 	}
+	if (!err)
+		*path = nd.path;
 	return err;
 }

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-12 15:50 ` [PATCH -V7 6/9] ext4: Add get_fsid callback Aneesh Kumar K.V
@ 2010-05-13  3:11   ` Dave Chinner
  2010-05-13  6:32     ` Aneesh Kumar K. V
  2010-05-14 17:32   ` Coly Li
  1 sibling, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2010-05-13  3:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Wed, May 12, 2010 at 09:20:41PM +0530, Aneesh Kumar K.V wrote:
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/ext4/super.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e14d22c..fc7d464 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
>  	return try_to_free_buffers(page);
>  }
>  
> +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
> +{
> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> +	struct ext4_super_block *es = sbi->s_es;
> +
> +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
> +	/*
> +	 * We may want to make sure we return error if the s_uuid is not
> +	 * exactly unique
> +	 */
> +	return 0;
> +}
> +
>  #ifdef CONFIG_QUOTA
>  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
>  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
>  	.quota_write	= ext4_quota_write,
>  #endif
>  	.bdev_try_to_free_page = bdev_try_to_free_page,
> +	.get_fsid	= ext4_get_fsid,
>  };
>  
>  static const struct super_operations ext4_nojournal_sops = {
> @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
>  	.quota_write	= ext4_quota_write,
>  #endif
>  	.bdev_try_to_free_page = bdev_try_to_free_page,
> +	.get_fsid	= ext4_get_fsid,
>  };

This all looks pretty simple - can you add XFS support to this
interface (uuid is in XFS_M(sb)->m_sb.sb_uuid) so that it can be
tested to work on multiple filesystems.

FWIW, I didn't get patch 0 of this series, so I'll comment on
one line of it right here because it is definitely relevant:

> I am also looking at getting xfsprogs libhandle.so on top of these
> syscalls.

If you plan to modify libhandle to use these syscalls, then you need
to guarantee:

	1. XFS support for the syscalls
	2. the handle format, lifecycle and protections for XFS
	   handles are *exactly* the same as the current XFS
	   handles.  i.e. there's a fixed userspace API that
	   cannot be broken.
	3. you don't break any of the other XFS specific handle
	   interfaces that aren't implemented by the new syscalls
	3. You don't break and existing XFS utilites - dump/restore,
	   and fsr come to mind immediately.
	4. that you'll fix the xfstests that may break because of the
	   change
	5. that you'll write new tests for xfstests that validates
	   that the libhandle API works correctly and that handle
	   formats and lifecycles do not get accidentally changed in
	   future.

That's a lot of work and, IMO, is completely pointless. All we'd get
out of it is more complexity, bloat, scope for regressions and a
biger test matrix, and we wouldn't have any new functionality to
speak of.

However, this leads to the bigger question: what's the point of a
new interface if all it ends up getting used for is to re-implement
part of an existing library?

I know this goes against the severe ext4 NIH syndrome that seems to
pervade anything that XFS has already implemented, but lets be
realistic here. If you want applications to use libhandle then there
is no need for a new kernel API - it already has a perfectly
funtional one that has stood the test of time, and all it requires
is moving the  XFS ioctl handler up into the VFS and modifying the
implementation to use existing filesystem callouts.

FWIW, there really isn't anything XFS specific to these handle
functions, so moving it to the VFS should be pretty easy, and that
will result in a full libhandle support for all filesystems that
provide NFS support. That, IMO, is a far superior result than having
two different handle interfaces that have different functionality and
semantics, neither of which have wide fs support...

So please make up your mind - either the handle interface is a
completely new interface with new userspace and kernel APIs,
or it uses the existing userspace and kernel APIs. Anything else
does not make sense.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 21:49   ` Andreas Dilger
  2010-05-12 22:43     ` Neil Brown
  2010-05-13  0:20     ` Dave Chinner
@ 2010-05-13  5:56     ` Aneesh Kumar K. V
  2010-05-13 14:24     ` Aneesh Kumar K. V
  3 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  5:56 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Wed, 12 May 2010 15:49:49 -0600, Andreas Dilger <andreas.dilger@oracle.com> wrote:
> On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > +static long do_sys_name_to_handle(struct path *path,
> > +			struct file_handle __user *ufh)
> > +{
> > +	if (handle_size <= f_handle.handle_size) {
> > +		/* get the uuid */
> > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > +		if (!retval) {
> > +			/*
> > +			 * Now verify whether we get the same vfsmount
> > +			 * if we lookup with uuid. In case we end up having
> > +			 * same uuid for the multiple file systems. When doing
> > +			 * uuid based lookup we would return the first one.So
> > +			 * with name_to_handle if we don't find the same
> > +			 * vfsmount with lookup return EOPNOTSUPP
> > +			 */
> > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > +			if (mnt != path->mnt) {
> > +				retval = -EOPNOTSUPP;
> > +				mntput(mnt);
> > +				goto err_free_out;
> > +			}
> 
> I don't see that this does anything for us except add overhead.  This is no protection against mounting a second filesystem with the same UUID after the handle is returned, since there is no expiration for file handles.
> 
> At best I think we could start by changing the list-based UUID lookup
> with a hash-based one, and when adding a duplicate UUID at mount time
> start by printing out an error message to the console in case of
> duplicated UUIDs, and maybe at some point in the future this might
> cause the mount to fail (though I don't think we can make that
> decision lightly or quickly).

I actually was looking at doing this. That is something in line of

register_for_handle_lookup(struct vfsmount *mnt);

Each file system will call this in their gets_sb callback after calling
get_sb_bdev. ie 

static int ext4_get_sb(struct file_system_type *fs_type, int flags,
		       const char *dev_name, void *data, struct vfsmount *mnt)
{
        int retval;
	retval = get_sb_bdev(fs_type, flags, dev_name, data,
		       ext4_fill_super,mnt);

         if (register_for_handle_lookup(mnt)) {
            printk(KERN_INFO, "File system won't be available or handle lookup");
         }
         return retval;              
}


-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
  2010-05-12 23:44   ` Neil Brown
@ 2010-05-13  6:09     ` Dave Chinner
  2010-05-13  6:37       ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2010-05-13  6:09 UTC (permalink / raw)
  To: Neil Brown
  Cc: Aneesh Kumar K.V, hch, viro, adilger, corbet, serue,
	linux-fsdevel, sfrench, philippe.deniel, linux-kernel

On Thu, May 13, 2010 at 09:44:22AM +1000, Neil Brown wrote:
> On Wed, 12 May 2010 21:20:39 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > +		filp->f_flags |= O_NOATIME;
> > +		filp->f_mode |= FMODE_NOCMTIME;
> > +	}
> 
> I think you need a comment here explaining the rational for these setting.

If you've never seen how applications use the XFS handle interface
in conjunction with other XFS functionality, then I guess if would
seem like bad voodoo.

> Why is O_NOATIME important IFREG but not for IFDIR?

No application has ever required directory access or modification
via the handle interface to be invisible to the rest of the system.

> Why is it not sufficient to honour O_NOATIME that is passed in.

Because the XFS handle library is cross platform and predates
O_NOATIME on linux. Hence the library it has never set that flag and
always relied on the kernel implementation of the API to ensure
atime was never updated on fds derived from handles..

> How can you ever justify setting FMODE_NOCMTIME ?

Quite easily. ;)

The XFS handle interface was designed specifically to allow
applications to execute silent/invisible movement of data in, out
and around the filesystem without leaving user visible traces in
file metadata. This enables backup or filesysetm utilities that
operate on active filesystems need to be able to access or modify
inodes and data without affecting running applications.  It's a
feature of the handle interface, and used by xfs_dump, xfs_fsr,
SGI's HSM, etc to do stuff that isn't otherwise possible.

FWIW, if you are curious, here's the initial commit of the XFS
handle code into Irix tree from 3 Sep 1994, showing that the initial
XFS open_by_handle() implementation sets the FINVIS flag to trigger
invisible IO semantics:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=575b66fae833429a51fcadb204d45521c2dfc26f

> I guess you are just copying from xfs code, but it still needs justification.

	"They are intended for use by a limited set of system
	utilities such as backup programs."

		- open_by_handle(3) man page

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 22:43     ` Neil Brown
@ 2010-05-13  6:17       ` Aneesh Kumar K. V
  2010-05-13  7:11         ` Neil Brown
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  6:17 UTC (permalink / raw)
  To: Neil Brown, Andreas Dilger
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 08:43:10 +1000, Neil Brown <neilb@suse.de> wrote:
> On Wed, 12 May 2010 15:49:49 -0600
> Andreas Dilger <andreas.dilger@oracle.com> wrote:
> 
> > On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > > +static long do_sys_name_to_handle(struct path *path,
> > > +			struct file_handle __user *ufh)
> > > +{
> > > +	if (handle_size <= f_handle.handle_size) {
> > > +		/* get the uuid */
> > > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > > +		if (!retval) {
> > > +			/*
> > > +			 * Now verify whether we get the same vfsmount
> > > +			 * if we lookup with uuid. In case we end up having
> > > +			 * same uuid for the multiple file systems. When doing
> > > +			 * uuid based lookup we would return the first one.So
> > > +			 * with name_to_handle if we don't find the same
> > > +			 * vfsmount with lookup return EOPNOTSUPP
> > > +			 */
> > > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > > +			if (mnt != path->mnt) {
> > > +				retval = -EOPNOTSUPP;
> > > +				mntput(mnt);
> > > +				goto err_free_out;
> > > +			}
> > 
> > I don't see that this does anything for us except add overhead.  This is no protection against mounting a second filesystem with the same UUID after the handle is returned, since there is no expiration for file handles.
> 
> I very much agree.
> Further I have not yet seen a convincing argument that the UUID should be
> part of the kernel interface for filehandle lookup.  That fact that including
> the UUID suggests to you (Aneesh) that this sort of thing might be a good
> idea seems to me to be a strong argument against including the UUID in the
> lookup.


The other option I tried was to use mountdir fd to point to the
vfsmount. The problem with such an approach is that NFS server will now
have to maintain a cache for file system handle to mountdir mapping.
(xfsprogs libhandle/handle.c shows what such a cache would look like)

The server would anyhow require to get a system wide unique handle to represent the
file. ie server will have to generate unique id to represent the
filesystem and then maintain the unique id to mount dir mapping in the
user space. Also we have the problem of rebuiling the cache after a
server restart (due to crash). With the above observation it guess it
make it easier to have kernel provide file system id which can be
directly used by the userspace.  Hence the descission to use
UUID. Whether UUID is the right value to represent file system is
something we should decide.

> 
> > 
> > At best I think we could start by changing the list-based UUID lookup with a hash-based one, and when adding a duplicate UUID at mount time start by printing out an error message to the console in case of duplicated UUIDs, and maybe at some point in the future this might cause the mount to fail (though I don't think we can make that decision lightly or quickly).
> > 
> > That moves the overhead to mount time instead of for each name_to_handle() call (which would be brutal for a system with many filesystems mounted).
> > 

If we implement the above, can we say UUID can be used to identify the
file system ?

 
> > > +SYSCALL_DEFINE4(name_to_handle_at, int, dfd, const char __user *, name,
> > > +		struct file_handle __user *, handle, int, flag)
> > > +{
> > > +	int follow;
> > > +	long ret = -EINVAL;
> > > +	struct path path;
> > > +
> > > +	if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> > > +		goto err_out;
> > 
> > Won't this break the use of AT_FDCWD to get a relative file handle?
> 
> No, AT_FDCWD is a value passed in 'dfd', not a flag.
> 
> I wonder though if the default should be to not follow symlinks, and that
> AT_SYMLINK_FOLLOW should be required if you really want to follow symlinks.
> I suspect that in most cases you wouldn't want to follow symlinks.
> It isn't a really important point though.
>

Will fix that in next iteration.

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13  0:20     ` Dave Chinner
@ 2010-05-13  6:23       ` Aneesh Kumar K. V
  2010-05-13  7:31         ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  6:23 UTC (permalink / raw)
  To: Dave Chinner, Andreas Dilger
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 10:20:38 +1000, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, May 12, 2010 at 03:49:49PM -0600, Andreas Dilger wrote:
> > On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > > +static long do_sys_name_to_handle(struct path *path,
> > > +			struct file_handle __user *ufh)
> > > +{
> > > +	if (handle_size <= f_handle.handle_size) {
> > > +		/* get the uuid */
> > > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > > +		if (!retval) {
> > > +			/*
> > > +			 * Now verify whether we get the same vfsmount
> > > +			 * if we lookup with uuid. In case we end up having
> > > +			 * same uuid for the multiple file systems. When doing
> > > +			 * uuid based lookup we would return the first one.So
> > > +			 * with name_to_handle if we don't find the same
> > > +			 * vfsmount with lookup return EOPNOTSUPP
> > > +			 */
> > > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > > +			if (mnt != path->mnt) {
> > > +				retval = -EOPNOTSUPP;
> > > +				mntput(mnt);
> > > +				goto err_free_out;
> > > +			}
> > 
> > I don't see that this does anything for us except add overhead.
> > This is no protection against mounting a second filesystem with
> > the same UUID after the handle is returned, since there is no
> > expiration for file handles.
> > 
> > At best I think we could start by changing the list-based UUID
> > lookup with a hash-based one, and when adding a duplicate UUID at
> > mount time start by printing out an error message to the console
> > in case of duplicated UUIDs, and maybe at some point in the future
> > this might cause the mount to fail (though I don't think we can
> > make that decision lightly or quickly).
> >
> > That moves the overhead to mount time instead of for each
> > name_to_handle() call (which would be brutal for a system with
> > many filesystems mounted).
> 
> That will pretty much match exactly what XFS already does.  Can we
> start by moving the XFS functionality (xfs_uuid_mount(), "nouuid"
> mount option, etc) to the VFS level and then optimise from there?
> 

I will do this. But should the uuid be unique in a system wide manner or
should it be unique for a mount namespace ? With containers isn't it
valid for the second container to mount a file system with same uuid of
a file system in the first container, but uuid itself is unique in the
second container ?

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-13  1:43   ` Neil Brown
@ 2010-05-13  6:25     ` Aneesh Kumar K. V
  2010-05-13  6:56       ` Neil Brown
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  6:25 UTC (permalink / raw)
  To: Neil Brown
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 11:43:51 +1000, Neil Brown <neilb@suse.de> wrote:
> On Wed, 12 May 2010 21:20:40 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > This enables to use open-by-handle and then get the link target
> > details of a symlink using the fd returned by handle
> > 
> 
> 
> I find it very frustrating that a new syscall seems to be needed here.
> We have 'readlinkat', and it should be enough.
> How:  the 'dfd' has to be a 'directory', and the path name as to be non-empty.
> 
> The following patch allows 'path' to be NULL and in that case 'dfd' to be a
> non-directory.  This allows readlinkat and faccessat (and probably others)
> to be used on an fd with not following path name.
> 
> What do people think of this alternative?
> 

I will add this in the next iteration and drop the freadlink syscall.


> NeilBrown
> 
> 
> From: NeilBrown <neilb@suse.de>
> 
> Allow '*at' syscalls to have a NULL path name, thus using the 'fd' directly.
> 
> Move the 'is a directory' test from path_init to the start of link_path_walk
> (which is always called after a successful path_init).
> Remove the file_permission(MAY_EXEC) test from path_init as such a test
> already exists in link_path_walk as exec_permission()
> Allow user_path_at to use a NULL resulting in the dfd being the target object.
> 
> This effectively provides an 'f*' version for all syscalls with a '*at'
> version. In particular: link, access, chmod, chown, stat, readlink, utimes
> 
> Some of those already have f* versions, but link, access, readlink all
> benefit directly, and futimes code can be tidied up.
> 
> NULL is only allowed if 'dfd' is not ATFD_CWD to reduce possible
> backward-compatibility issues.
> 
> openat is not effected by this change, even in non-O_CREAT instances.  Maybe
> it should.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 48e60a1..b9b091e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -824,6 +824,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
>  		goto return_reval;
> 
>  	inode = nd->path.dentry->d_inode;
> +
> +	err = -ENOTDIR;
> +	if (!S_ISDIR(inode->i_mode))
> +		goto return_err;
> +
>  	if (nd->depth)
>  		lookup_flags = LOOKUP_FOLLOW | (nd->flags & LOOKUP_CONTINUE);
> 
> @@ -1030,14 +1035,6 @@ static int path_init(int dfd, const char *name, unsigned int flags, struct namei
> 
>  		dentry = file->f_path.dentry;
> 
> -		retval = -ENOTDIR;
> -		if (!S_ISDIR(dentry->d_inode->i_mode))
> -			goto fput_fail;
> -
> -		retval = file_permission(file, MAY_EXEC);
> -		if (retval)
> -			goto fput_fail;
> -
>  		nd->path = file->f_path;
>  		path_get(&file->f_path);
> 
> @@ -1241,17 +1238,22 @@ int user_path_at(int dfd, const char __user *name, unsigned flags,
>  		 struct path *path)
>  {
>  	struct nameidata nd;
> -	char *tmp = getname(name);
> -	int err = PTR_ERR(tmp);
> -	if (!IS_ERR(tmp)) {
> -
> -		BUG_ON(flags & LOOKUP_PARENT);
> +	char *tmp;
> +	int err;
> 
> -		err = do_path_lookup(dfd, tmp, flags, &nd);
> -		putname(tmp);
> -		if (!err)
> -			*path = nd.path;
> +	BUG_ON(flags & LOOKUP_PARENT);
> +	if (name == NULL && dfd >= 0)
> +		err = do_path_lookup(dfd, "", flags, &nd);
> +	else {
> +		tmp = getname(name);
> +		err = PTR_ERR(tmp);
> +		if (!IS_ERR(tmp)) {
> +			err = do_path_lookup(dfd, tmp, flags, &nd);
> +			putname(tmp);
> +		}
>  	}
> +	if (!err)
> +		*path = nd.path;
>  	return err;
>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-13  3:11   ` Dave Chinner
@ 2010-05-13  6:32     ` Aneesh Kumar K. V
  2010-05-14  1:44       ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  6:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 13:11:33 +1000, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, May 12, 2010 at 09:20:41PM +0530, Aneesh Kumar K.V wrote:
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/ext4/super.c |   15 +++++++++++++++
> >  1 files changed, 15 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index e14d22c..fc7d464 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
> >  	return try_to_free_buffers(page);
> >  }
> >  
> > +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
> > +{
> > +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> > +	struct ext4_super_block *es = sbi->s_es;
> > +
> > +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
> > +	/*
> > +	 * We may want to make sure we return error if the s_uuid is not
> > +	 * exactly unique
> > +	 */
> > +	return 0;
> > +}
> > +
> >  #ifdef CONFIG_QUOTA
> >  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
> >  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> > @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
> >  	.quota_write	= ext4_quota_write,
> >  #endif
> >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > +	.get_fsid	= ext4_get_fsid,
> >  };
> >  
> >  static const struct super_operations ext4_nojournal_sops = {
> > @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
> >  	.quota_write	= ext4_quota_write,
> >  #endif
> >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > +	.get_fsid	= ext4_get_fsid,
> >  };
> 
> This all looks pretty simple - can you add XFS support to this
> interface (uuid is in XFS_M(sb)->m_sb.sb_uuid) so that it can be
> tested to work on multiple filesystems.
> 
> FWIW, I didn't get patch 0 of this series, so I'll comment on
> one line of it right here because it is definitely relevant:
> 
> > I am also looking at getting xfsprogs libhandle.so on top of these
> > syscalls.
> 
> If you plan to modify libhandle to use these syscalls, then you need
> to guarantee:
> 
> 	1. XFS support for the syscalls
> 	2. the handle format, lifecycle and protections for XFS
> 	   handles are *exactly* the same as the current XFS
> 	   handles.  i.e. there's a fixed userspace API that
> 	   cannot be broken.
> 	3. you don't break any of the other XFS specific handle
> 	   interfaces that aren't implemented by the new syscalls
> 	3. You don't break and existing XFS utilites - dump/restore,
> 	   and fsr come to mind immediately.
> 	4. that you'll fix the xfstests that may break because of the
> 	   change
> 	5. that you'll write new tests for xfstests that validates
> 	   that the libhandle API works correctly and that handle
> 	   formats and lifecycles do not get accidentally changed in
> 	   future.
> 
> That's a lot of work and, IMO, is completely pointless. All we'd get
> out of it is more complexity, bloat, scope for regressions and a
> biger test matrix, and we wouldn't have any new functionality to
> speak of.

getting libhandle.so to work with the syscall is something that is
suggested on the list. The goal is to see if syscall achieve everything
that XFS ioctl does


> 
> However, this leads to the bigger question: what's the point of a
> new interface if all it ends up getting used for is to re-implement
> part of an existing library?
> 
> I know this goes against the severe ext4 NIH syndrome that seems to
> pervade anything that XFS has already implemented, but lets be
> realistic here. If you want applications to use libhandle then there
> is no need for a new kernel API - it already has a perfectly
> funtional one that has stood the test of time, and all it requires
> is moving the  XFS ioctl handler up into the VFS and modifying the
> implementation to use existing filesystem callouts.
> 
> FWIW, there really isn't anything XFS specific to these handle
> functions, so moving it to the VFS should be pretty easy, and that
> will result in a full libhandle support for all filesystems that
> provide NFS support. That, IMO, is a far superior result than having
> two different handle interfaces that have different functionality and
> semantics, neither of which have wide fs support...

That is more or less what i am doing. I started with xfs ioctls, But
instead of moving them as ioctls to the VFS layer what I did was to do a
syscall around the same functionality.

> 
> So please make up your mind - either the handle interface is a
> completely new interface with new userspace and kernel APIs,
> or it uses the existing userspace and kernel APIs. Anything else
> does not make sense.
> 

The goal was to get functionality similar to XFS ioctl in a file system
independent manner. That is why my first patch (v1) used a mountdir fd
similar to ioctl. But review feedback on the list suggested changes to
the interface. 


-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
  2010-05-13  6:09     ` Dave Chinner
@ 2010-05-13  6:37       ` Aneesh Kumar K. V
  2010-05-14 10:41         ` Dave Chinner
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  6:37 UTC (permalink / raw)
  To: Dave Chinner, Neil Brown
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 16:09:55 +1000, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 13, 2010 at 09:44:22AM +1000, Neil Brown wrote:
> > On Wed, 12 May 2010 21:20:39 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > 
> > > +		filp->f_flags |= O_NOATIME;
> > > +		filp->f_mode |= FMODE_NOCMTIME;
> > > +	}
> > 
> > I think you need a comment here explaining the rational for these setting.
> 
> If you've never seen how applications use the XFS handle interface
> in conjunction with other XFS functionality, then I guess if would
> seem like bad voodoo.
> 
> > Why is O_NOATIME important IFREG but not for IFDIR?
> 
> No application has ever required directory access or modification
> via the handle interface to be invisible to the rest of the system.
> 
> > Why is it not sufficient to honour O_NOATIME that is passed in.
> 
> Because the XFS handle library is cross platform and predates
> O_NOATIME on linux. Hence the library it has never set that flag and
> always relied on the kernel implementation of the API to ensure
> atime was never updated on fds derived from handles..
> 
> > How can you ever justify setting FMODE_NOCMTIME ?
> 
> Quite easily. ;)
> 
> The XFS handle interface was designed specifically to allow
> applications to execute silent/invisible movement of data in, out
> and around the filesystem without leaving user visible traces in
> file metadata. This enables backup or filesysetm utilities that
> operate on active filesystems need to be able to access or modify
> inodes and data without affecting running applications.  It's a
> feature of the handle interface, and used by xfs_dump, xfs_fsr,
> SGI's HSM, etc to do stuff that isn't otherwise possible.
> 
> FWIW, if you are curious, here's the initial commit of the XFS
> handle code into Irix tree from 3 Sep 1994, showing that the initial
> XFS open_by_handle() implementation sets the FINVIS flag to trigger
> invisible IO semantics:
> 
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=575b66fae833429a51fcadb204d45521c2dfc26f

Thanks for sharing this. I haven't looked at the details you mentioned here.

> 
> > I guess you are just copying from xfs code, but it still needs justification.
> 
> 	"They are intended for use by a limited set of system
> 	utilities such as backup programs."
> 
> 		- open_by_handle(3) man page
> 

Should we retain all the above behaviour in the new syscall ?. Or just
do what a normal open(2) call does ?

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-13  6:25     ` Aneesh Kumar K. V
@ 2010-05-13  6:56       ` Neil Brown
  2010-05-13  7:34         ` Aneesh Kumar K. V
  2010-05-14 11:18         ` Aneesh Kumar K. V
  0 siblings, 2 replies; 49+ messages in thread
From: Neil Brown @ 2010-05-13  6:56 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 11:55:56 +0530
"Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> On Thu, 13 May 2010 11:43:51 +1000, Neil Brown <neilb@suse.de> wrote:
> > On Wed, 12 May 2010 21:20:40 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > 
> > > This enables to use open-by-handle and then get the link target
> > > details of a symlink using the fd returned by handle
> > > 
> > 
> > 
> > I find it very frustrating that a new syscall seems to be needed here.
> > We have 'readlinkat', and it should be enough.
> > How:  the 'dfd' has to be a 'directory', and the path name as to be non-empty.
> > 
> > The following patch allows 'path' to be NULL and in that case 'dfd' to be a
> > non-directory.  This allows readlinkat and faccessat (and probably others)
> > to be used on an fd with not following path name.
> > 
> > What do people think of this alternative?
> > 
> 
> I will add this in the next iteration and drop the freadlink syscall.
> 

I wouldn't be quite that hasty.  It was only a proposal.  It might not be
appropriate to change all those syscalls..

An alternative that I don't think is a nice, but is a lot 'safer' is
to just change readlinkat to accept a NULL path:


Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/fs/stat.c b/fs/stat.c
index c4ecd52..85d0856 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -284,26 +284,40 @@ SYSCALL_DEFINE2(newfstat, unsigned int, fd, struct stat __user *, statbuf)
 SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
 		char __user *, buf, int, bufsiz)
 {
-	struct path path;
+	struct path path, *pp;
+	struct file *file = NULL;
 	int error;
 
 	if (bufsiz <= 0)
 		return -EINVAL;
 
-	error = user_path_at(dfd, pathname, 0, &path);
+	if (filename == NULL && dfd != AT_FDCWD) {
+		struct file *file = fget(dfd);
+
+		if (file)
+			pp = &file->f_path;
+		else
+			error = -EBADF;
+	} else {
+		error = user_path_at(dfd, pathname, 0, &path);
+		pp = &path;
+	}
 	if (!error) {
-		struct inode *inode = path.dentry->d_inode;
+		struct inode *inode = pp->dentry->d_inode;
 
 		error = -EINVAL;
 		if (inode->i_op->readlink) {
-			error = security_inode_readlink(path.dentry);
+			error = security_inode_readlink(pp->dentry);
 			if (!error) {
-				touch_atime(path.mnt, path.dentry);
-				error = inode->i_op->readlink(path.dentry,
+				touch_atime(pp->mnt, pp->dentry);
+				error = inode->i_op->readlink(pp->dentry,
 							      buf, bufsiz);
 			}
 		}
-		path_put(&path);
+		if (file)
+			fput(file);
+		else
+			path_put(&path);
 	}
 	return error;
 }

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13  6:17       ` Aneesh Kumar K. V
@ 2010-05-13  7:11         ` Neil Brown
  2010-05-13  8:30           ` Andreas Dilger
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Brown @ 2010-05-13  7:11 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Andreas Dilger, hch, viro, adilger, corbet, serue, linux-fsdevel,
	sfrench, philippe.deniel, linux-kernel

On Thu, 13 May 2010 11:47:46 +0530
"Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> On Thu, 13 May 2010 08:43:10 +1000, Neil Brown <neilb@suse.de> wrote:
> > On Wed, 12 May 2010 15:49:49 -0600
> > Andreas Dilger <andreas.dilger@oracle.com> wrote:
> > 
> > > On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > > > +static long do_sys_name_to_handle(struct path *path,
> > > > +			struct file_handle __user *ufh)
> > > > +{
> > > > +	if (handle_size <= f_handle.handle_size) {
> > > > +		/* get the uuid */
> > > > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > > > +		if (!retval) {
> > > > +			/*
> > > > +			 * Now verify whether we get the same vfsmount
> > > > +			 * if we lookup with uuid. In case we end up having
> > > > +			 * same uuid for the multiple file systems. When doing
> > > > +			 * uuid based lookup we would return the first one.So
> > > > +			 * with name_to_handle if we don't find the same
> > > > +			 * vfsmount with lookup return EOPNOTSUPP
> > > > +			 */
> > > > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > > > +			if (mnt != path->mnt) {
> > > > +				retval = -EOPNOTSUPP;
> > > > +				mntput(mnt);
> > > > +				goto err_free_out;
> > > > +			}
> > > 
> > > I don't see that this does anything for us except add overhead.  This is no protection against mounting a second filesystem with the same UUID after the handle is returned, since there is no expiration for file handles.
> > 
> > I very much agree.
> > Further I have not yet seen a convincing argument that the UUID should be
> > part of the kernel interface for filehandle lookup.  That fact that including
> > the UUID suggests to you (Aneesh) that this sort of thing might be a good
> > idea seems to me to be a strong argument against including the UUID in the
> > lookup.
> 
> 
> The other option I tried was to use mountdir fd to point to the
> vfsmount. The problem with such an approach is that NFS server will now
> have to maintain a cache for file system handle to mountdir mapping.
> (xfsprogs libhandle/handle.c shows what such a cache would look like)
> 
> The server would anyhow require to get a system wide unique handle to represent the
> file. ie server will have to generate unique id to represent the
> filesystem and then maintain the unique id to mount dir mapping in the
> user space. Also we have the problem of rebuiling the cache after a
> server restart (due to crash). With the above observation it guess it
> make it easier to have kernel provide file system id which can be
> directly used by the userspace.  Hence the descission to use
> UUID. Whether UUID is the right value to represent file system is
> something we should decide.

Here is my reason (well, one of my reasons) why UUID is clearly not an
acceptable handle to use - you *must* use an fd or are worst a path name.

 A filesystem can be mounted at multiple places in the namespace and each
 instance can have other filesystems mounted on different directories.
 Each of these is a 'vfsmnt'.
 You need a vfsmnt for most (all?) filesystem operations.
 A UUID does not uniquely identify a vfsmnt.  It might uniquely identify a
 filesystem, though that is debatable.  It definitely does not uniquely
 identify a vfsmnt.
 Therefore, given only a UUID the kernel would have to arbitrarily choose a
 vfsmnt that references to the right filesystem.  If a particular directory in
 the filesystem that you want to access was mounted-on in that vfsmnt, that
 directory would be completely inaccessible to you, even though it might be
 completely accessible in some other vfsmnt.
 So it is quite possible for a scheme based on kernel interpretation of uuids
 to fail.

 This may be a corner case, but I think people are slowly getting more
 adventurous in terms of using the mount table to do interesting things.  It
 may be less of a corner case in 5 years.

 So to tell the kernel which filesystem is of interest for filehandle
 lookup, you *must* give it a name, whether a textual path, or a filehandle
 obtained by opening a textual path.

 knfsd creates a cache in the kernel by telling it that a particular
 handle-fragment maps to a particular path.  The handle-fragment is normally
 a uuid by default, but it is trivial to choose something else via
 configuration in userspace if a corner-case needs to be handled.

 Creating such a cache and rebuilding after server-restart is not really very
 difficult.

NeilBrown


> 
> > 
> > > 
> > > At best I think we could start by changing the list-based UUID lookup with a hash-based one, and when adding a duplicate UUID at mount time start by printing out an error message to the console in case of duplicated UUIDs, and maybe at some point in the future this might cause the mount to fail (though I don't think we can make that decision lightly or quickly).
> > > 
> > > That moves the overhead to mount time instead of for each name_to_handle() call (which would be brutal for a system with many filesystems mounted).
> > > 
> 
> If we implement the above, can we say UUID can be used to identify the
> file system ?
> 
>  
> > > > +SYSCALL_DEFINE4(name_to_handle_at, int, dfd, const char __user *, name,
> > > > +		struct file_handle __user *, handle, int, flag)
> > > > +{
> > > > +	int follow;
> > > > +	long ret = -EINVAL;
> > > > +	struct path path;
> > > > +
> > > > +	if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> > > > +		goto err_out;
> > > 
> > > Won't this break the use of AT_FDCWD to get a relative file handle?
> > 
> > No, AT_FDCWD is a value passed in 'dfd', not a flag.
> > 
> > I wonder though if the default should be to not follow symlinks, and that
> > AT_SYMLINK_FOLLOW should be required if you really want to follow symlinks.
> > I suspect that in most cases you wouldn't want to follow symlinks.
> > It isn't a really important point though.
> >
> 
> Will fix that in next iteration.
> 
> -aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13  6:23       ` Aneesh Kumar K. V
@ 2010-05-13  7:31         ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2010-05-13  7:31 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Andreas Dilger, hch, viro, adilger, corbet, serue, neilb,
	linux-fsdevel, sfrench, philippe.deniel, linux-kernel

On Thu, May 13, 2010 at 11:53:33AM +0530, Aneesh Kumar K. V wrote:
> On Thu, 13 May 2010 10:20:38 +1000, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, May 12, 2010 at 03:49:49PM -0600, Andreas Dilger wrote:
> > > On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > > > +static long do_sys_name_to_handle(struct path *path,
> > > > +			struct file_handle __user *ufh)
> > > > +{
> > > > +	if (handle_size <= f_handle.handle_size) {
> > > > +		/* get the uuid */
> > > > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > > > +		if (!retval) {
> > > > +			/*
> > > > +			 * Now verify whether we get the same vfsmount
> > > > +			 * if we lookup with uuid. In case we end up having
> > > > +			 * same uuid for the multiple file systems. When doing
> > > > +			 * uuid based lookup we would return the first one.So
> > > > +			 * with name_to_handle if we don't find the same
> > > > +			 * vfsmount with lookup return EOPNOTSUPP
> > > > +			 */
> > > > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > > > +			if (mnt != path->mnt) {
> > > > +				retval = -EOPNOTSUPP;
> > > > +				mntput(mnt);
> > > > +				goto err_free_out;
> > > > +			}
> > > 
> > > I don't see that this does anything for us except add overhead.
> > > This is no protection against mounting a second filesystem with
> > > the same UUID after the handle is returned, since there is no
> > > expiration for file handles.
> > > 
> > > At best I think we could start by changing the list-based UUID
> > > lookup with a hash-based one, and when adding a duplicate UUID at
> > > mount time start by printing out an error message to the console
> > > in case of duplicated UUIDs, and maybe at some point in the future
> > > this might cause the mount to fail (though I don't think we can
> > > make that decision lightly or quickly).
> > >
> > > That moves the overhead to mount time instead of for each
> > > name_to_handle() call (which would be brutal for a system with
> > > many filesystems mounted).
> > 
> > That will pretty much match exactly what XFS already does.  Can we
> > start by moving the XFS functionality (xfs_uuid_mount(), "nouuid"
> > mount option, etc) to the VFS level and then optimise from there?
> > 
> 
> I will do this. But should the uuid be unique in a system wide manner or
> should it be unique for a mount namespace ? With containers isn't it
> valid for the second container to mount a file system with same uuid of
> a file system in the first container, but uuid itself is unique in the
> second container ?

I don't know how containers and mount namespaces interact, so I
can't really comment with any authority. However, two different
filesystems with the same UUID means that someone or something
doesn't understand what unique means and that, I think, makes the
container issue moot.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-13  6:56       ` Neil Brown
@ 2010-05-13  7:34         ` Aneesh Kumar K. V
  2010-05-13  8:09           ` Neil Brown
  2010-05-14 11:18         ` Aneesh Kumar K. V
  1 sibling, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13  7:34 UTC (permalink / raw)
  To: Neil Brown
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 16:56:06 +1000, Neil Brown <neilb@suse.de> wrote:
> On Thu, 13 May 2010 11:55:56 +0530
> "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > On Thu, 13 May 2010 11:43:51 +1000, Neil Brown <neilb@suse.de> wrote:
> > > On Wed, 12 May 2010 21:20:40 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > 
> > > > This enables to use open-by-handle and then get the link target
> > > > details of a symlink using the fd returned by handle
> > > > 
> > > 
> > > 
> > > I find it very frustrating that a new syscall seems to be needed here.
> > > We have 'readlinkat', and it should be enough.
> > > How:  the 'dfd' has to be a 'directory', and the path name as to be non-empty.
> > > 
> > > The following patch allows 'path' to be NULL and in that case 'dfd' to be a
> > > non-directory.  This allows readlinkat and faccessat (and probably others)
> > > to be used on an fd with not following path name.
> > > 
> > > What do people think of this alternative?
> > > 
> > 
> > I will add this in the next iteration and drop the freadlink syscall.
> > 
> 
> I wouldn't be quite that hasty.  It was only a proposal.  It might not be
> appropriate to change all those syscalls..
> 
> An alternative that I don't think is a nice, but is a lot 'safer' is
> to just change readlinkat to accept a NULL path:

That would mean only readlinkat now have a special case of accepting NULL path
and  dirfd can point to objects other than directory. So from the
interface point of having the special case is confusing. But if you feel
strong enough to drop freadlink syscall i can do that. Just not sure
whether special case for only readlinkat is the right way considering we
have  fstat/fchwon type calls for other file system operations.


-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-13  7:34         ` Aneesh Kumar K. V
@ 2010-05-13  8:09           ` Neil Brown
  0 siblings, 0 replies; 49+ messages in thread
From: Neil Brown @ 2010-05-13  8:09 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 13:04:18 +0530
"Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> On Thu, 13 May 2010 16:56:06 +1000, Neil Brown <neilb@suse.de> wrote:
> > On Thu, 13 May 2010 11:55:56 +0530
> > "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > 
> > > On Thu, 13 May 2010 11:43:51 +1000, Neil Brown <neilb@suse.de> wrote:
> > > > On Wed, 12 May 2010 21:20:40 +0530
> > > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > This enables to use open-by-handle and then get the link target
> > > > > details of a symlink using the fd returned by handle
> > > > > 
> > > > 
> > > > 
> > > > I find it very frustrating that a new syscall seems to be needed here.
> > > > We have 'readlinkat', and it should be enough.
> > > > How:  the 'dfd' has to be a 'directory', and the path name as to be non-empty.
> > > > 
> > > > The following patch allows 'path' to be NULL and in that case 'dfd' to be a
> > > > non-directory.  This allows readlinkat and faccessat (and probably others)
> > > > to be used on an fd with not following path name.
> > > > 
> > > > What do people think of this alternative?
> > > > 
> > > 
> > > I will add this in the next iteration and drop the freadlink syscall.
> > > 
> > 
> > I wouldn't be quite that hasty.  It was only a proposal.  It might not be
> > appropriate to change all those syscalls..
> > 
> > An alternative that I don't think is a nice, but is a lot 'safer' is
> > to just change readlinkat to accept a NULL path:
> 
> That would mean only readlinkat now have a special case of accepting NULL path
> and  dirfd can point to objects other than directory. So from the
> interface point of having the special case is confusing. But if you feel
> strong enough to drop freadlink syscall i can do that. Just not sure
> whether special case for only readlinkat is the right way considering we
> have  fstat/fchwon type calls for other file system operations.
> 
>

Not completely a special case.  I modelled this code on utimesat.

NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13  7:11         ` Neil Brown
@ 2010-05-13  8:30           ` Andreas Dilger
  2010-05-13  8:47             ` Neil Brown
  2010-05-13 14:21             ` Aneesh Kumar K. V
  0 siblings, 2 replies; 49+ messages in thread
From: Andreas Dilger @ 2010-05-13  8:30 UTC (permalink / raw)
  To: Neil Brown
  Cc: Aneesh Kumar K. V, Christoph Hellwig, Alexander Viro,
	Jonathan Corbet, Serge Hallyn, linux-fsdevel, Steven French,
	Philippe Deniel, linux-kernel@vger.kernel.org Mailinglist

On 2010-05-13, at 01:11, Neil Brown wrote:
> Here is my reason (well, one of my reasons) why UUID is clearly not an
> acceptable handle to use - you *must* use an fd or are worst a path name.
> 
> A filesystem can be mounted at multiple places in the namespace and each
> instance can have other filesystems mounted on different directories.
> Each of these is a 'vfsmnt'.
> You need a vfsmnt for most (all?) filesystem operations.
> A UUID does not uniquely identify a vfsmnt.  It might uniquely identify a
> filesystem, though that is debatable.  It definitely does not uniquely
> identify a vfsmnt.
> Therefore, given only a UUID the kernel would have to arbitrarily choose a
> vfsmnt that references to the right filesystem.  If a particular directory in
> the filesystem that you want to access was mounted-on in that vfsmnt, that
> directory would be completely inaccessible to you, even though it might be
> completely accessible in some other vfsmnt.
> So it is quite possible for a scheme based on kernel interpretation of uuids
> to fail.

I think this is exactly the wrong solution for this case.  The lookup from pathname to handle happened ALREADY (in the presence of the right namespace and vfsmnt information) and all that should be required for the handle lookup is to get the exact same inode back from disk.  It is irrelevant what vfsmnt is used to do this lookup, so long as it is in the same filesystem as before.

> This may be a corner case, but I think people are slowly getting more
> adventurous in terms of using the mount table to do interesting things.  It
> may be less of a corner case in 5 years.

I think the important feature for the handle is that _regardless_ of what shenanigans are done with the path, over-mounting of directories, etc, that the same inode is returned.  One would expect that if a real fd was opened by a process, that any later changes to the namespace would not suddenly result in another file with the same pathname to be accessible by that fd.
 
> So to tell the kernel which filesystem is of interest for filehandle
> lookup, you *must* give it a name, whether a textual path, or a filehandle
> obtained by opening a textual path.

No, that is done at the time of open() (or in this case name_to_handle()) and afterward the name/path/vfsmnt is completely irrelevant to the fd/handle.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13  8:30           ` Andreas Dilger
@ 2010-05-13  8:47             ` Neil Brown
  2010-05-13 14:21             ` Aneesh Kumar K. V
  1 sibling, 0 replies; 49+ messages in thread
From: Neil Brown @ 2010-05-13  8:47 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Aneesh Kumar K. V, Christoph Hellwig, Alexander Viro,
	Jonathan Corbet, Serge Hallyn, linux-fsdevel, Steven French,
	Philippe Deniel, linux-kernel@vger.kernel.org Mailinglist

On Thu, 13 May 2010 02:30:46 -0600
Andreas Dilger <andreas.dilger@oracle.com> wrote:

> On 2010-05-13, at 01:11, Neil Brown wrote:
> > Here is my reason (well, one of my reasons) why UUID is clearly not an
> > acceptable handle to use - you *must* use an fd or are worst a path name.
> > 
> > A filesystem can be mounted at multiple places in the namespace and each
> > instance can have other filesystems mounted on different directories.
> > Each of these is a 'vfsmnt'.
> > You need a vfsmnt for most (all?) filesystem operations.
> > A UUID does not uniquely identify a vfsmnt.  It might uniquely identify a
> > filesystem, though that is debatable.  It definitely does not uniquely
> > identify a vfsmnt.
> > Therefore, given only a UUID the kernel would have to arbitrarily choose a
> > vfsmnt that references to the right filesystem.  If a particular directory in
> > the filesystem that you want to access was mounted-on in that vfsmnt, that
> > directory would be completely inaccessible to you, even though it might be
> > completely accessible in some other vfsmnt.
> > So it is quite possible for a scheme based on kernel interpretation of uuids
> > to fail.
> 
> I think this is exactly the wrong solution for this case.  The lookup from pathname to handle happened ALREADY (in the presence of the right namespace and vfsmnt information) and all that should be required for the handle lookup is to get the exact same inode back from disk.  It is irrelevant what vfsmnt is used to do this lookup, so long as it is in the same filesystem as before.
> 
> > This may be a corner case, but I think people are slowly getting more
> > adventurous in terms of using the mount table to do interesting things.  It
> > may be less of a corner case in 5 years.
> 
> I think the important feature for the handle is that _regardless_ of what shenanigans are done with the path, over-mounting of directories, etc, that the same inode is returned.  One would expect that if a real fd was opened by a process, that any later changes to the namespace would not suddenly result in another file with the same pathname to be accessible by that fd.

Very true.
And if you only ever read from or write to the fd then there are no other
issues.

However if you use e.g. openat(the_fd, ....), then the vfsmnt attached to
the_fd is very important.
And if you were to use these filehandles to implement a user-space NFS server
(as has been suggested) and you wanted to implement the LOOKUP request, you
would need to use openat, and the vfsmnt would be significant.

NeilBrown


>  
> > So to tell the kernel which filesystem is of interest for filehandle
> > lookup, you *must* give it a name, whether a textual path, or a filehandle
> > obtained by opening a textual path.
> 
> No, that is done at the time of open() (or in this case name_to_handle()) and afterward the name/path/vfsmnt is completely irrelevant to the fd/handle.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13  8:30           ` Andreas Dilger
  2010-05-13  8:47             ` Neil Brown
@ 2010-05-13 14:21             ` Aneesh Kumar K. V
  2010-05-13 18:17               ` Aneesh Kumar K. V
  1 sibling, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13 14:21 UTC (permalink / raw)
  To: Andreas Dilger, Neil Brown
  Cc: Christoph Hellwig, Alexander Viro, Jonathan Corbet, Serge Hallyn,
	linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Thu, 13 May 2010 02:30:46 -0600, Andreas Dilger <andreas.dilger@oracle.com> wrote:
> On 2010-05-13, at 01:11, Neil Brown wrote:
> > Here is my reason (well, one of my reasons) why UUID is clearly not an
> > acceptable handle to use - you *must* use an fd or are worst a path name.
> > 
> > A filesystem can be mounted at multiple places in the namespace and each
> > instance can have other filesystems mounted on different directories.
> > Each of these is a 'vfsmnt'.
> > You need a vfsmnt for most (all?) filesystem operations.
> > A UUID does not uniquely identify a vfsmnt.  It might uniquely identify a
> > filesystem, though that is debatable.  It definitely does not uniquely
> > identify a vfsmnt.
> > Therefore, given only a UUID the kernel would have to arbitrarily choose a
> > vfsmnt that references to the right filesystem.  If a particular directory in
> > the filesystem that you want to access was mounted-on in that vfsmnt, that
> > directory would be completely inaccessible to you, even though it might be
> > completely accessible in some other vfsmnt.
> > So it is quite possible for a scheme based on kernel interpretation of uuids
> > to fail.
> 
> I think this is exactly the wrong solution for this case.  The lookup from pathname to handle happened ALREADY (in the presence of the right namespace and vfsmnt information) and all that should be required for the handle lookup is to get the exact same inode back from disk.  It is irrelevant what vfsmnt is used to do this lookup, so long as it is in the same filesystem as before.
> 
> > This may be a corner case, but I think people are slowly getting more
> > adventurous in terms of using the mount table to do interesting things.  It
> > may be less of a corner case in 5 years.
> 
> I think the important feature for the handle is that _regardless_ of what shenanigans are done with the path, over-mounting of directories, etc, that the same inode is returned.  One would expect that if a real fd was opened by a process, that any later changes to the namespace would not suddenly result in another file with the same pathname to be accessible by that fd.
> 
> > So to tell the kernel which filesystem is of interest for filehandle
> > lookup, you *must* give it a name, whether a textual path, or a filehandle
> > obtained by opening a textual path.
> 
> No, that is done at the time of open() (or in this case name_to_handle()) and afterward the name/path/vfsmnt is completely irrelevant to the fd/handle.


I guess we also have the problem with readonly bind mounts.  May be we
should use the mountdir fd and leave the fsid to mountdir fd mapping to
userspace. 

NOTE: With the current patchset a name_to_handle on the readonly mount
point will fail because of the new checks i added in the syscall. But in
general i guess we need to make sure that an open_by_handle should
follow these vfsmount flags also. 

Christoph, Andreas,

Can we agree on the use of mountdir fd ? If so i can redo the patch
using mountdirfd.

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-12 21:49   ` Andreas Dilger
                       ` (2 preceding siblings ...)
  2010-05-13  5:56     ` Aneesh Kumar K. V
@ 2010-05-13 14:24     ` Aneesh Kumar K. V
  3 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13 14:24 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Wed, 12 May 2010 15:49:49 -0600, Andreas Dilger <andreas.dilger@oracle.com> wrote:
> On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > +static long do_sys_name_to_handle(struct path *path,
> > +			struct file_handle __user *ufh)
> > +{
> > +	if (handle_size <= f_handle.handle_size) {
> > +		/* get the uuid */
> > +		retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > +		if (!retval) {
> > +			/*
> > +			 * Now verify whether we get the same vfsmount
> > +			 * if we lookup with uuid. In case we end up having
> > +			 * same uuid for the multiple file systems. When doing
> > +			 * uuid based lookup we would return the first one.So
> > +			 * with name_to_handle if we don't find the same
> > +			 * vfsmount with lookup return EOPNOTSUPP
> > +			 */
> > +			mnt = fs_get_vfsmount(current, &this_fs_id);
> > +			if (mnt != path->mnt) {
> > +				retval = -EOPNOTSUPP;
> > +				mntput(mnt);
> > +				goto err_free_out;
> > +			}
> 
> I don't see that this does anything for us except add overhead.  This
> is no protection against mounting a second filesystem with the same
> UUID after the handle is returned, since there is no expiration for
> file handles.

The new vfsmount gets added to the tail of the mount list. So if we
already have the handle then we can be sure that on open_by_handle we
get the same vfsmount. Of course if we umount the original file system
and mount it back we could get it wrong. That is also true with the
below scheme, ie if we umount both the file system mount the second one
which causes it to be added to the uuid hash table followed by the first
one which will not get added to uuid hash table because of same uuid.
In that case an open_by_handle will map the fsid to the wrong file system.

> 
> At best I think we could start by changing the list-based UUID lookup with a hash-based one, and when adding a duplicate UUID at mount time start by printing out an error message to the console in case of duplicated UUIDs, and maybe at some point in the future this might cause the mount to fail (though I don't think we can make that decision lightly or quickly).
> 
> That moves the overhead to mount time instead of for each name_to_handle() call (which would be brutal for a system with many filesystems mounted).
> 

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13 14:21             ` Aneesh Kumar K. V
@ 2010-05-13 18:17               ` Aneesh Kumar K. V
  2010-05-13 22:54                 ` Andreas Dilger
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13 18:17 UTC (permalink / raw)
  To: Andreas Dilger, Neil Brown
  Cc: Christoph Hellwig, Alexander Viro, Jonathan Corbet, Serge Hallyn,
	linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Thu, 13 May 2010 19:51:23 +0530, "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Thu, 13 May 2010 02:30:46 -0600, Andreas Dilger <andreas.dilger@oracle.com> wrote:
> > On 2010-05-13, at 01:11, Neil Brown wrote:
> > > Here is my reason (well, one of my reasons) why UUID is clearly not an
> > > acceptable handle to use - you *must* use an fd or are worst a path name.
> > > 
> > > A filesystem can be mounted at multiple places in the namespace and each
> > > instance can have other filesystems mounted on different directories.
> > > Each of these is a 'vfsmnt'.
> > > You need a vfsmnt for most (all?) filesystem operations.
> > > A UUID does not uniquely identify a vfsmnt.  It might uniquely identify a
> > > filesystem, though that is debatable.  It definitely does not uniquely
> > > identify a vfsmnt.
> > > Therefore, given only a UUID the kernel would have to arbitrarily choose a
> > > vfsmnt that references to the right filesystem.  If a particular directory in
> > > the filesystem that you want to access was mounted-on in that vfsmnt, that
> > > directory would be completely inaccessible to you, even though it might be
> > > completely accessible in some other vfsmnt.
> > > So it is quite possible for a scheme based on kernel interpretation of uuids
> > > to fail.
> > 
> > I think this is exactly the wrong solution for this case.  The lookup from pathname to handle happened ALREADY (in the presence of the right namespace and vfsmnt information) and all that should be required for the handle lookup is to get the exact same inode back from disk.  It is irrelevant what vfsmnt is used to do this lookup, so long as it is in the same filesystem as before.
> > 
> > > This may be a corner case, but I think people are slowly getting more
> > > adventurous in terms of using the mount table to do interesting things.  It
> > > may be less of a corner case in 5 years.
> > 
> > I think the important feature for the handle is that _regardless_ of what shenanigans are done with the path, over-mounting of directories, etc, that the same inode is returned.  One would expect that if a real fd was opened by a process, that any later changes to the namespace would not suddenly result in another file with the same pathname to be accessible by that fd.
> > 
> > > So to tell the kernel which filesystem is of interest for filehandle
> > > lookup, you *must* give it a name, whether a textual path, or a filehandle
> > > obtained by opening a textual path.
> > 
> > No, that is done at the time of open() (or in this case name_to_handle()) and afterward the name/path/vfsmnt is completely irrelevant to the fd/handle.
> 
> 
> I guess we also have the problem with readonly bind mounts.  May be we
> should use the mountdir fd and leave the fsid to mountdir fd mapping to
> userspace. 
> 
> NOTE: With the current patchset a name_to_handle on the readonly mount
> point will fail because of the new checks i added in the syscall. But in
> general i guess we need to make sure that an open_by_handle should
> follow these vfsmount flags also. 
> 
> Christoph, Andreas,
> 
> Can we agree on the use of mountdir fd ? If so i can redo the patch
> using mountdirfd.
> 

Or

How about below interface

sys_open_by_handle_at(int mountdirfd, struct file_handle *handle, int
flags)

if mountdirfd == 0
   use UUID specified in the file handle to lookup vfsmount
else
   else use mountdirfd to get the vfsmount


-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13 18:17               ` Aneesh Kumar K. V
@ 2010-05-13 22:54                 ` Andreas Dilger
  2010-05-14 17:25                   ` Al Viro
  0 siblings, 1 reply; 49+ messages in thread
From: Andreas Dilger @ 2010-05-13 22:54 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Neil Brown, Christoph Hellwig, Alexander Viro, Jonathan Corbet,
	Serge Hallyn, linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On 2010-05-13, at 12:17, Aneesh Kumar K. V wrote:
> How about below interface
> 
> sys_open_by_handle_at(int mountdirfd, struct file_handle *handle, int
> flags)
> 
> if mountdirfd == 0
>   use UUID specified in the file handle to lookup vfsmount
> else
>   else use mountdirfd to get the vfsmount

fd = 0 is a valid descriptor, though -1 is not.  I'd be OK with this, because it still means we can use the UUID to positively identify the filesystem (it would also avoid the need to do a by-UUID lookup in most cases, since the dirfd would already point to a filesystem and we can just compare the handle UUID with the UUID for that superblock.

The question is what to do if the UUID in the file handle does not match the filesystem that is passed via vfsmount?  It would seem that vfsmount is needed for determining the namespace and possibly disambiguating multiple mounts of snapshots with the same UUID.  Independent of that if the UUID is not matching the underlying filesystem that the vfsmount is pointing to the open should fail with -ESTALE.

If the UUID is not the same, then the chance that the inode/generation stored in the file handle are still useful is slim.  If someone is cloning the filesystem down to the inode number, then they can clone the UUID as well.

In summary, I'm OK with this approach.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-13  6:32     ` Aneesh Kumar K. V
@ 2010-05-14  1:44       ` Dave Chinner
  2010-05-15  6:09         ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Chinner @ 2010-05-14  1:44 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, May 13, 2010 at 12:02:52PM +0530, Aneesh Kumar K. V wrote:
> On Thu, 13 May 2010 13:11:33 +1000, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, May 12, 2010 at 09:20:41PM +0530, Aneesh Kumar K.V wrote:
> > > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > > ---
> > >  fs/ext4/super.c |   15 +++++++++++++++
> > >  1 files changed, 15 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index e14d22c..fc7d464 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
> > >  	return try_to_free_buffers(page);
> > >  }
> > >  
> > > +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
> > > +{
> > > +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> > > +	struct ext4_super_block *es = sbi->s_es;
> > > +
> > > +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
> > > +	/*
> > > +	 * We may want to make sure we return error if the s_uuid is not
> > > +	 * exactly unique
> > > +	 */
> > > +	return 0;
> > > +}
> > > +
> > >  #ifdef CONFIG_QUOTA
> > >  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
> > >  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> > > @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
> > >  	.quota_write	= ext4_quota_write,
> > >  #endif
> > >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > > +	.get_fsid	= ext4_get_fsid,
> > >  };
> > >  
> > >  static const struct super_operations ext4_nojournal_sops = {
> > > @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
> > >  	.quota_write	= ext4_quota_write,
> > >  #endif
> > >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > > +	.get_fsid	= ext4_get_fsid,
> > >  };
> > 
> > This all looks pretty simple - can you add XFS support to this
> > interface (uuid is in XFS_M(sb)->m_sb.sb_uuid) so that it can be
> > tested to work on multiple filesystems.
> > 
> > FWIW, I didn't get patch 0 of this series, so I'll comment on
> > one line of it right here because it is definitely relevant:
> > 
> > > I am also looking at getting xfsprogs libhandle.so on top of these
> > > syscalls.
> > 
> > If you plan to modify libhandle to use these syscalls, then you need
> > to guarantee:
> > 
> > 	1. XFS support for the syscalls
> > 	2. the handle format, lifecycle and protections for XFS
> > 	   handles are *exactly* the same as the current XFS
> > 	   handles.  i.e. there's a fixed userspace API that
> > 	   cannot be broken.
> > 	3. you don't break any of the other XFS specific handle
> > 	   interfaces that aren't implemented by the new syscalls
> > 	3. You don't break and existing XFS utilites - dump/restore,
> > 	   and fsr come to mind immediately.
> > 	4. that you'll fix the xfstests that may break because of the
> > 	   change
> > 	5. that you'll write new tests for xfstests that validates
> > 	   that the libhandle API works correctly and that handle
> > 	   formats and lifecycles do not get accidentally changed in
> > 	   future.
> > 
> > That's a lot of work and, IMO, is completely pointless. All we'd get
> > out of it is more complexity, bloat, scope for regressions and a
> > biger test matrix, and we wouldn't have any new functionality to
> > speak of.
> 
> getting libhandle.so to work with the syscall is something that is
> suggested on the list. The goal is to see if syscall achieve everything
> that XFS ioctl does

Ok, I didn't know that, but the question still stands. The XFS ioctl
cannot go away any time soon (we basically have to support it
forever), so why should we be writing a new, redundant
kernel API for this functionality that is going not generally going
to be directly accessed by userspace developers?

APIs are hard to get right - moving and modifying kernel code to be
generic is easy in comparison, and also somethign we can easily fix
if we get it wrong the first time. Make a mistake with a syscall
API, and we are stuck with it forever.

Might I suggest a slightly different approach, then? That is,
separate the two parts of making the XFS handle code generic and
providing a new API?  We don't lose anything by separating them - we
don't introduce any new APIs that have to be supported in the first
step, nor does the functionality get delayed by API discussions.
However, we still gain immediate widespread support for handles through
the libraries *already shipping* in every major distro, and that
doesn't get held up by discussions around what the API should look
like.

Then we can work on getting a new API right - we're going to be
stuck with it forever, so it's probably better to work out how the
interface will be used outside libhandle. A new application using it
would be a great example - it's rare that an API created with only
one user is going to be a good API when more developers try to make
use of it for new applications.

There is precedence here - the FIFREEZE ioctl for freezing/thawing
filesystems from userspace іs the API that XFS has been using for
years (XFS_IOC_FREEZE) to provide the functionality. It got promoted
to the VFS when other filesystems needed userspace freezing
capabilities, but only after new syscalls were proposed first.  The
result of using the existing interface was that freeze/thaw for any
capable filesystem was immediately availble using xfs_freeze or
xfs_io - there was no lag to userspace support in distro's, no
problems having to detect and support multiple interfaces depending
on what the kernel supported, etc. Overall it made things much
simpler and easier to manage and test....

Your thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
  2010-05-13  6:37       ` Aneesh Kumar K. V
@ 2010-05-14 10:41         ` Dave Chinner
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2010-05-14 10:41 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Neil Brown, hch, viro, adilger, corbet, serue, linux-fsdevel,
	sfrench, philippe.deniel, linux-kernel

On Thu, May 13, 2010 at 12:07:02PM +0530, Aneesh Kumar K. V wrote:
> On Thu, 13 May 2010 16:09:55 +1000, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, May 13, 2010 at 09:44:22AM +1000, Neil Brown wrote:
> > > On Wed, 12 May 2010 21:20:39 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > 
> > > > +		filp->f_flags |= O_NOATIME;
> > > > +		filp->f_mode |= FMODE_NOCMTIME;
> > > > +	}
> > > 
> > > I think you need a comment here explaining the rational for these setting.
> > 
> > If you've never seen how applications use the XFS handle interface
> > in conjunction with other XFS functionality, then I guess if would
> > seem like bad voodoo.
> > 
> > > Why is O_NOATIME important IFREG but not for IFDIR?
> > 
> > No application has ever required directory access or modification
> > via the handle interface to be invisible to the rest of the system.
> > 
> > > Why is it not sufficient to honour O_NOATIME that is passed in.
> > 
> > Because the XFS handle library is cross platform and predates
> > O_NOATIME on linux. Hence the library it has never set that flag and
> > always relied on the kernel implementation of the API to ensure
> > atime was never updated on fds derived from handles..
> > 
> > > How can you ever justify setting FMODE_NOCMTIME ?
> > 
> > Quite easily. ;)
> > 
> > The XFS handle interface was designed specifically to allow
> > applications to execute silent/invisible movement of data in, out
> > and around the filesystem without leaving user visible traces in
> > file metadata. This enables backup or filesysetm utilities that
> > operate on active filesystems need to be able to access or modify
> > inodes and data without affecting running applications.  It's a
> > feature of the handle interface, and used by xfs_dump, xfs_fsr,
> > SGI's HSM, etc to do stuff that isn't otherwise possible.
> > 
> > FWIW, if you are curious, here's the initial commit of the XFS
> > handle code into Irix tree from 3 Sep 1994, showing that the initial
> > XFS open_by_handle() implementation sets the FINVIS flag to trigger
> > invisible IO semantics:
> > 
> > http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=575b66fae833429a51fcadb204d45521c2dfc26f
> 
> Thanks for sharing this. I haven't looked at the details you mentioned here.
> 
> > 
> > > I guess you are just copying from xfs code, but it still needs justification.
> > 
> > 	"They are intended for use by a limited set of system
> > 	utilities such as backup programs."
> > 
> > 		- open_by_handle(3) man page
> > 
> 
> Should we retain all the above behaviour in the new syscall ?. Or just
> do what a normal open(2) call does ?

I'm not sure that FMODE_NOCMTIME can be set from userspace at the
moment. In fs.h:

  82 /*
  83  * Don't update ctime and mtime.
  84  *
  85  * Currently a special hack for the XFS open_by_handle ioctl, but we'll
  86  * hopefully graduate it to a proper O_CMTIME flag supported by open(2) soon.
  87  */
  88 #define FMODE_NOCMTIME          ((__force fmode_t)0x800)

Perhaps we need to introduce O_NOCMTIME as the comment suggests, and
then the new handle code doesn't need to automatically set it. If
libhandle is converted, then it could set the open flags as
necessary...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 5/9] vfs: Add freadlink syscall
  2010-05-13  6:56       ` Neil Brown
  2010-05-13  7:34         ` Aneesh Kumar K. V
@ 2010-05-14 11:18         ` Aneesh Kumar K. V
  1 sibling, 0 replies; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-14 11:18 UTC (permalink / raw)
  To: Neil Brown
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Thu, 13 May 2010 16:56:06 +1000, Neil Brown <neilb@suse.de> wrote:
> On Thu, 13 May 2010 11:55:56 +0530
> "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > On Thu, 13 May 2010 11:43:51 +1000, Neil Brown <neilb@suse.de> wrote:
> > > On Wed, 12 May 2010 21:20:40 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > 
> > > > This enables to use open-by-handle and then get the link target
> > > > details of a symlink using the fd returned by handle
> > > > 
> > > 
> > > 
> > > I find it very frustrating that a new syscall seems to be needed here.
> > > We have 'readlinkat', and it should be enough.
> > > How:  the 'dfd' has to be a 'directory', and the path name as to be non-empty.
> > > 
> > > The following patch allows 'path' to be NULL and in that case 'dfd' to be a
> > > non-directory.  This allows readlinkat and faccessat (and probably others)
> > > to be used on an fd with not following path name.
> > > 
> > > What do people think of this alternative?
> > > 
> > 
> > I will add this in the next iteration and drop the freadlink syscall.
> > 
> 
> I wouldn't be quite that hasty.  It was only a proposal.  It might not be
> appropriate to change all those syscalls..
> 
> An alternative that I don't think is a nice, but is a lot 'safer' is
> to just change readlinkat to accept a NULL path:
> 
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> diff --git a/fs/stat.c b/fs/stat.c
> index c4ecd52..85d0856 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -284,26 +284,40 @@ SYSCALL_DEFINE2(newfstat, unsigned int, fd, struct stat __user *, statbuf)
>  SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
>  		char __user *, buf, int, bufsiz)
>  {
> -	struct path path;
> +	struct path path, *pp;
> +	struct file *file = NULL;
>  	int error;
> 
>  	if (bufsiz <= 0)
>  		return -EINVAL;
> 
> -	error = user_path_at(dfd, pathname, 0, &path);
> +	if (filename == NULL && dfd != AT_FDCWD) {
> +		struct file *file = fget(dfd);
> +
> +		if (file)
> +			pp = &file->f_path;
> +		else
> +			error = -EBADF;
> +	} else {
> +		error = user_path_at(dfd, pathname, 0, &path);
> +		pp = &path;
> +	}
>  	if (!error) {
> -		struct inode *inode = path.dentry->d_inode;
> +		struct inode *inode = pp->dentry->d_inode;
> 
>  		error = -EINVAL;
>  		if (inode->i_op->readlink) {
> -			error = security_inode_readlink(path.dentry);
> +			error = security_inode_readlink(pp->dentry);
>  			if (!error) {
> -				touch_atime(path.mnt, path.dentry);
> -				error = inode->i_op->readlink(path.dentry,
> +				touch_atime(pp->mnt, pp->dentry);
> +				error = inode->i_op->readlink(pp->dentry,
>  							      buf, bufsiz);
>  			}
>  		}
> -		path_put(&path);
> +		if (file)
> +			fput(file);
> +		else
> +			path_put(&path);
>  	}
>  	return error;
>  }

I updated the patch to fix some compile errors. I need below patch to get
it work with rest of the changes. Either we do the below or we can add a
new syscall to get link target directly from handle. Any preference ?

commit 7ec1f4cc592383a8f13fbd9b0790c5c0988a89ee
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Fri May 14 16:35:20 2010 +0530

    vfs: Allow handle based open on symlinks
    
    The patch update may_open to allow handle based open on symlinks.
    The file handle based API use file descritor returned from open_by_handle_at
    to do different file system operations. To find the link target name we
    need to get a file descriptor on symlinks.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/namei.c b/fs/namei.c
index 7efcfb5..56c1d99 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1421,7 +1421,7 @@ int vfs_create(struct inode *dir, struct dentry *dentry, int mode,
 	return error;
 }
 
-int may_open(struct path *path, int acc_mode, int flag)
+static int __may_open(struct path *path, int acc_mode, int flag, int handle)
 {
 	struct dentry *dentry = path->dentry;
 	struct inode *inode = dentry->d_inode;
@@ -1432,7 +1432,13 @@ int may_open(struct path *path, int acc_mode, int flag)
 
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFLNK:
-		return -ELOOP;
+		/*
+		 * For file handle based open we should allow
+		 * open of symlink.
+		 */
+		if (!handle)
+			return -ELOOP;
+		break;
 	case S_IFDIR:
 		if (acc_mode & MAY_WRITE)
 			return -EISDIR;
@@ -1472,6 +1478,11 @@ int may_open(struct path *path, int acc_mode, int flag)
 	return break_lease(inode, flag);
 }
 
+int may_open(struct path *path, int acc_mode, int flag)
+{
+	return __may_open(path, acc_mode, flag, 0);
+}
+
 static int handle_truncate(struct path *path)
 {
 	struct inode *inode = path->dentry->d_inode;
@@ -1569,7 +1580,7 @@ struct file *finish_open_path(struct path *path,
 		if (error)
 			goto exit;
 	}
-	error = may_open(path, acc_mode, open_flag);
+	error = __may_open(path, acc_mode, open_flag, 1);
 	if (error) {
 		if (will_truncate)
 			mnt_drop_write(path->mnt);

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-13 22:54                 ` Andreas Dilger
@ 2010-05-14 17:25                   ` Al Viro
  2010-05-14 18:18                     ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Al Viro @ 2010-05-14 17:25 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Aneesh Kumar K. V, Neil Brown, Christoph Hellwig, Jonathan Corbet,
	Serge Hallyn, linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Thu, May 13, 2010 at 04:54:11PM -0600, Andreas Dilger wrote:

> fd = 0 is a valid descriptor, though -1 is not.  I'd be OK with this, because it still means we can use the UUID to positively identify the filesystem (it would also avoid the need to do a by-UUID lookup in most cases, since the dirfd would already point to a filesystem and we can just compare the handle UUID with the UUID for that superblock.
> 
> The question is what to do if the UUID in the file handle does not match the filesystem that is passed via vfsmount?  It would seem that vfsmount is needed for determining the namespace and possibly disambiguating multiple mounts of snapshots with the same UUID.  Independent of that if the UUID is not matching the underlying filesystem that the vfsmount is pointing to the open should fail with -ESTALE.
> 
> If the UUID is not the same, then the chance that the inode/generation stored in the file handle are still useful is slim.  If someone is cloning the filesystem down to the inode number, then they can clone the UUID as well.
> 
> In summary, I'm OK with this approach.

I am not.  First of all, it relies on order of vfsmounts in the list to
decide which one gets picked; as far as I'm concerned, that alone is enough
to declare it bogus.  "Let's pick the random answer" is wrong outside of
/dev/random and it's got way too little entropy for that use ;-)

What's more, there's absolutely nothing to stop you from having fsid -> fd
cache in userland and populating it on demand from whatever means you are
using to determine the mapping from fsid to fs.

IOW, NAK.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-12 15:50 ` [PATCH -V7 6/9] ext4: Add get_fsid callback Aneesh Kumar K.V
  2010-05-13  3:11   ` Dave Chinner
@ 2010-05-14 17:32   ` Coly Li
  2010-05-14 18:21     ` Aneesh Kumar K. V
  1 sibling, 1 reply; 49+ messages in thread
From: Coly Li @ 2010-05-14 17:32 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel



On 05/12/2010 11:50 PM, Aneesh Kumar K.V Wrote:
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/ext4/super.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e14d22c..fc7d464 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
>  	return try_to_free_buffers(page);
>  }
>  
> +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
> +{
> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> +	struct ext4_super_block *es = sbi->s_es;
> +
> +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
> +	/*
> +	 * We may want to make sure we return error if the s_uuid is not
> +	 * exactly unique
> +	 */
> +	return 0;
> +}
> +
>  #ifdef CONFIG_QUOTA
>  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
>  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
>  	.quota_write	= ext4_quota_write,
>  #endif
>  	.bdev_try_to_free_page = bdev_try_to_free_page,
> +	.get_fsid	= ext4_get_fsid,
>  };
>  
>  static const struct super_operations ext4_nojournal_sops = {
> @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
>  	.quota_write	= ext4_quota_write,
>  #endif
>  	.bdev_try_to_free_page = bdev_try_to_free_page,
> +	.get_fsid	= ext4_get_fsid,
>  };
>  

Hi Aneesh,

Just wondering why not call sb->s_op->statfs() to get the fsid in do_sys_name_to_handle() ?
The overhead is a little bit more code to get fsid from struct kstatfs, but we can gain,
1) avoid to have one more callback in struct super_operations.
2) more file system can be supported (since most of file systems have statfs() call back)

Thanks.
-- 
Coly Li
SuSE Labs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-14 17:25                   ` Al Viro
@ 2010-05-14 18:18                     ` Aneesh Kumar K. V
  2010-05-14 18:40                       ` Al Viro
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-14 18:18 UTC (permalink / raw)
  To: Al Viro, Andreas Dilger
  Cc: Neil Brown, Christoph Hellwig, Jonathan Corbet, Serge Hallyn,
	linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Fri, 14 May 2010 18:25:10 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Thu, May 13, 2010 at 04:54:11PM -0600, Andreas Dilger wrote:
> 
> > fd = 0 is a valid descriptor, though -1 is not.  I'd be OK with this, because it still means we can use the UUID to positively identify the filesystem (it would also avoid the need to do a by-UUID lookup in most cases, since the dirfd would already point to a filesystem and we can just compare the handle UUID with the UUID for that superblock.
> > 
> > The question is what to do if the UUID in the file handle does not match the filesystem that is passed via vfsmount?  It would seem that vfsmount is needed for determining the namespace and possibly disambiguating multiple mounts of snapshots with the same UUID.  Independent of that if the UUID is not matching the underlying filesystem that the vfsmount is pointing to the open should fail with -ESTALE.
> > 
> > If the UUID is not the same, then the chance that the inode/generation stored in the file handle are still useful is slim.  If someone is cloning the filesystem down to the inode number, then they can clone the UUID as well.
> > 
> > In summary, I'm OK with this approach.
> 
> I am not.  First of all, it relies on order of vfsmounts in the list to
> decide which one gets picked; as far as I'm concerned, that alone is enough
> to declare it bogus.  "Let's pick the random answer" is wrong outside of
> /dev/random and it's got way too little entropy for that use ;-)
> 
> What's more, there's absolutely nothing to stop you from having fsid -> fd
> cache in userland and populating it on demand from whatever means you are
> using to determine the mapping from fsid to fs.
> 
> IOW, NAK.

How about
sys_open_by_handle(int mountdirfd,
                  struct file_handle __user *ufh, int open_flag)

and we validate the UUID present in the file_handle by using mountdirfd
vfsmount. In otherwords we drop the UUID based vfsmount lookup, But
still add UUID as a part of file handle ?

-aneesh


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-14 17:32   ` Coly Li
@ 2010-05-14 18:21     ` Aneesh Kumar K. V
  2010-05-14 19:08       ` Coly Li
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-14 18:21 UTC (permalink / raw)
  To: coly.li
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Sat, 15 May 2010 01:32:35 +0800, Coly Li <coly.li@suse.de> wrote:
> 
> 
> On 05/12/2010 11:50 PM, Aneesh Kumar K.V Wrote:
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/ext4/super.c |   15 +++++++++++++++
> >  1 files changed, 15 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index e14d22c..fc7d464 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
> >  	return try_to_free_buffers(page);
> >  }
> >  
> > +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
> > +{
> > +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> > +	struct ext4_super_block *es = sbi->s_es;
> > +
> > +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
> > +	/*
> > +	 * We may want to make sure we return error if the s_uuid is not
> > +	 * exactly unique
> > +	 */
> > +	return 0;
> > +}
> > +
> >  #ifdef CONFIG_QUOTA
> >  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
> >  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> > @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
> >  	.quota_write	= ext4_quota_write,
> >  #endif
> >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > +	.get_fsid	= ext4_get_fsid,
> >  };
> >  
> >  static const struct super_operations ext4_nojournal_sops = {
> > @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
> >  	.quota_write	= ext4_quota_write,
> >  #endif
> >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > +	.get_fsid	= ext4_get_fsid,
> >  };
> >  
> 
> Hi Aneesh,
> 
> Just wondering why not call sb->s_op->statfs() to get the fsid in do_sys_name_to_handle() ?
> The overhead is a little bit more code to get fsid from struct kstatfs, but we can gain,
> 1) avoid to have one more callback in struct super_operations.
> 2) more file system can be supported (since most of file systems have statfs() call back)
> 

sb->s_op->statfs returns fsid of type __kernel_fsid_t which is just
64bits.

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-14 18:18                     ` Aneesh Kumar K. V
@ 2010-05-14 18:40                       ` Al Viro
  2010-05-15  5:31                         ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Al Viro @ 2010-05-14 18:40 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Andreas Dilger, Neil Brown, Christoph Hellwig, Jonathan Corbet,
	Serge Hallyn, linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Fri, May 14, 2010 at 11:48:24PM +0530, Aneesh Kumar K. V wrote:

> How about
> sys_open_by_handle(int mountdirfd,
>                   struct file_handle __user *ufh, int open_flag)
> 
> and we validate the UUID present in the file_handle by using mountdirfd
> vfsmount. In otherwords we drop the UUID based vfsmount lookup, But
> still add UUID as a part of file handle ?

IDGI.  If you have decided upon the mountdirfd, why bother with rechecking?
You still need to handle the case when fs tells you to take a hike because
fileid was invalid.  What does duplicate uuid check in the kernel buy you?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-14 18:21     ` Aneesh Kumar K. V
@ 2010-05-14 19:08       ` Coly Li
  0 siblings, 0 replies; 49+ messages in thread
From: Coly Li @ 2010-05-14 19:08 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel



On 05/15/2010 02:21 AM, Aneesh Kumar K. V Wrote:
> On Sat, 15 May 2010 01:32:35 +0800, Coly Li <coly.li@suse.de> wrote:
>>
>>
>> On 05/12/2010 11:50 PM, Aneesh Kumar K.V Wrote:
>>> Acked-by: Serge Hallyn <serue@us.ibm.com>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>>  fs/ext4/super.c |   15 +++++++++++++++
>>>  1 files changed, 15 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index e14d22c..fc7d464 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
>>>  	return try_to_free_buffers(page);
>>>  }
>>>  
>>> +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
>>> +{
>>> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
>>> +	struct ext4_super_block *es = sbi->s_es;
>>> +
>>> +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
>>> +	/*
>>> +	 * We may want to make sure we return error if the s_uuid is not
>>> +	 * exactly unique
>>> +	 */
>>> +	return 0;
>>> +}
>>> +
>>>  #ifdef CONFIG_QUOTA
>>>  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
>>>  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
>>> @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
>>>  	.quota_write	= ext4_quota_write,
>>>  #endif
>>>  	.bdev_try_to_free_page = bdev_try_to_free_page,
>>> +	.get_fsid	= ext4_get_fsid,
>>>  };
>>>  
>>>  static const struct super_operations ext4_nojournal_sops = {
>>> @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
>>>  	.quota_write	= ext4_quota_write,
>>>  #endif
>>>  	.bdev_try_to_free_page = bdev_try_to_free_page,
>>> +	.get_fsid	= ext4_get_fsid,
>>>  };
>>>  
>>
>> Hi Aneesh,
>>
>> Just wondering why not call sb->s_op->statfs() to get the fsid in do_sys_name_to_handle() ?
>> The overhead is a little bit more code to get fsid from struct kstatfs, but we can gain,
>> 1) avoid to have one more callback in struct super_operations.
>> 2) more file system can be supported (since most of file systems have statfs() call back)
>>
> 
> sb->s_op->statfs returns fsid of type __kernel_fsid_t which is just
> 64bits.
> 

I don't check the user space code of the patch series, anyway f_fsid from statfs(2) is designed to identify a file
system volume within a single system. Correct me if I am wrong, I guess handle->fsid is invisible from other systems,
maybe the f_fsid is enough here ?

Thanks.

-- 
Coly Li
SuSE Labs

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
@ 2010-05-14 19:56 Steve French
  2010-05-16  7:24 ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Steve French @ 2010-05-14 19:56 UTC (permalink / raw)
  To: linux-fsdevel, LKML

I think open by handle will turn out to be useful, but in discussing
various "duplicate inode number" checks that we are having to add to
cifs, it reminded me that we probably need a "generation number" or
some equivalent (birth time is probably good enough as well) to be
able to tell the case where a file is deleted and new file is created
reusing the same inode number (eventually Samba needs to return this
to posix clients if inode numbers are to be useful - and I don't know
how to tell Samba how to get birth time or generation numbers out of
stat in userspace)

-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-14 18:40                       ` Al Viro
@ 2010-05-15  5:31                         ` Aneesh Kumar K. V
  2010-05-15  6:00                           ` Al Viro
  0 siblings, 1 reply; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-15  5:31 UTC (permalink / raw)
  To: Al Viro
  Cc: Andreas Dilger, Neil Brown, Christoph Hellwig, Jonathan Corbet,
	Serge Hallyn, linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Fri, 14 May 2010 19:40:44 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Fri, May 14, 2010 at 11:48:24PM +0530, Aneesh Kumar K. V wrote:
> 
> > How about
> > sys_open_by_handle(int mountdirfd,
> >                   struct file_handle __user *ufh, int open_flag)
> > 
> > and we validate the UUID present in the file_handle by using mountdirfd
> > vfsmount. In otherwords we drop the UUID based vfsmount lookup, But
> > still add UUID as a part of file handle ?
> 
> IDGI.  If you have decided upon the mountdirfd, why bother with rechecking?
> You still need to handle the case when fs tells you to take a hike because
> fileid was invalid.  What does duplicate uuid check in the kernel buy you?

We need to have a file system identifier to identify a file system with
respect to which file_id need to be decoded. We can either do it in
userspace via statfs or any other encoding or let the kernel fill the
field for userspace. Now having kernel providing both file_id and file
system id implies we can get an 'unique' handle with one syscall. (Not
unique if we have multiple file system with same UUID. Userspace
file server can find out whether all the mount point it is exporting
have a unique UUID during startup.) 

Now if we have UUID as a part of file handle, adding the check make
sure that we are passing the right file handle with the correct mountdirfd.
If they don't match return -ESTALE. 

Adding UUID as a part of file handle also help us to get a 16 byte UUID
of the file system in userspace in a simple syscall interface, rather
than using file system specific libraries like libext2fs.

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-15  5:31                         ` Aneesh Kumar K. V
@ 2010-05-15  6:00                           ` Al Viro
  2010-05-15 15:28                             ` Aneesh Kumar K. V
  0 siblings, 1 reply; 49+ messages in thread
From: Al Viro @ 2010-05-15  6:00 UTC (permalink / raw)
  To: Aneesh Kumar K. V
  Cc: Andreas Dilger, Neil Brown, Christoph Hellwig, Jonathan Corbet,
	Serge Hallyn, linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Sat, May 15, 2010 at 11:01:13AM +0530, Aneesh Kumar K. V wrote:

> We need to have a file system identifier to identify a file system with
> respect to which file_id need to be decoded.

Obviously.

> We can either do it in
> userspace via statfs or any other encoding or let the kernel fill the
> field for userspace. Now having kernel providing both file_id and file
> system id implies we can get an 'unique' handle with one syscall. (Not
> unique if we have multiple file system with same UUID. Userspace
> file server can find out whether all the mount point it is exporting
> have a unique UUID during startup.) 

Or it can decide which UUID to use by parsing its analog of /etc/exports, or
any number of other schemes.  Sure.

> Now if we have UUID as a part of file handle, adding the check make
> sure that we are passing the right file handle with the correct mountdirfd.
> If they don't match return -ESTALE. 

Pardon?  Your variant needs to choose mountdirfd somehow and it has to be
done by UUID found in fhandle it has received.  Whatever userland code will
be doing that, you
	a) need to open() these suckers at some point
	b) do _NOT_ want to reopen them on each incoming fhandle, for obvious
efficiency reasons
	c) will have very few opens compared to the amount of fhandles you
are going to handle (number of filesystems vs. number of requested accesses
to them)
	d) can be sure that identity of fs hosting an opened directory will
_not_ change while it's open, so the checks concerning the validity of
UUID -> mountdirfd mapping you need to do should be done only once per
opening mountdirfd

So why the devil would you repeat them on each open-by-fhandle in a variant
that gets mountdirfd from caller?
 
> Adding UUID as a part of file handle also help us to get a 16 byte UUID
> of the file system in userspace in a simple syscall interface, rather
> than using file system specific libraries like libext2fs.

Your choice of fsid encoding is orthogonal to my question; again, the question
is "why would you want to have open-by-fhandle do anything with fsid, if it
gets filesystem identified by opened file descriptor passed by userland anyway?"

I don't think that UUID is always a good choice, BTW, but that's a separate
issue.  For one thing, not every fs _has_ UUID, so while that might be a
reasonable default, it doesn't make sense to hardwire it into the design.
For another, you might want a different policy (e.g. you might want to
allow explicitly configured "this fsid value corresponds to whatever's mounted
on that path" for some userland server), so again it doesn't make any sense
to hardwire that one.

But that's a completely independent issue; whether you use hardwire UUID into
the thing or not, rechecking that yes, mountdirfd is on the same fs as it
used to back when we'd opened it, is pointless.  It *will* be on the same fs,
so checking that we'd got the right filesystem should be done once per
opening mountdirfd, not once per opening file by fhandle.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 6/9] ext4: Add get_fsid callback
  2010-05-14  1:44       ` Dave Chinner
@ 2010-05-15  6:09         ` Aneesh Kumar K. V
  0 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-15  6:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: hch, viro, adilger, corbet, serue, neilb, linux-fsdevel, sfrench,
	philippe.deniel, linux-kernel

On Fri, 14 May 2010 11:44:04 +1000, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 13, 2010 at 12:02:52PM +0530, Aneesh Kumar K. V wrote:
> > On Thu, 13 May 2010 13:11:33 +1000, Dave Chinner <david@fromorbit.com> wrote:
> > > On Wed, May 12, 2010 at 09:20:41PM +0530, Aneesh Kumar K.V wrote:
> > > > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > > > ---
> > > >  fs/ext4/super.c |   15 +++++++++++++++
> > > >  1 files changed, 15 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > > index e14d22c..fc7d464 100644
> > > > --- a/fs/ext4/super.c
> > > > +++ b/fs/ext4/super.c
> > > > @@ -1049,6 +1049,19 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
> > > >  	return try_to_free_buffers(page);
> > > >  }
> > > >  
> > > > +static int ext4_get_fsid(struct super_block *sb, struct uuid *fsid)
> > > > +{
> > > > +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> > > > +	struct ext4_super_block *es = sbi->s_es;
> > > > +
> > > > +	memcpy(fsid->uuid, es->s_uuid, sizeof(fsid->uuid));
> > > > +	/*
> > > > +	 * We may want to make sure we return error if the s_uuid is not
> > > > +	 * exactly unique
> > > > +	 */
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  #ifdef CONFIG_QUOTA
> > > >  #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
> > > >  #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> > > > @@ -1109,6 +1122,7 @@ static const struct super_operations ext4_sops = {
> > > >  	.quota_write	= ext4_quota_write,
> > > >  #endif
> > > >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > > > +	.get_fsid	= ext4_get_fsid,
> > > >  };
> > > >  
> > > >  static const struct super_operations ext4_nojournal_sops = {
> > > > @@ -1128,6 +1142,7 @@ static const struct super_operations ext4_nojournal_sops = {
> > > >  	.quota_write	= ext4_quota_write,
> > > >  #endif
> > > >  	.bdev_try_to_free_page = bdev_try_to_free_page,
> > > > +	.get_fsid	= ext4_get_fsid,
> > > >  };
> > > 
> > > This all looks pretty simple - can you add XFS support to this
> > > interface (uuid is in XFS_M(sb)->m_sb.sb_uuid) so that it can be
> > > tested to work on multiple filesystems.
> > > 
> > > FWIW, I didn't get patch 0 of this series, so I'll comment on
> > > one line of it right here because it is definitely relevant:
> > > 
> > > > I am also looking at getting xfsprogs libhandle.so on top of these
> > > > syscalls.
> > > 
> > > If you plan to modify libhandle to use these syscalls, then you need
> > > to guarantee:
> > > 
> > > 	1. XFS support for the syscalls
> > > 	2. the handle format, lifecycle and protections for XFS
> > > 	   handles are *exactly* the same as the current XFS
> > > 	   handles.  i.e. there's a fixed userspace API that
> > > 	   cannot be broken.
> > > 	3. you don't break any of the other XFS specific handle
> > > 	   interfaces that aren't implemented by the new syscalls
> > > 	3. You don't break and existing XFS utilites - dump/restore,
> > > 	   and fsr come to mind immediately.
> > > 	4. that you'll fix the xfstests that may break because of the
> > > 	   change
> > > 	5. that you'll write new tests for xfstests that validates
> > > 	   that the libhandle API works correctly and that handle
> > > 	   formats and lifecycles do not get accidentally changed in
> > > 	   future.
> > > 
> > > That's a lot of work and, IMO, is completely pointless. All we'd get
> > > out of it is more complexity, bloat, scope for regressions and a
> > > biger test matrix, and we wouldn't have any new functionality to
> > > speak of.
> > 
> > getting libhandle.so to work with the syscall is something that is
> > suggested on the list. The goal is to see if syscall achieve everything
> > that XFS ioctl does
> 
> Ok, I didn't know that, but the question still stands. The XFS ioctl
> cannot go away any time soon (we basically have to support it
> forever), so why should we be writing a new, redundant
> kernel API for this functionality that is going not generally going
> to be directly accessed by userspace developers?
> 
> APIs are hard to get right - moving and modifying kernel code to be
> generic is easy in comparison, and also somethign we can easily fix
> if we get it wrong the first time. Make a mistake with a syscall
> API, and we are stuck with it forever.
> 
> Might I suggest a slightly different approach, then? That is,
> separate the two parts of making the XFS handle code generic and
> providing a new API?  We don't lose anything by separating them - we
> don't introduce any new APIs that have to be supported in the first
> step, nor does the functionality get delayed by API discussions.
> However, we still gain immediate widespread support for handles through
> the libraries *already shipping* in every major distro, and that
> doesn't get held up by discussions around what the API should look
> like.
> 
> Then we can work on getting a new API right - we're going to be
> stuck with it forever, so it's probably better to work out how the
> interface will be used outside libhandle. A new application using it
> would be a great example - it's rare that an API created with only
> one user is going to be a good API when more developers try to make
> use of it for new applications.
> 
> There is precedence here - the FIFREEZE ioctl for freezing/thawing
> filesystems from userspace іs the API that XFS has been using for
> years (XFS_IOC_FREEZE) to provide the functionality. It got promoted
> to the VFS when other filesystems needed userspace freezing
> capabilities, but only after new syscalls were proposed first.  The
> result of using the existing interface was that freeze/thaw for any
> capable filesystem was immediately availble using xfs_freeze or
> xfs_io - there was no lag to userspace support in distro's, no
> problems having to detect and support multiple interfaces depending
> on what the kernel supported, etc. Overall it made things much
> simpler and easier to manage and test....
> 
> Your thoughts?
> 

Howabout continuing with syscall patchset trying to see if we can get it
merged in the next merge window. If it appears that a merge in the next
merge window is difficult, I can definitely try the ioctl approach you
outlined above

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
  2010-05-15  6:00                           ` Al Viro
@ 2010-05-15 15:28                             ` Aneesh Kumar K. V
  0 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-15 15:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Andreas Dilger, Neil Brown, Christoph Hellwig, Jonathan Corbet,
	Serge Hallyn, linux-fsdevel, Steven French, Philippe Deniel,
	linux-kernel@vger.kernel.org Mailinglist

On Sat, 15 May 2010 07:00:23 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Sat, May 15, 2010 at 11:01:13AM +0530, Aneesh Kumar K. V wrote:
> 
> > We need to have a file system identifier to identify a file system with
> > respect to which file_id need to be decoded.
> 
> Obviously.
> 
> > We can either do it in
> > userspace via statfs or any other encoding or let the kernel fill the
> > field for userspace. Now having kernel providing both file_id and file
> > system id implies we can get an 'unique' handle with one syscall. (Not
> > unique if we have multiple file system with same UUID. Userspace
> > file server can find out whether all the mount point it is exporting
> > have a unique UUID during startup.) 
> 
> Or it can decide which UUID to use by parsing its analog of /etc/exports, or
> any number of other schemes.  Sure.
> 
> > Now if we have UUID as a part of file handle, adding the check make
> > sure that we are passing the right file handle with the correct mountdirfd.
> > If they don't match return -ESTALE. 
> 
> Pardon?  Your variant needs to choose mountdirfd somehow and it has to be
> done by UUID found in fhandle it has received.  Whatever userland code will
> be doing that, you
> 	a) need to open() these suckers at some point
> 	b) do _NOT_ want to reopen them on each incoming fhandle, for obvious
> efficiency reasons
> 	c) will have very few opens compared to the amount of fhandles you
> are going to handle (number of filesystems vs. number of requested accesses
> to them)
> 	d) can be sure that identity of fs hosting an opened directory will
> _not_ change while it's open, so the checks concerning the validity of
> UUID -> mountdirfd mapping you need to do should be done only once per
> opening mountdirfd
> 
> So why the devil would you repeat them on each open-by-fhandle in a variant
> that gets mountdirfd from caller?
> 
> > Adding UUID as a part of file handle also help us to get a 16 byte UUID
> > of the file system in userspace in a simple syscall interface, rather
> > than using file system specific libraries like libext2fs.
> 
> Your choice of fsid encoding is orthogonal to my question; again, the question
> is "why would you want to have open-by-fhandle do anything with fsid, if it
> gets filesystem identified by opened file descriptor passed by userland anyway?"
> 
> I don't think that UUID is always a good choice, BTW, but that's a separate
> issue.  For one thing, not every fs _has_ UUID, so while that might be a
> reasonable default, it doesn't make sense to hardwire it into the design.
> For another, you might want a different policy (e.g. you might want to
> allow explicitly configured "this fsid value corresponds to whatever's mounted
> on that path" for some userland server), so again it doesn't make any sense
> to hardwire that one.
> 
> But that's a completely independent issue; whether you use hardwire UUID into
> the thing or not, rechecking that yes, mountdirfd is on the same fs as it
> used to back when we'd opened it, is pointless.  It *will* be on the same fs,
> so checking that we'd got the right filesystem should be done once per
> opening mountdirfd, not once per opening file by fhandle.

Ok that made it clear. Thanks for the explanation. BTW we still are ok
with UUID/16byte array to be part of file handle returned by the kernel
right? ie file_handle of type.

struct file_handle {
	int handle_size;
	int handle_type;
	/* File system identifier */
	struct uuid fsid;
	/* file identifier */
	unsigned char f_handle[0];
};

If the file system provide an unique identifier it can return that via
super_block->s_ops->get_fsid. If not just don't implement the call back. We
don't validate the UUID in the handle on open_by_handle because the
user space server could use a file system identifier different from
the file system UUID.

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
  2010-05-14 19:56 [PATCH -V7 4/9] vfs: Add open by file handle support Steve French
@ 2010-05-16  7:24 ` Aneesh Kumar K. V
  0 siblings, 0 replies; 49+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-16  7:24 UTC (permalink / raw)
  To: Steve French, linux-fsdevel, LKML

On Fri, 14 May 2010 14:56:08 -0500, Steve French <smfrench@gmail.com> wrote:
> I think open by handle will turn out to be useful, but in discussing
> various "duplicate inode number" checks that we are having to add to
> cifs, it reminded me that we probably need a "generation number" or
> some equivalent (birth time is probably good enough as well) to be
> able to tell the case where a file is deleted and new file is created
> reusing the same inode number (eventually Samba needs to return this
> to posix clients if inode numbers are to be useful - and I don't know
> how to tell Samba how to get birth time or generation numbers out of
> stat in userspace)
>

The file handle already have generation number encoded. The patches
describe file handle as 

struct file_handle {
	int handle_size;
	int handle_type;
	/* File system identifier */
	struct uuid fsid;
	/* file identifier */
	unsigned char f_handle[0];
};

The file identifier which is stored in f_handle and of size handle_size
is obtained via exportfs_encode_fh (default to export_encode_fh). File 
system can hook a super_block->export_operations there. The default one
export_encode_fh actually use inode number and generation number as a
part of handle. So we shoud be having different file identifier on the
client side for these files which happened to have duplicate inode
number due to inode number reuse. But being able to determine generation
number in the userspace would be nice.

-aneesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2010-05-16  7:24 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2010-05-12 15:50 ` [PATCH -V7 1/9] exportfs: Return the minimum required handle size Aneesh Kumar K.V
2010-05-12 15:50 ` [PATCH -V7 2/9] vfs: Add uuid based vfsmount lookup Aneesh Kumar K.V
2010-05-12 15:50 ` [PATCH -V7 3/9] vfs: Add name to file handle conversion support Aneesh Kumar K.V
2010-05-12 21:49   ` Andreas Dilger
2010-05-12 22:43     ` Neil Brown
2010-05-13  6:17       ` Aneesh Kumar K. V
2010-05-13  7:11         ` Neil Brown
2010-05-13  8:30           ` Andreas Dilger
2010-05-13  8:47             ` Neil Brown
2010-05-13 14:21             ` Aneesh Kumar K. V
2010-05-13 18:17               ` Aneesh Kumar K. V
2010-05-13 22:54                 ` Andreas Dilger
2010-05-14 17:25                   ` Al Viro
2010-05-14 18:18                     ` Aneesh Kumar K. V
2010-05-14 18:40                       ` Al Viro
2010-05-15  5:31                         ` Aneesh Kumar K. V
2010-05-15  6:00                           ` Al Viro
2010-05-15 15:28                             ` Aneesh Kumar K. V
2010-05-13  0:20     ` Dave Chinner
2010-05-13  6:23       ` Aneesh Kumar K. V
2010-05-13  7:31         ` Dave Chinner
2010-05-13  5:56     ` Aneesh Kumar K. V
2010-05-13 14:24     ` Aneesh Kumar K. V
2010-05-12 15:50 ` [PATCH -V7 4/9] vfs: Add open by file handle support Aneesh Kumar K.V
2010-05-12 23:44   ` Neil Brown
2010-05-13  6:09     ` Dave Chinner
2010-05-13  6:37       ` Aneesh Kumar K. V
2010-05-14 10:41         ` Dave Chinner
2010-05-12 15:50 ` [PATCH -V7 5/9] vfs: Add freadlink syscall Aneesh Kumar K.V
2010-05-13  1:43   ` Neil Brown
2010-05-13  6:25     ` Aneesh Kumar K. V
2010-05-13  6:56       ` Neil Brown
2010-05-13  7:34         ` Aneesh Kumar K. V
2010-05-13  8:09           ` Neil Brown
2010-05-14 11:18         ` Aneesh Kumar K. V
2010-05-12 15:50 ` [PATCH -V7 6/9] ext4: Add get_fsid callback Aneesh Kumar K.V
2010-05-13  3:11   ` Dave Chinner
2010-05-13  6:32     ` Aneesh Kumar K. V
2010-05-14  1:44       ` Dave Chinner
2010-05-15  6:09         ` Aneesh Kumar K. V
2010-05-14 17:32   ` Coly Li
2010-05-14 18:21     ` Aneesh Kumar K. V
2010-05-14 19:08       ` Coly Li
2010-05-12 15:50 ` [PATCH -V7 7/9] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
2010-05-12 15:50 ` [PATCH -V7 8/9] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
2010-05-12 15:50 ` [PATCH -V7 9/9] ext3: Add get_fsid callback Aneesh Kumar K.V
  -- strict thread matches above, loose matches on Subject: below --
2010-05-14 19:56 [PATCH -V7 4/9] vfs: Add open by file handle support Steve French
2010-05-16  7:24 ` Aneesh Kumar K. V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).