* [PATCH -V5 0/8] Generic name to handle and open by handle syscalls
@ 2010-04-26 17:33 Aneesh Kumar K.V
2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:33 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb; +Cc: linux-fsdevel, sfrench
Hi,
The below set of patches implement open by handle support using exportfs
operations. This allows user space application to map a file name to file
handle and later open the file using handle. This should be usable
for userspace NFS [1] and 9P server [2]. XFS already support this with the ioctls
XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.
[1] http://nfs-ganesha.sourceforge.net/
[2] http://lists.gnu.org/archive/html/qemu-devel/2010-03/msg01087.html
TODO:
I guess we would need to optimize how we get the vfsmount for the filesystem
uuid specified. Searching the file system list and task name space may be a big
overhead for each open by handle call.
Chages from V4:
a) Changed the syscal arguments so that we don't need compat syscalls
as suggested by Christoph
c) Added two new syscall sys_lname_to_handle and sys_freadlink to work with
symlinks
d) Changed open_by_handle to work with all file types
e) Add ext3 support
Changes from V3:
a) Code cleanup suggested by Andreas
b) x86_64 syscall support
c) add compat syscall
Chages from V2:
a) Support system wide unique handle.
Changes from v1:
a) handle size is now specified in bytes
b) returns -EOVERFLOW if the handle size is small
c) dropped open_handle syscall and added open_by_handle_at syscall
open_by_handle_at takes mount_fd as the directory fd of the mount point
containing the file
e) handle will only be unique in a given file system. So for an NFS server
exporting multiple file system, NFS server will have to internally track the
mount point to which a file handle belongs to. We should be able to do it much
easily than expecting kernel to give a system wide unique file handle. System
wide unique file handle would need much larger changes to the exportfs or VFS
interface and I was not sure whether we really need to do that in the kernel or
in the user space
f) open_handle_at now only check for DAC_OVERRIDE capability
Example program: (x86_32). (x86_64 would need a different syscall number)
----------------
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>
struct uuid {
char uuid[16];
};
struct file_handle {
int handle_size;
int handle_type;
struct uuid fsid;
char handle[0];
};
static int name_to_handle(const char *name, struct file_handle *fh)
{
return syscall(338, name, fh);
}
static int lname_to_handle(const char *name, struct file_handle *fh)
{
return syscall(339, name, fh);
}
static int open_by_handle(struct file_handle *fh, int flags)
{
return syscall(340, fh, flags);
}
static int freadlink(int fd, char *buf, size_t bufsiz)
{
return syscall(341, fd, buf, bufsiz);
}
#define BUFSZ 100
int main(int argc, char *argv[])
{
int ret;
int handle_sz;
struct stat bufstat;
int fd, dirfd;
char buf[BUFSZ];
struct file_handle *fh = NULL;;
again:
if (fh && fh->handle_size) {
handle_sz = fh->handle_size;
free(fh);
fh = malloc(sizeof(struct file_handle) + handle_sz);
fh->handle_size = handle_sz;
} else {
fh = malloc(sizeof(struct file_handle));
fh->handle_size = 0;
}
errno = 0;
ret = lname_to_handle(argv[1], fh);
if (ret) {
perror("Error:");
printf("Found the handle size needed to be %d\n", fh->handle_size);
printf("Trying again..\n");
goto again;
exit(1);
}
fd = open_by_handle(fh, O_RDONLY);
if (fd <= 0 ) {
perror("Error:");
exit(1);
}
fstat(fd, &bufstat);
ret = S_ISLNK(bufstat.st_mode);
if (ret) {
memset(buf, 0 , BUFSZ);
freadlink(fd, buf, BUFSZ);
printf("%s is a symlink pointing to %s\n", argv[1], buf);
}
memset(buf, 0 , BUFSZ);
while (1) {
ret = read(fd, buf, BUFSZ -1);
if (ret <= 0)
break;
buf[ret] = '\0';
printf("%s", buf);
memset(buf, 0 , BUFSZ);
}
return 0;
}
-aneesh
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH -V5 1/8] exportfs: Return the minimum required handle size
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
@ 2010-04-26 17:33 ` Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:33 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/btrfs/export.c | 8 ++++++--
fs/exportfs/expfs.c | 9 +++++++--
fs/fat/inode.c | 4 +++-
fs/fuse/inode.c | 4 +++-
fs/gfs2/export.c | 8 ++++++--
fs/isofs/export.c | 8 ++++++--
fs/ocfs2/export.c | 8 +++++++-
fs/reiserfs/inode.c | 7 ++++++-
fs/udf/namei.c | 7 ++++++-
fs/xfs/linux-2.6/xfs_export.c | 4 +++-
mm/shmem.c | 4 +++-
11 files changed, 56 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 951ef09..5f8ee5a 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -21,9 +21,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
int len = *max_len;
int type;
- if ((len < BTRFS_FID_SIZE_NON_CONNECTABLE) ||
- (connectable && len < BTRFS_FID_SIZE_CONNECTABLE))
+ if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
+ *max_len = BTRFS_FID_SIZE_CONNECTABLE;
return 255;
+ } else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
+ *max_len = BTRFS_FID_SIZE_NON_CONNECTABLE;
+ return 255;
+ }
len = BTRFS_FID_SIZE_NON_CONNECTABLE;
type = FILEID_BTRFS_WITHOUT_PARENT;
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index e9e1759..cfee0f0 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -319,9 +319,14 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
struct inode * inode = dentry->d_inode;
int len = *max_len;
int type = FILEID_INO32_GEN;
-
- if (len < 2 || (connectable && len < 4))
+
+ if (connectable && (len < 4)) {
+ *max_len = 4;
+ return 255;
+ } else if (len < 2) {
+ *max_len = 2;
return 255;
+ }
len = 2;
fid->i32.ino = inode->i_ino;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 0ce143b..6f83bc7 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -738,8 +738,10 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
struct inode *inode = de->d_inode;
u32 ipos_h, ipos_m, ipos_l;
- if (len < 5)
+ if (len < 5) {
+ *lenp = 5;
return 255; /* no room */
+ }
ipos_h = MSDOS_I(inode)->i_pos >> 8;
ipos_m = (MSDOS_I(inode)->i_pos & 0xf0) << 24;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ec14d19..beaea69 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -638,8 +638,10 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
u64 nodeid;
u32 generation;
- if (*max_len < len)
+ if (*max_len < len) {
+ *max_len = len;
return 255;
+ }
nodeid = get_fuse_inode(inode)->nodeid;
generation = inode->i_generation;
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index c22c211..d022236 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -36,9 +36,13 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
struct super_block *sb = inode->i_sb;
struct gfs2_inode *ip = GFS2_I(inode);
- if (*len < GFS2_SMALL_FH_SIZE ||
- (connectable && *len < GFS2_LARGE_FH_SIZE))
+ if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
+ *len = GFS2_LARGE_FH_SIZE;
return 255;
+ } else if (*len < GFS2_SMALL_FH_SIZE) {
+ *len = GFS2_SMALL_FH_SIZE;
+ return 255;
+ }
fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32);
fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index ed752cb..dd4687f 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -124,9 +124,13 @@ isofs_export_encode_fh(struct dentry *dentry,
* offset of the inode and the upper 16 bits of fh32[1] to
* hold the offset of the parent.
*/
-
- if (len < 3 || (connectable && len < 5))
+ if (connectable && (len < 5)) {
+ *max_len = 5;
+ return 255;
+ } else if (len < 3) {
+ *max_len = 3;
return 255;
+ }
len = 3;
fh32[0] = ei->i_iget5_block;
diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 19ad145..250a347 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -201,8 +201,14 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
dentry->d_name.len, dentry->d_name.name,
fh, len, connectable);
- if (len < 3 || (connectable && len < 6)) {
+ if (connectable && (len < 6)) {
mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+ *max_len = 6;
+ type = 255;
+ goto bail;
+ } else if (len < 3) {
+ mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+ *max_len = 3;
type = 255;
goto bail;
}
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index dc2c65e..5fff1e2 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1588,8 +1588,13 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
struct inode *inode = dentry->d_inode;
int maxlen = *lenp;
- if (maxlen < 3)
+ if (need_parent && (maxlen < 5)) {
+ *lenp = 5;
return 255;
+ } else if (maxlen < 3) {
+ *lenp = 3;
+ return 255;
+ }
data[0] = inode->i_ino;
data[1] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 7581602..37ce713 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1360,8 +1360,13 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
struct fid *fid = (struct fid *)fh;
int type = FILEID_UDF_WITHOUT_PARENT;
- if (len < 3 || (connectable && len < 5))
+ if (connectable && (len < 5)) {
+ *lenp = 5;
+ return 255;
+ } else if (len < 3) {
+ *lenp = 3;
return 255;
+ }
*lenp = 3;
fid->udf.block = location.logicalBlockNum;
diff --git a/fs/xfs/linux-2.6/xfs_export.c b/fs/xfs/linux-2.6/xfs_export.c
index 846b75a..82c0553 100644
--- a/fs/xfs/linux-2.6/xfs_export.c
+++ b/fs/xfs/linux-2.6/xfs_export.c
@@ -81,8 +81,10 @@ xfs_fs_encode_fh(
* seven combinations work. The real answer is "don't use v2".
*/
len = xfs_fileid_length(fileid_type);
- if (*max_len < len)
+ if (*max_len < len) {
+ *max_len = len
return 255;
+ }
*max_len = len;
switch (fileid_type) {
diff --git a/mm/shmem.c b/mm/shmem.c
index eef4ebe..bbeda1c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2125,8 +2125,10 @@ static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
{
struct inode *inode = dentry->d_inode;
- if (*len < 3)
+ if (*len < 3) {
+ *len = 3;
return 255;
+ }
if (hlist_unhashed(&inode->i_hash)) {
/* Unfortunately insert_inode_hash is not idempotent,
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 2/8] vfs: Add name to file handle conversion support
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
2010-04-27 1:19 ` Neil Brown
2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
` (5 subsequent siblings)
7 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/open.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 17 +++++++++
2 files changed, 117 insertions(+), 0 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 74e5cd9..e9aae5c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -30,6 +30,8 @@
#include <linux/falloc.h>
#include <linux/fs_struct.h>
#include <linux/ima.h>
+#include <linux/exportfs.h>
+#include <linux/mnt_namespace.h>
#include "internal.h"
@@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
}
EXPORT_SYMBOL(nonseekable_open);
+
+/* limit the handle size to some value */
+#define MAX_HANDLE_SZ 4096
+static long do_sys_name_to_handle(struct path *path,
+ struct file_handle __user *ufh)
+{
+ int retval;
+ int handle_size;
+ struct super_block *sb;
+ struct uuid *this_fs_id;
+ struct file_handle f_handle;
+ struct file_handle *handle = NULL;
+
+ if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
+ return -EFAULT;
+
+ if (f_handle.handle_size > MAX_HANDLE_SZ)
+ return -EINVAL;
+
+ handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+ GFP_KERNEL);
+ if (!handle) {
+ retval = -ENOMEM;
+ goto err_out;
+ }
+ handle_size = f_handle.handle_size;
+
+ /* we ask for a non connected handle */
+ retval = exportfs_encode_fh(path->dentry,
+ (struct fid *)handle->f_handle,
+ &handle_size, 0);
+ /* convert handle size to bytes */
+ handle_size *= sizeof(u32);
+ handle->handle_type = retval;
+ handle->handle_size = handle_size;
+ if (handle_size <= f_handle.handle_size) {
+ /* get the uuid */
+ sb = path->mnt->mnt_sb;
+ if (sb->s_op->get_fsid) {
+ this_fs_id = sb->s_op->get_fsid(sb);
+ memcpy(handle->fsid.uuid, this_fs_id->uuid,
+ sizeof(handle->fsid.uuid));
+ } else
+ memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
+ retval = 0;
+ } else {
+ /*
+ * set the handle_size to zero so we copy only
+ * non variable part of the file_handle
+ */
+ handle_size = 0;
+ retval = -EOVERFLOW;
+ }
+ if (copy_to_user(ufh, handle,
+ sizeof(struct file_handle) + handle_size))
+ retval = -EFAULT;
+
+ kfree(handle);
+err_out:
+ return retval;
+}
+
+SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
+ struct file_handle __user *, handle)
+{
+ long ret;
+ struct path path;
+
+ /* Follow links */
+ ret = user_path(name, &path);
+ if (ret)
+ return ret;
+
+ ret = do_sys_name_to_handle(&path, handle);
+ path_put(&path);
+
+ /* avoid REGPARM breakage on x86: */
+ asmlinkage_protect(2, ret, name, handle);
+ return ret;
+}
+
+SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
+ struct file_handle __user *, handle)
+{
+ long ret;
+ struct path path;
+
+ ret = user_lpath(name, &path);
+ if (ret)
+ return ret;
+
+ ret = do_sys_name_to_handle(&path, handle);
+ path_put(&path);
+
+ /* avoid REGPARM breakage on x86: */
+ asmlinkage_protect(2, ret, name, handle);
+ return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 39d57bc..da28b80 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -948,6 +948,20 @@ struct file {
unsigned long f_mnt_write_state;
#endif
};
+
+struct uuid {
+ char uuid[16];
+};
+
+struct file_handle {
+ int handle_size;
+ int handle_type;
+ /* File system identifier */
+ struct uuid fsid;
+ /* file identifier */
+ char f_handle[0];
+};
+
extern spinlock_t files_lock;
#define file_list_lock() spin_lock(&files_lock);
#define file_list_unlock() spin_unlock(&files_lock);
@@ -1580,6 +1594,7 @@ struct super_operations {
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
#endif
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+ struct uuid *(*get_fsid)(struct super_block *);
};
/*
@@ -2328,6 +2343,8 @@ extern struct super_block *get_super(struct block_device *);
extern struct super_block *get_active_super(struct block_device *bdev);
extern struct super_block *user_get_super(dev_t);
extern void drop_super(struct super_block *sb);
+extern struct super_block *fs_get_sb(struct uuid *fsid);
+
extern int dcache_dir_open(struct inode *, struct file *);
extern int dcache_dir_close(struct inode *, struct file *);
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 3/8] vfs: Add open by file handle support
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
2010-04-27 1:05 ` Neil Brown
2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
` (4 subsequent siblings)
7 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/filesystems.c | 32 +++++++++-
fs/namei.c | 24 -------
fs/namespace.c | 38 +++++++++++
fs/open.c | 148 +++++++++++++++++++++++++++++++++++++++++
fs/pnode.c | 2 +-
include/linux/mnt_namespace.h | 2 +
include/linux/namei.h | 24 +++++++
7 files changed, 244 insertions(+), 26 deletions(-)
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 68ba492..743d36e 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
}
return fs;
}
-
EXPORT_SYMBOL(get_fs_type);
+
+struct super_block *fs_get_sb(struct uuid *fsid)
+{
+ struct uuid *this_fsid;
+ struct file_system_type *fs_type;
+ struct super_block *sb, *found_sb = NULL;
+
+ read_lock(&file_systems_lock);
+ fs_type = file_systems;
+ while (fs_type) {
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
+ if (!sb->s_op->get_fsid)
+ continue;
+ this_fsid = sb->s_op->get_fsid(sb);
+ if (!memcmp(fsid->uuid, this_fsid->uuid,
+ sizeof(this_fsid->uuid))) {
+ /* found the matching super_block */
+ atomic_inc(&sb->s_active);
+ found_sb = sb;
+ spin_unlock(&sb_lock);
+ goto out;
+ }
+ }
+ spin_unlock(&sb_lock);
+ fs_type = fs_type->next;
+ }
+out:
+ read_unlock(&file_systems_lock);
+ return found_sb;
+}
diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..a18711e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1521,30 +1521,6 @@ out_unlock:
return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
}
-/*
- * Note that while the flag value (low two bits) for sys_open means:
- * 00 - read-only
- * 01 - write-only
- * 10 - read-write
- * 11 - special
- * it is changed into
- * 00 - no permissions needed
- * 01 - read-permission
- * 10 - write-permission
- * 11 - read-write
- * for the internal routines (ie open_namei()/follow_link() etc)
- * This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
- *
-*/
-static inline int open_to_namei_flags(int flag)
-{
- if ((flag+1) & O_ACCMODE)
- flag++;
- return flag;
-}
-
static int open_will_truncate(int flag, struct inode *inode)
{
/*
diff --git a/fs/namespace.c b/fs/namespace.c
index 8174c8a..6168526 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
kfree(ns);
}
EXPORT_SYMBOL(put_mnt_ns);
+
+/*
+ * Get any vfsmount mapping the superblock in the
+ * task namespace
+ */
+struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+ struct super_block *sb)
+{
+ struct nsproxy *nsp;
+ struct list_head *mount_list;
+ struct mnt_namespace *ns = NULL;
+ struct vfsmount *mnt, *sb_mnt = NULL;
+
+ rcu_read_lock();
+ nsp = task_nsproxy(task);
+ if (nsp) {
+ ns = nsp->mnt_ns;
+ if (ns)
+ get_mnt_ns(ns);
+ }
+ rcu_read_unlock();
+ if (!ns)
+ return NULL;
+ down_read(&namespace_sem);
+ list_for_each(mount_list, &ns->list) {
+ mnt = list_entry(mount_list, struct vfsmount, mnt_list);
+ if (mnt->mnt_sb == sb) {
+ /* found the matching super block */
+ sb_mnt = mnt;
+ mntget(sb_mnt);
+ break;
+ }
+ }
+ up_read(&namespace_sem);
+
+ put_mnt_ns(ns);
+ return sb_mnt;
+}
diff --git a/fs/open.c b/fs/open.c
index e9aae5c..b87fa32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
asmlinkage_protect(2, ret, name, handle);
return ret;
}
+
+static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
+{
+ return 1;
+}
+
+static struct dentry *handle_to_dentry(struct vfsmount *mnt,
+ struct file_handle *handle)
+{
+ int handle_size;
+ struct dentry *dentry;
+
+ /* change the handle size to multiple of sizeof(u32) */
+ handle_size = handle->handle_size >> 2;
+ dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
+ handle_size, handle->handle_type,
+ vfs_dentry_acceptable, NULL);
+ return dentry;
+}
+
+long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
+{
+ int fd;
+ int retval = 0;
+ int d_flags = flags;
+ struct file *filp;
+ struct vfsmount *mnt;
+ struct inode *inode;
+ struct dentry *dentry;
+ struct super_block *sb;
+ struct file_handle f_handle;
+ struct file_handle *handle = NULL;
+
+ if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
+ return -EFAULT;
+
+ if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
+ (f_handle.handle_size <= 0))
+ return -EINVAL;
+
+ if (!capable(CAP_DAC_OVERRIDE))
+ return -EPERM;
+
+ sb = fs_get_sb(&f_handle.fsid);
+ if (!sb)
+ return -ESTALE;
+ /*
+ * Find the vfsmount for this superblock in the
+ * current namespace
+ */
+ mnt = fs_get_vfsmount(current, sb);
+ if (!mnt) {
+ retval = -ESTALE;
+ goto out_sb;
+ }
+
+ handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+ GFP_KERNEL);
+ if (!handle) {
+ retval = -ENOMEM;
+ goto out_mnt;
+ }
+ /* copy the full handle */
+ if (copy_from_user(handle, ufh,
+ sizeof(struct file_handle) +
+ f_handle.handle_size)) {
+ retval = -EFAULT;
+ goto out_mnt;
+ }
+
+ dentry = handle_to_dentry(mnt, handle);
+ if (IS_ERR(dentry)) {
+ retval = PTR_ERR(dentry);
+ goto out_mnt;
+ }
+
+ inode = dentry->d_inode;
+
+ flags = open_to_namei_flags(flags);
+ /* O_TRUNC implies we need access checks for write permissions */
+ if (flags & O_TRUNC)
+ flags |= MAY_WRITE;
+
+ if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
+ (flags & FMODE_WRITE) && IS_APPEND(inode)) {
+ retval = -EPERM;
+ goto out_err;
+ }
+
+ if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+ retval = -EACCES;
+ goto out_err;
+ }
+
+ /* Can't write directories. */
+ if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
+ retval = -EISDIR;
+ goto out_err;
+ }
+
+ fd = get_unused_fd();
+ if (fd < 0) {
+ retval = fd;
+ goto out_err;
+ }
+
+ filp = dentry_open(dentry, mntget(mnt),
+ d_flags, current_cred());
+ if (IS_ERR(filp)) {
+ put_unused_fd(fd);
+ retval = PTR_ERR(filp);
+ goto out_err;
+ }
+
+ if (inode->i_mode & S_IFREG) {
+ filp->f_flags |= O_NOATIME;
+ filp->f_mode |= FMODE_NOCMTIME;
+ }
+ fsnotify_open(filp->f_path.dentry);
+ fd_install(fd, filp);
+ retval = fd;
+ goto out_mnt;
+
+out_err:
+ dput(dentry);
+out_mnt:
+ kfree(handle);
+ mntput(mnt);
+out_sb:
+ deactivate_super(sb);
+
+ return retval;
+}
+
+SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
+ int, flags)
+{
+ long ret;
+
+ if (force_o_largefile())
+ flags |= O_LARGEFILE;
+
+ ret = do_sys_open_by_handle(handle, flags);
+
+ /* avoid REGPARM breakage on x86: */
+ asmlinkage_protect(2, ret, handle, flags);
+ return ret;
+}
diff --git a/fs/pnode.c b/fs/pnode.c
index 5cc564a..9f6d12d 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -6,9 +6,9 @@
* Author : Ram Pai (linuxram@us.ibm.com)
*
*/
+#include <linux/fs.h>
#include <linux/mnt_namespace.h>
#include <linux/mount.h>
-#include <linux/fs.h>
#include "internal.h"
#include "pnode.h"
diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
index 0b89efc..d363ecc 100644
--- a/include/linux/mnt_namespace.h
+++ b/include/linux/mnt_namespace.h
@@ -36,6 +36,8 @@ extern const struct seq_operations mounts_op;
extern const struct seq_operations mountinfo_op;
extern const struct seq_operations mountstats_op;
extern int mnt_had_events(struct proc_mounts *);
+extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+ struct super_block *sb);
#endif
#endif
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 05b441d..a853aa0 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -4,6 +4,7 @@
#include <linux/dcache.h>
#include <linux/linkage.h>
#include <linux/path.h>
+#include <asm-generic/fcntl.h>
struct vfsmount;
@@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
((char *) name)[min(len, maxlen)] = '\0';
}
+/*
+ * Note that while the flag value (low two bits) for sys_open means:
+ * 00 - read-only
+ * 01 - write-only
+ * 10 - read-write
+ * 11 - special
+ * it is changed into
+ * 00 - no permissions needed
+ * 01 - read-permission
+ * 10 - write-permission
+ * 11 - read-write
+ * for the internal routines (ie open_namei()/follow_link() etc)
+ * This is more logical, and also allows the 00 "no perm needed"
+ * to be used for symlinks (where the permissions are checked
+ * later).
+ *
+*/
+static inline int open_to_namei_flags(int flag)
+{
+ if ((flag+1) & O_ACCMODE)
+ flag++;
+ return flag;
+}
#endif /* _LINUX_NAMEI_H */
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 4/8] vfs: Add freadlink syscall
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
` (2 preceding siblings ...)
2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
` (3 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
This enables to use open-by-handle and then get the link target
details of a symlink using the fd returned by handle
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/stat.c | 27 +++++++++++++++++++++++++++
1 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/fs/stat.c b/fs/stat.c
index c4ecd52..0fbab00 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -314,6 +314,33 @@ SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf,
return sys_readlinkat(AT_FDCWD, path, buf, bufsiz);
}
+SYSCALL_DEFINE3(freadlink, int, fd, char __user *, buf,
+ int, bufsiz)
+{
+ int fput_needed;
+ struct file *file;
+ int error = -EBADF;
+ struct inode *inode;
+
+ file = fget_light(fd, &fput_needed);
+ if (!file)
+ return error;
+
+ inode = file->f_path.dentry->d_inode;
+ error = -EINVAL;
+ if (inode->i_op->readlink) {
+ error = security_inode_readlink(file->f_path.dentry);
+ if (!error) {
+ touch_atime(file->f_path.mnt, file->f_path.dentry);
+ error = inode->i_op->readlink(file->f_path.dentry,
+ buf, bufsiz);
+ }
+ }
+ fput_light(file, fput_needed);
+ return error;
+}
+
+
/* ---------- LFS-64 ----------- */
#ifdef __ARCH_WANT_STAT64
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 5/8] ext4: Add get_fsid callback
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
` (3 preceding siblings ...)
2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/ext4/super.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e14d22c..b485f0a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1049,6 +1049,14 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
return try_to_free_buffers(page);
}
+static struct uuid *ext4_get_fsid(struct super_block *sb)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+
+ return (struct uuid *)es->s_uuid;
+}
+
#ifdef CONFIG_QUOTA
#define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
#define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -1109,6 +1117,7 @@ static const struct super_operations ext4_sops = {
.quota_write = ext4_quota_write,
#endif
.bdev_try_to_free_page = bdev_try_to_free_page,
+ .get_fsid = ext4_get_fsid,
};
static const struct super_operations ext4_nojournal_sops = {
@@ -1128,6 +1137,7 @@ static const struct super_operations ext4_nojournal_sops = {
.quota_write = ext4_quota_write,
#endif
.bdev_try_to_free_page = bdev_try_to_free_page,
+ .get_fsid = ext4_get_fsid,
};
static const struct export_operations ext4_export_ops = {
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 6/8] x86: Add new syscalls for x86_32
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
` (4 preceding siblings ...)
2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
This patch adds sys_name_to_handle, sys_lname_to_handle, sys_open_by_handle
and sys_freadlink syscalls to x86_32
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/x86/include/asm/unistd_32.h | 6 +++++-
arch/x86/kernel/syscall_table_32.S | 4 ++++
2 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index beb9b5f..7ea7101 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -343,10 +343,14 @@
#define __NR_rt_tgsigqueueinfo 335
#define __NR_perf_event_open 336
#define __NR_recvmmsg 337
+#define __NR_name_to_handle 338
+#define __NR_lname_to_handle 339
+#define __NR_open_by_handle 340
+#define __NR_freadlink 341
#ifdef __KERNEL__
-#define NR_syscalls 338
+#define NR_syscalls 342
#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 8b37293..caa3f0c 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -337,3 +337,7 @@ ENTRY(sys_call_table)
.long sys_rt_tgsigqueueinfo /* 335 */
.long sys_perf_event_open
.long sys_recvmmsg
+ .long sys_name_to_handle
+ .long sys_lname_to_handle
+ .long sys_open_by_handle
+ .long sys_freadlink /* 341 */
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 7/8] x86: Add new syscalls for x86_64
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
` (5 preceding siblings ...)
2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
Add sys_name_to_handle, sys_lname_to_handle, sys_open_by_handle syscall
and sys_freadlink for x86_64
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/x86/ia32/ia32entry.S | 4 ++++
arch/x86/include/asm/unistd_64.h | 8 ++++++++
2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 59b4556..927cc8b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -842,4 +842,8 @@ ia32_sys_call_table:
.quad compat_sys_rt_tgsigqueueinfo /* 335 */
.quad sys_perf_event_open
.quad compat_sys_recvmmsg
+ .quad sys_name_to_handle
+ .quad sys_lname_to_handle
+ .quad sys_open_by_handle
+ .quad sys_freadlink /* 341 */
ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index ff4307b..a495455 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -663,6 +663,14 @@ __SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
#define __NR_recvmmsg 299
__SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_name_to_handle 300
+__SYSCALL(__NR_name_to_handle, sys_name_to_handle)
+#define __NR_lname_to_handle 301
+__SYSCALL(__NR_lname_to_handle, sys_lname_to_handle)
+#define __NR_open_by_handle 302
+__SYSCALL(__NR_open_by_handle, sys_open_by_handle)
+#define __NR_freadlink 303
+__SYSCALL(__NR_freadlink, sys_freadlink)
#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH -V5 8/8] ext3: Add get_fsid callback
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
` (6 preceding siblings ...)
2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/ext3/super.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 1bee604..dfe50a4 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -734,6 +734,14 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
return try_to_free_buffers(page);
}
+static struct uuid *ext3_get_fsid(struct super_block *sb)
+{
+ struct ext3_sb_info *sbi = EXT3_SB(sb);
+ struct ext3_super_block *es = sbi->s_es;
+
+ return (struct uuid *)es->s_uuid;
+}
+
#ifdef CONFIG_QUOTA
#define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group")
#define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -791,6 +799,7 @@ static const struct super_operations ext3_sops = {
.quota_write = ext3_quota_write,
#endif
.bdev_try_to_free_page = bdev_try_to_free_page,
+ .get_fsid = ext3_get_fsid,
};
static const struct export_operations ext3_export_ops = {
--
1.7.0.4.360.g11766c
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH -V5 3/8] vfs: Add open by file handle support
2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-04-27 1:05 ` Neil Brown
2010-04-27 6:10 ` Aneesh Kumar K. V
0 siblings, 1 reply; 13+ messages in thread
From: Neil Brown @ 2010-04-27 1:05 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench
On Mon, 26 Apr 2010 23:04:01 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> fs/filesystems.c | 32 +++++++++-
> fs/namei.c | 24 -------
> fs/namespace.c | 38 +++++++++++
> fs/open.c | 148 +++++++++++++++++++++++++++++++++++++++++
> fs/pnode.c | 2 +-
> include/linux/mnt_namespace.h | 2 +
> include/linux/namei.h | 24 +++++++
> 7 files changed, 244 insertions(+), 26 deletions(-)
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 68ba492..743d36e 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
> }
> return fs;
> }
> -
> EXPORT_SYMBOL(get_fs_type);
> +
> +struct super_block *fs_get_sb(struct uuid *fsid)
> +{
> + struct uuid *this_fsid;
> + struct file_system_type *fs_type;
> + struct super_block *sb, *found_sb = NULL;
> +
> + read_lock(&file_systems_lock);
> + fs_type = file_systems;
> + while (fs_type) {
You've open-coded a for-loop here...
Why not:
for (fs_type = file_systems ; fs_type ; fs_type = fs_type->next) {
??
> + spin_lock(&sb_lock);
> + list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
> + if (!sb->s_op->get_fsid)
> + continue;
> + this_fsid = sb->s_op->get_fsid(sb);
> + if (!memcmp(fsid->uuid, this_fsid->uuid,
> + sizeof(this_fsid->uuid))) {
> + /* found the matching super_block */
> + atomic_inc(&sb->s_active);
> + found_sb = sb;
> + spin_unlock(&sb_lock);
> + goto out;
> + }
> + }
> + spin_unlock(&sb_lock);
> + fs_type = fs_type->next;
> + }
> +out:
> + read_unlock(&file_systems_lock);
> + return found_sb;
> +}
> diff --git a/fs/namei.c b/fs/namei.c
> index a7dce91..a18711e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1521,30 +1521,6 @@ out_unlock:
> return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
> }
>
> -/*
> - * Note that while the flag value (low two bits) for sys_open means:
> - * 00 - read-only
> - * 01 - write-only
> - * 10 - read-write
> - * 11 - special
> - * it is changed into
> - * 00 - no permissions needed
> - * 01 - read-permission
> - * 10 - write-permission
> - * 11 - read-write
> - * for the internal routines (ie open_namei()/follow_link() etc)
> - * This is more logical, and also allows the 00 "no perm needed"
> - * to be used for symlinks (where the permissions are checked
> - * later).
> - *
> -*/
> -static inline int open_to_namei_flags(int flag)
> -{
> - if ((flag+1) & O_ACCMODE)
> - flag++;
> - return flag;
> -}
> -
> static int open_will_truncate(int flag, struct inode *inode)
> {
> /*
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 8174c8a..6168526 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
> kfree(ns);
> }
> EXPORT_SYMBOL(put_mnt_ns);
> +
> +/*
> + * Get any vfsmount mapping the superblock in the
> + * task namespace
> + */
> +struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> + struct super_block *sb)
> +{
> + struct nsproxy *nsp;
> + struct list_head *mount_list;
> + struct mnt_namespace *ns = NULL;
> + struct vfsmount *mnt, *sb_mnt = NULL;
> +
> + rcu_read_lock();
> + nsp = task_nsproxy(task);
> + if (nsp) {
> + ns = nsp->mnt_ns;
> + if (ns)
> + get_mnt_ns(ns);
> + }
> + rcu_read_unlock();
> + if (!ns)
> + return NULL;
> + down_read(&namespace_sem);
> + list_for_each(mount_list, &ns->list) {
> + mnt = list_entry(mount_list, struct vfsmount, mnt_list);
> + if (mnt->mnt_sb == sb) {
> + /* found the matching super block */
> + sb_mnt = mnt;
> + mntget(sb_mnt);
> + break;
> + }
> + }
> + up_read(&namespace_sem);
> +
> + put_mnt_ns(ns);
> + return sb_mnt;
> +}
If a task has the same filesystem mounted several times in it's namespace
(via "mount --bind") you just return any arbitrary one (the first).
Suppose that the filesystem does appear twice, and that in one appearance a
particular directory is mounted on, while in another appearance that
directory is not mounted on. Suppose that the mounted-on appearance is
listed first in ns->list
Now we use a series of "name_to_handle" / "open_by_handle" calls to descend
through the second appearance to something beneath that directory which (in
the first appearance) is mounted-on.
This will not work as you will actually be descending the first appearance of
the filesystem.
I really think you should leave the choice of filesystem/mountpoint to
user-space.
> diff --git a/fs/open.c b/fs/open.c
> index e9aae5c..b87fa32 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> asmlinkage_protect(2, ret, name, handle);
> return ret;
> }
> +
> +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> +{
> + return 1;
> +}
> +
> +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> + struct file_handle *handle)
> +{
> + int handle_size;
> + struct dentry *dentry;
> +
> + /* change the handle size to multiple of sizeof(u32) */
> + handle_size = handle->handle_size >> 2;
> + dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> + handle_size, handle->handle_type,
> + vfs_dentry_acceptable, NULL);
> + return dentry;
> +}
> +
> +long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> +{
> + int fd;
> + int retval = 0;
> + int d_flags = flags;
> + struct file *filp;
> + struct vfsmount *mnt;
> + struct inode *inode;
> + struct dentry *dentry;
> + struct super_block *sb;
> + struct file_handle f_handle;
> + struct file_handle *handle = NULL;
> +
> + if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> + return -EFAULT;
> +
> + if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> + (f_handle.handle_size <= 0))
> + return -EINVAL;
> +
> + if (!capable(CAP_DAC_OVERRIDE))
> + return -EPERM;
> +
> + sb = fs_get_sb(&f_handle.fsid);
> + if (!sb)
> + return -ESTALE;
> + /*
> + * Find the vfsmount for this superblock in the
> + * current namespace
> + */
> + mnt = fs_get_vfsmount(current, sb);
> + if (!mnt) {
> + retval = -ESTALE;
> + goto out_sb;
> + }
> +
> + handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> + GFP_KERNEL);
> + if (!handle) {
> + retval = -ENOMEM;
> + goto out_mnt;
> + }
> + /* copy the full handle */
> + if (copy_from_user(handle, ufh,
> + sizeof(struct file_handle) +
> + f_handle.handle_size)) {
> + retval = -EFAULT;
> + goto out_mnt;
> + }
> +
> + dentry = handle_to_dentry(mnt, handle);
> + if (IS_ERR(dentry)) {
> + retval = PTR_ERR(dentry);
> + goto out_mnt;
> + }
> +
> + inode = dentry->d_inode;
> +
> + flags = open_to_namei_flags(flags);
> + /* O_TRUNC implies we need access checks for write permissions */
> + if (flags & O_TRUNC)
> + flags |= MAY_WRITE;
> +
> + if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> + (flags & FMODE_WRITE) && IS_APPEND(inode)) {
> + retval = -EPERM;
> + goto out_err;
> + }
> +
> + if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> + retval = -EACCES;
> + goto out_err;
> + }
> +
> + /* Can't write directories. */
> + if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> + retval = -EISDIR;
> + goto out_err;
> + }
> +
> + fd = get_unused_fd();
You want to use alloc_fd() (or get_unused_fd_flags) rather than get_unused_fd
so that the flags can be passed though and, in particular, so O_CLOEXEC will
be honoured.
> + if (fd < 0) {
> + retval = fd;
> + goto out_err;
> + }
> +
> + filp = dentry_open(dentry, mntget(mnt),
> + d_flags, current_cred());
dentry_open consumed the reference to dentry, so ...
> + if (IS_ERR(filp)) {
> + put_unused_fd(fd);
> + retval = PTR_ERR(filp);
> + goto out_err;
The dput at 'out_err' will put it one time too many.
I think the best fix would be to pass dget(dentry) to dentry_open, then ....
> + }
> +
> + if (inode->i_mode & S_IFREG) {
> + filp->f_flags |= O_NOATIME;
> + filp->f_mode |= FMODE_NOCMTIME;
> + }
> + fsnotify_open(filp->f_path.dentry);
> + fd_install(fd, filp);
> + retval = fd;
> + goto out_mnt;
... this goto (Which is rather ugly to me) would not be needed.
NeilBrown
> +
> +out_err:
> + dput(dentry);
> +out_mnt:
> + kfree(handle);
> + mntput(mnt);
> +out_sb:
> + deactivate_super(sb);
> +
> + return retval;
> +}
> +
> +SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
> + int, flags)
> +{
> + long ret;
> +
> + if (force_o_largefile())
> + flags |= O_LARGEFILE;
> +
> + ret = do_sys_open_by_handle(handle, flags);
> +
> + /* avoid REGPARM breakage on x86: */
> + asmlinkage_protect(2, ret, handle, flags);
> + return ret;
> +}
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5cc564a..9f6d12d 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -6,9 +6,9 @@
> * Author : Ram Pai (linuxram@us.ibm.com)
> *
> */
> +#include <linux/fs.h>
> #include <linux/mnt_namespace.h>
> #include <linux/mount.h>
> -#include <linux/fs.h>
> #include "internal.h"
> #include "pnode.h"
>
> diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
> index 0b89efc..d363ecc 100644
> --- a/include/linux/mnt_namespace.h
> +++ b/include/linux/mnt_namespace.h
> @@ -36,6 +36,8 @@ extern const struct seq_operations mounts_op;
> extern const struct seq_operations mountinfo_op;
> extern const struct seq_operations mountstats_op;
> extern int mnt_had_events(struct proc_mounts *);
> +extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> + struct super_block *sb);
>
> #endif
> #endif
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 05b441d..a853aa0 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -4,6 +4,7 @@
> #include <linux/dcache.h>
> #include <linux/linkage.h>
> #include <linux/path.h>
> +#include <asm-generic/fcntl.h>
>
> struct vfsmount;
>
> @@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> ((char *) name)[min(len, maxlen)] = '\0';
> }
>
> +/*
> + * Note that while the flag value (low two bits) for sys_open means:
> + * 00 - read-only
> + * 01 - write-only
> + * 10 - read-write
> + * 11 - special
> + * it is changed into
> + * 00 - no permissions needed
> + * 01 - read-permission
> + * 10 - write-permission
> + * 11 - read-write
> + * for the internal routines (ie open_namei()/follow_link() etc)
> + * This is more logical, and also allows the 00 "no perm needed"
> + * to be used for symlinks (where the permissions are checked
> + * later).
> + *
> +*/
> +static inline int open_to_namei_flags(int flag)
> +{
> + if ((flag+1) & O_ACCMODE)
> + flag++;
> + return flag;
> +}
> #endif /* _LINUX_NAMEI_H */
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH -V5 2/8] vfs: Add name to file handle conversion support
2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-04-27 1:19 ` Neil Brown
2010-04-27 6:08 ` Aneesh Kumar K. V
0 siblings, 1 reply; 13+ messages in thread
From: Neil Brown @ 2010-04-27 1:19 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench
On Mon, 26 Apr 2010 23:04:00 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> fs/open.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 17 +++++++++
> 2 files changed, 117 insertions(+), 0 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 74e5cd9..e9aae5c 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -30,6 +30,8 @@
> #include <linux/falloc.h>
> #include <linux/fs_struct.h>
> #include <linux/ima.h>
> +#include <linux/exportfs.h>
> +#include <linux/mnt_namespace.h>
>
> #include "internal.h"
>
> @@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
> }
>
> EXPORT_SYMBOL(nonseekable_open);
> +
> +/* limit the handle size to some value */
> +#define MAX_HANDLE_SZ 4096
> +static long do_sys_name_to_handle(struct path *path,
> + struct file_handle __user *ufh)
> +{
> + int retval;
> + int handle_size;
> + struct super_block *sb;
> + struct uuid *this_fs_id;
> + struct file_handle f_handle;
> + struct file_handle *handle = NULL;
> +
> + if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> + return -EFAULT;
> +
> + if (f_handle.handle_size > MAX_HANDLE_SZ)
> + return -EINVAL;
> +
> + handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> + GFP_KERNEL);
> + if (!handle) {
> + retval = -ENOMEM;
> + goto err_out;
> + }
You are mixing your patterns here.
In some cases you do:
if (something fails)
return -ERROR
other times you do
if (something fails) {
retval = -ERROR;
goto err_out;
}
which has exactly the same fix. Being consistent is best and I would
recommend using the goto option - it is more maintainable.
I would actually prefer:
retval = -ERROR;
if (something fails)
goto err_out;
but leave the choice to you. Consistency is the important thing.
> + handle_size = f_handle.handle_size;
> +
> + /* we ask for a non connected handle */
> + retval = exportfs_encode_fh(path->dentry,
> + (struct fid *)handle->f_handle,
> + &handle_size, 0);
> + /* convert handle size to bytes */
> + handle_size *= sizeof(u32);
> + handle->handle_type = retval;
> + handle->handle_size = handle_size;
> + if (handle_size <= f_handle.handle_size) {
> + /* get the uuid */
> + sb = path->mnt->mnt_sb;
> + if (sb->s_op->get_fsid) {
> + this_fs_id = sb->s_op->get_fsid(sb);
> + memcpy(handle->fsid.uuid, this_fs_id->uuid,
> + sizeof(handle->fsid.uuid));
> + } else
> + memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
> + retval = 0;
> + } else {
> + /*
> + * set the handle_size to zero so we copy only
> + * non variable part of the file_handle
> + */
> + handle_size = 0;
> + retval = -EOVERFLOW;
> + }
> + if (copy_to_user(ufh, handle,
> + sizeof(struct file_handle) + handle_size))
> + retval = -EFAULT;
> +
> + kfree(handle);
> +err_out:
> + return retval;
> +}
> +
> +SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
> + struct file_handle __user *, handle)
> +{
> + long ret;
> + struct path path;
*Surely* you want name_to_handle_at, or name_at_to_handle here.
i.e. include an int 'dfd' to specify the starting directory.
Then you can make use of AT_SYMLINK_NOFOLLOW and so avoid the need for
2 syscalls.
> +
> + /* Follow links */
> + ret = user_path(name, &path);
> + if (ret)
> + return ret;
> +
> + ret = do_sys_name_to_handle(&path, handle);
> + path_put(&path);
> +
> + /* avoid REGPARM breakage on x86: */
> + asmlinkage_protect(2, ret, name, handle);
> + return ret;
> +}
> +
> +SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> + struct file_handle __user *, handle)
> +{
> + long ret;
> + struct path path;
> +
> + ret = user_lpath(name, &path);
> + if (ret)
> + return ret;
> +
> + ret = do_sys_name_to_handle(&path, handle);
> + path_put(&path);
> +
> + /* avoid REGPARM breakage on x86: */
> + asmlinkage_protect(2, ret, name, handle);
> + return ret;
> +}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 39d57bc..da28b80 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -948,6 +948,20 @@ struct file {
> unsigned long f_mnt_write_state;
> #endif
> };
> +
> +struct uuid {
> + char uuid[16];
unsigned char??? just a thought.
> +};
> +
> +struct file_handle {
> + int handle_size;
> + int handle_type;
> + /* File system identifier */
> + struct uuid fsid;
> + /* file identifier */
> + char f_handle[0];
> +};
> +
> extern spinlock_t files_lock;
> #define file_list_lock() spin_lock(&files_lock);
> #define file_list_unlock() spin_unlock(&files_lock);
> @@ -1580,6 +1594,7 @@ struct super_operations {
> ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
> #endif
> int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> + struct uuid *(*get_fsid)(struct super_block *);
Assuming this was actually a good idea (which I am not yet sold on),
I think you would do better to pass in the address of a uuid to be filled in:
int (*get_fsid)(struct super_block *, struct uuid *)
just incase the filesystem stored something in a slightly different format.
And as XFS has a 'nouuid' mount option, it might be best to allow an error
return to say "there is no uuid" - maybe expect -ENOENT ??
NeilBrown
> };
>
> /*
> @@ -2328,6 +2343,8 @@ extern struct super_block *get_super(struct block_device *);
> extern struct super_block *get_active_super(struct block_device *bdev);
> extern struct super_block *user_get_super(dev_t);
> extern void drop_super(struct super_block *sb);
> +extern struct super_block *fs_get_sb(struct uuid *fsid);
> +
>
> extern int dcache_dir_open(struct inode *, struct file *);
> extern int dcache_dir_close(struct inode *, struct file *);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH -V5 2/8] vfs: Add name to file handle conversion support
2010-04-27 1:19 ` Neil Brown
@ 2010-04-27 6:08 ` Aneesh Kumar K. V
0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K. V @ 2010-04-27 6:08 UTC (permalink / raw)
To: Neil Brown; +Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench
On Tue, 27 Apr 2010 11:19:49 +1000, Neil Brown <neilb@suse.de> wrote:
> On Mon, 26 Apr 2010 23:04:00 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> > fs/open.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 17 +++++++++
> > 2 files changed, 117 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index 74e5cd9..e9aae5c 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -30,6 +30,8 @@
> > #include <linux/falloc.h>
> > #include <linux/fs_struct.h>
> > #include <linux/ima.h>
> > +#include <linux/exportfs.h>
> > +#include <linux/mnt_namespace.h>
> >
> > #include "internal.h"
> >
> > @@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
> > }
> >
> > EXPORT_SYMBOL(nonseekable_open);
> > +
> > +/* limit the handle size to some value */
> > +#define MAX_HANDLE_SZ 4096
> > +static long do_sys_name_to_handle(struct path *path,
> > + struct file_handle __user *ufh)
> > +{
> > + int retval;
> > + int handle_size;
> > + struct super_block *sb;
> > + struct uuid *this_fs_id;
> > + struct file_handle f_handle;
> > + struct file_handle *handle = NULL;
> > +
> > + if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> > + return -EFAULT;
> > +
> > + if (f_handle.handle_size > MAX_HANDLE_SZ)
> > + return -EINVAL;
> > +
> > + handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> > + GFP_KERNEL);
> > + if (!handle) {
> > + retval = -ENOMEM;
> > + goto err_out;
> > + }
>
> You are mixing your patterns here.
> In some cases you do:
> if (something fails)
> return -ERROR
> other times you do
>
> if (something fails) {
> retval = -ERROR;
> goto err_out;
> }
>
> which has exactly the same fix. Being consistent is best and I would
> recommend using the goto option - it is more maintainable.
> I would actually prefer:
>
> retval = -ERROR;
> if (something fails)
> goto err_out;
>
> but leave the choice to you. Consistency is the important thing.
Fixed as per you suggestion.
>
>
>
> > + handle_size = f_handle.handle_size;
> > +
> > + /* we ask for a non connected handle */
> > + retval = exportfs_encode_fh(path->dentry,
> > + (struct fid *)handle->f_handle,
> > + &handle_size, 0);
> > + /* convert handle size to bytes */
> > + handle_size *= sizeof(u32);
> > + handle->handle_type = retval;
> > + handle->handle_size = handle_size;
> > + if (handle_size <= f_handle.handle_size) {
> > + /* get the uuid */
> > + sb = path->mnt->mnt_sb;
> > + if (sb->s_op->get_fsid) {
> > + this_fs_id = sb->s_op->get_fsid(sb);
> > + memcpy(handle->fsid.uuid, this_fs_id->uuid,
> > + sizeof(handle->fsid.uuid));
> > + } else
> > + memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
> > + retval = 0;
> > + } else {
> > + /*
> > + * set the handle_size to zero so we copy only
> > + * non variable part of the file_handle
> > + */
> > + handle_size = 0;
> > + retval = -EOVERFLOW;
> > + }
> > + if (copy_to_user(ufh, handle,
> > + sizeof(struct file_handle) + handle_size))
> > + retval = -EFAULT;
> > +
> > + kfree(handle);
> > +err_out:
> > + return retval;
> > +}
> > +
> > +SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
> > + struct file_handle __user *, handle)
> > +{
> > + long ret;
> > + struct path path;
>
> *Surely* you want name_to_handle_at, or name_at_to_handle here.
> i.e. include an int 'dfd' to specify the starting directory.
>
> Then you can make use of AT_SYMLINK_NOFOLLOW and so avoid the need for
> 2 syscalls.
>
Done. Droped the lname_to_handle syscall
>
> > +
> > + /* Follow links */
> > + ret = user_path(name, &path);
> > + if (ret)
> > + return ret;
> > +
> > + ret = do_sys_name_to_handle(&path, handle);
> > + path_put(&path);
> > +
> > + /* avoid REGPARM breakage on x86: */
> > + asmlinkage_protect(2, ret, name, handle);
> > + return ret;
> > +}
> > +
> > +SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> > + struct file_handle __user *, handle)
> > +{
> > + long ret;
> > + struct path path;
> > +
> > + ret = user_lpath(name, &path);
> > + if (ret)
> > + return ret;
> > +
> > + ret = do_sys_name_to_handle(&path, handle);
> > + path_put(&path);
> > +
> > + /* avoid REGPARM breakage on x86: */
> > + asmlinkage_protect(2, ret, name, handle);
> > + return ret;
> > +}
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 39d57bc..da28b80 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -948,6 +948,20 @@ struct file {
> > unsigned long f_mnt_write_state;
> > #endif
> > };
> > +
> > +struct uuid {
> > + char uuid[16];
>
> unsigned char??? just a thought.
>
Fixed
>
> > +};
> > +
> > +struct file_handle {
> > + int handle_size;
> > + int handle_type;
> > + /* File system identifier */
> > + struct uuid fsid;
> > + /* file identifier */
> > + char f_handle[0];
I also changed this to unsigned char f_handle[0];
> > +};
> > +
> > extern spinlock_t files_lock;
> > #define file_list_lock() spin_lock(&files_lock);
> > #define file_list_unlock() spin_unlock(&files_lock);
> > @@ -1580,6 +1594,7 @@ struct super_operations {
> > ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
> > #endif
> > int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> > + struct uuid *(*get_fsid)(struct super_block *);
>
> Assuming this was actually a good idea (which I am not yet sold on),
> I think you would do better to pass in the address of a uuid to be filled in:
>
> int (*get_fsid)(struct super_block *, struct uuid *)
>
> just incase the filesystem stored something in a slightly different format.
>
> And as XFS has a 'nouuid' mount option, it might be best to allow an error
> return to say "there is no uuid" - maybe expect -ENOENT ??
Fixed
Thanks for the review
-aneesh
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH -V5 3/8] vfs: Add open by file handle support
2010-04-27 1:05 ` Neil Brown
@ 2010-04-27 6:10 ` Aneesh Kumar K. V
0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K. V @ 2010-04-27 6:10 UTC (permalink / raw)
To: Neil Brown; +Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench
On Tue, 27 Apr 2010 11:05:45 +1000, Neil Brown <neilb@suse.de> wrote:
> On Mon, 26 Apr 2010 23:04:01 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> > fs/filesystems.c | 32 +++++++++-
> > fs/namei.c | 24 -------
> > fs/namespace.c | 38 +++++++++++
> > fs/open.c | 148 +++++++++++++++++++++++++++++++++++++++++
> > fs/pnode.c | 2 +-
> > include/linux/mnt_namespace.h | 2 +
> > include/linux/namei.h | 24 +++++++
> > 7 files changed, 244 insertions(+), 26 deletions(-)
> >
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 68ba492..743d36e 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
> > }
> > return fs;
> > }
> > -
> > EXPORT_SYMBOL(get_fs_type);
> > +
> > +struct super_block *fs_get_sb(struct uuid *fsid)
> > +{
> > + struct uuid *this_fsid;
> > + struct file_system_type *fs_type;
> > + struct super_block *sb, *found_sb = NULL;
> > +
> > + read_lock(&file_systems_lock);
> > + fs_type = file_systems;
> > + while (fs_type) {
>
> You've open-coded a for-loop here...
> Why not:
> for (fs_type = file_systems ; fs_type ; fs_type = fs_type->next) {
> ??
Fixed
>
>
> > + spin_lock(&sb_lock);
> > + list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
> > + if (!sb->s_op->get_fsid)
> > + continue;
> > + this_fsid = sb->s_op->get_fsid(sb);
> > + if (!memcmp(fsid->uuid, this_fsid->uuid,
> > + sizeof(this_fsid->uuid))) {
> > + /* found the matching super_block */
> > + atomic_inc(&sb->s_active);
> > + found_sb = sb;
> > + spin_unlock(&sb_lock);
> > + goto out;
> > + }
> > + }
> > + spin_unlock(&sb_lock);
> > + fs_type = fs_type->next;
> > + }
> > +out:
> > + read_unlock(&file_systems_lock);
> > + return found_sb;
> > +}
> > diff --git a/fs/namei.c b/fs/namei.c
> > index a7dce91..a18711e 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1521,30 +1521,6 @@ out_unlock:
> > return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
> > }
> >
> > -/*
> > - * Note that while the flag value (low two bits) for sys_open means:
> > - * 00 - read-only
> > - * 01 - write-only
> > - * 10 - read-write
> > - * 11 - special
> > - * it is changed into
> > - * 00 - no permissions needed
> > - * 01 - read-permission
> > - * 10 - write-permission
> > - * 11 - read-write
> > - * for the internal routines (ie open_namei()/follow_link() etc)
> > - * This is more logical, and also allows the 00 "no perm needed"
> > - * to be used for symlinks (where the permissions are checked
> > - * later).
> > - *
> > -*/
> > -static inline int open_to_namei_flags(int flag)
> > -{
> > - if ((flag+1) & O_ACCMODE)
> > - flag++;
> > - return flag;
> > -}
> > -
> > static int open_will_truncate(int flag, struct inode *inode)
> > {
> > /*
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 8174c8a..6168526 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
> > kfree(ns);
> > }
> > EXPORT_SYMBOL(put_mnt_ns);
> > +
> > +/*
> > + * Get any vfsmount mapping the superblock in the
> > + * task namespace
> > + */
> > +struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> > + struct super_block *sb)
> > +{
> > + struct nsproxy *nsp;
> > + struct list_head *mount_list;
> > + struct mnt_namespace *ns = NULL;
> > + struct vfsmount *mnt, *sb_mnt = NULL;
> > +
> > + rcu_read_lock();
> > + nsp = task_nsproxy(task);
> > + if (nsp) {
> > + ns = nsp->mnt_ns;
> > + if (ns)
> > + get_mnt_ns(ns);
> > + }
> > + rcu_read_unlock();
> > + if (!ns)
> > + return NULL;
> > + down_read(&namespace_sem);
> > + list_for_each(mount_list, &ns->list) {
> > + mnt = list_entry(mount_list, struct vfsmount, mnt_list);
> > + if (mnt->mnt_sb == sb) {
> > + /* found the matching super block */
> > + sb_mnt = mnt;
> > + mntget(sb_mnt);
> > + break;
> > + }
> > + }
> > + up_read(&namespace_sem);
> > +
> > + put_mnt_ns(ns);
> > + return sb_mnt;
> > +}
>
> If a task has the same filesystem mounted several times in it's namespace
> (via "mount --bind") you just return any arbitrary one (the first).
>
> Suppose that the filesystem does appear twice, and that in one appearance a
> particular directory is mounted on, while in another appearance that
> directory is not mounted on. Suppose that the mounted-on appearance is
> listed first in ns->list
>
> Now we use a series of "name_to_handle" / "open_by_handle" calls to descend
> through the second appearance to something beneath that directory which (in
> the first appearance) is mounted-on.
> This will not work as you will actually be descending the first appearance of
> the filesystem.
> I really think you should leave the choice of filesystem/mountpoint to
> user-space.
>
Will comment on this on a separate thread.
>
> > diff --git a/fs/open.c b/fs/open.c
> > index e9aae5c..b87fa32 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> > asmlinkage_protect(2, ret, name, handle);
> > return ret;
> > }
> > +
> > +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> > +{
> > + return 1;
> > +}
> > +
> > +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> > + struct file_handle *handle)
> > +{
> > + int handle_size;
> > + struct dentry *dentry;
> > +
> > + /* change the handle size to multiple of sizeof(u32) */
> > + handle_size = handle->handle_size >> 2;
> > + dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> > + handle_size, handle->handle_type,
> > + vfs_dentry_acceptable, NULL);
> > + return dentry;
> > +}
> > +
> > +long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> > +{
> > + int fd;
> > + int retval = 0;
> > + int d_flags = flags;
> > + struct file *filp;
> > + struct vfsmount *mnt;
> > + struct inode *inode;
> > + struct dentry *dentry;
> > + struct super_block *sb;
> > + struct file_handle f_handle;
> > + struct file_handle *handle = NULL;
> > +
> > + if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> > + return -EFAULT;
> > +
> > + if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> > + (f_handle.handle_size <= 0))
> > + return -EINVAL;
> > +
> > + if (!capable(CAP_DAC_OVERRIDE))
> > + return -EPERM;
> > +
> > + sb = fs_get_sb(&f_handle.fsid);
> > + if (!sb)
> > + return -ESTALE;
> > + /*
> > + * Find the vfsmount for this superblock in the
> > + * current namespace
> > + */
> > + mnt = fs_get_vfsmount(current, sb);
> > + if (!mnt) {
> > + retval = -ESTALE;
> > + goto out_sb;
> > + }
> > +
> > + handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> > + GFP_KERNEL);
> > + if (!handle) {
> > + retval = -ENOMEM;
> > + goto out_mnt;
> > + }
> > + /* copy the full handle */
> > + if (copy_from_user(handle, ufh,
> > + sizeof(struct file_handle) +
> > + f_handle.handle_size)) {
> > + retval = -EFAULT;
> > + goto out_mnt;
> > + }
> > +
> > + dentry = handle_to_dentry(mnt, handle);
> > + if (IS_ERR(dentry)) {
> > + retval = PTR_ERR(dentry);
> > + goto out_mnt;
> > + }
> > +
> > + inode = dentry->d_inode;
> > +
> > + flags = open_to_namei_flags(flags);
> > + /* O_TRUNC implies we need access checks for write permissions */
> > + if (flags & O_TRUNC)
> > + flags |= MAY_WRITE;
> > +
> > + if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> > + (flags & FMODE_WRITE) && IS_APPEND(inode)) {
> > + retval = -EPERM;
> > + goto out_err;
> > + }
> > +
> > + if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> > + retval = -EACCES;
> > + goto out_err;
> > + }
> > +
> > + /* Can't write directories. */
> > + if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> > + retval = -EISDIR;
> > + goto out_err;
> > + }
> > +
> > + fd = get_unused_fd();
>
> You want to use alloc_fd() (or get_unused_fd_flags) rather than get_unused_fd
> so that the flags can be passed though and, in particular, so O_CLOEXEC will
> be honoured.
Fixed by saying fd = get_unused_fd_flags(d_flags);
>
>
> > + if (fd < 0) {
> > + retval = fd;
> > + goto out_err;
> > + }
> > +
> > + filp = dentry_open(dentry, mntget(mnt),
> > + d_flags, current_cred());
>
> dentry_open consumed the reference to dentry, so ...
>
> > + if (IS_ERR(filp)) {
> > + put_unused_fd(fd);
> > + retval = PTR_ERR(filp);
> > + goto out_err;
>
> The dput at 'out_err' will put it one time too many.
> I think the best fix would be to pass dget(dentry) to dentry_open, then ....
>
> > + }
> > +
> > + if (inode->i_mode & S_IFREG) {
> > + filp->f_flags |= O_NOATIME;
> > + filp->f_mode |= FMODE_NOCMTIME;
> > + }
> > + fsnotify_open(filp->f_path.dentry);
> > + fd_install(fd, filp);
> > + retval = fd;
> > + goto out_mnt;
>
> ... this goto (Which is rather ugly to me) would not be needed.
>
Fixed
Thanks for the review
-aneesh
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-04-27 6:10 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
2010-04-27 1:19 ` Neil Brown
2010-04-27 6:08 ` Aneesh Kumar K. V
2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
2010-04-27 1:05 ` Neil Brown
2010-04-27 6:10 ` Aneesh Kumar K. V
2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).