* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
@ 2010-05-14 19:56 Steve French
2010-05-16 7:24 ` Aneesh Kumar K. V
0 siblings, 1 reply; 7+ messages in thread
From: Steve French @ 2010-05-14 19:56 UTC (permalink / raw)
To: linux-fsdevel, LKML
I think open by handle will turn out to be useful. While discussing the
various "duplicate inode number" checks that we are having to add to
cifs, I was reminded that we probably need a "generation number" or
some equivalent (birth time is probably good enough as well) to detect
the case where a file is deleted and a new file is created reusing the
same inode number. Eventually Samba needs to return this to POSIX
clients if inode numbers are to be useful, and I don't know how to tell
Samba how to get birth time or generation numbers out of stat in
userspace.
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
2010-05-14 19:56 [PATCH -V7 4/9] vfs: Add open by file handle support Steve French
@ 2010-05-16 7:24 ` Aneesh Kumar K. V
0 siblings, 0 replies; 7+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-16 7:24 UTC (permalink / raw)
To: Steve French, linux-fsdevel, LKML
On Fri, 14 May 2010 14:56:08 -0500, Steve French <smfrench@gmail.com> wrote:
> I think open by handle will turn out to be useful. While discussing the
> various "duplicate inode number" checks that we are having to add to
> cifs, I was reminded that we probably need a "generation number" or
> some equivalent (birth time is probably good enough as well) to detect
> the case where a file is deleted and a new file is created reusing the
> same inode number. Eventually Samba needs to return this to POSIX
> clients if inode numbers are to be useful, and I don't know how to tell
> Samba how to get birth time or generation numbers out of stat in
> userspace.
>
The file handle already has the generation number encoded. The patches
describe the file handle as:

struct file_handle {
	int handle_size;
	int handle_type;
	/* File system identifier */
	struct uuid fsid;
	/* file identifier */
	unsigned char f_handle[0];
};

The file identifier, which is stored in f_handle and is handle_size
bytes long, is obtained via exportfs_encode_fh (which defaults to
export_encode_fh). A file system can hook in its own
super_block->export_operations there. The default export_encode_fh
uses the inode number and generation number as part of the handle, so
we should have different file identifiers on the client side for files
that happen to have the same inode number due to inode number reuse.
But being able to determine the generation number in userspace would
be nice.
-aneesh
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH -V7 0/8] Generic name to handle and open by handle syscalls
@ 2010-05-12 15:50 Aneesh Kumar K.V
2010-05-12 15:50 ` [PATCH -V7 4/9] vfs: Add open by file handle support Aneesh Kumar K.V
0 siblings, 1 reply; 7+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel
Hi,
The following patches implement open by handle support using exportfs
operations. This allows a userspace application to map a file name to a file
handle and later open the file using that handle. This should be usable
for userspace NFS [1] and 9P [2] servers. XFS already supports this with the ioctls
XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.
[1] http://nfs-ganesha.sourceforge.net/
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/68992
I believe v7 addresses all the review comments. If I have missed any, please let me
know. I am also looking at getting xfsprogs libhandle.so working on top of these syscalls.
Let me know if you have any objection to merging this patchset upstream.
Changes from V6:
a) Add uuid to vfsmount lookup and drop uuid to superblock lookup
b) Return -EOPNOTSUPP in sys_name_to_handle if the UUID returned by the file
system doesn't give the same vfsmount on lookup. This ensures that we fail
sys_name_to_handle when we have multiple file systems returning the same UUID.
Changes from V5:
a) added sys_name_to_handle_at syscall which takes AT_SYMLINK_NOFOLLOW flag
instead of two syscalls sys_name_to_handle and sys_lname_to_handle.
b) addressed review comments from Neil Brown
c) rebased to b91ce4d14a21fc04d165be30319541e0f9204f15
d) Add compat_sys_open_by_handle
Changes from V4:
a) Changed the syscall arguments so that we don't need compat syscalls
as suggested by Christoph
c) Added two new syscalls, sys_lname_to_handle and sys_freadlink, to work with
symlinks
d) Changed open_by_handle to work with all file types
e) Add ext3 support
Changes from V3:
a) Code cleanup suggested by Andreas
b) x86_64 syscall support
c) add compat syscall
Changes from V2:
a) Support system wide unique handle.
Changes from v1:
a) handle size is now specified in bytes
b) returns -EOVERFLOW if the handle size is too small
c) dropped open_handle syscall and added open_by_handle_at syscall
open_by_handle_at takes mount_fd as the directory fd of the mount point
containing the file
e) handle will only be unique in a given file system. So for an NFS server
exporting multiple file systems, the NFS server will have to internally track the
mount point to which a file handle belongs. We should be able to do that much
more easily than expecting the kernel to give a system wide unique file handle. A
system wide unique file handle would need much larger changes to the exportfs or VFS
interface, and I was not sure whether we really need to do that in the kernel or
in user space.
f) open_handle_at now only checks for DAC_OVERRIDE capability
Example program: (x86_32). (x86_64 would need a different syscall number)
----------------
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

struct uuid {
	unsigned char uuid[16];
};

struct file_handle {
	int handle_size;
	int handle_type;
	struct uuid fsid;
	unsigned char handle[0];
};

#ifndef AT_FDCWD
#define AT_FDCWD -100
#endif
#ifndef AT_SYMLINK_NOFOLLOW
#define AT_SYMLINK_NOFOLLOW 0x100
#endif

/* Syscall numbers 338-340 are the x86_32 numbers used by this patchset. */
static int name_to_handle(const char *name, struct file_handle *fh)
{
	return syscall(338, AT_FDCWD, name, fh, 0);
}

static int lname_to_handle(const char *name, struct file_handle *fh)
{
	return syscall(338, AT_FDCWD, name, fh, AT_SYMLINK_NOFOLLOW);
}

static int open_by_handle(struct file_handle *fh, int flags)
{
	return syscall(339, fh, flags);
}

static int freadlink(int fd, char *buf, size_t bufsiz)
{
	return syscall(340, fd, buf, bufsiz);
}

#define BUFSZ 100
int main(int argc, char *argv[])
{
	int ret;
	int handle_sz;
	struct stat bufstat;
	int fd;
	char buf[BUFSZ];
	struct file_handle *fh = NULL;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <path>\n", argv[0]);
		exit(1);
	}
again:
	/* First pass uses size 0; later passes use the size the kernel reported. */
	handle_sz = fh ? fh->handle_size : 0;
	free(fh);
	fh = malloc(sizeof(struct file_handle) + handle_sz);
	if (!fh) {
		perror("malloc");
		exit(1);
	}
	fh->handle_size = handle_sz;
	errno = 0;
	ret = lname_to_handle(argv[1], fh);
	if (ret && errno == EOVERFLOW) {
		perror("Error");
		printf("Found the handle size needed to be %d\n", fh->handle_size);
		printf("Trying again..\n");
		goto again;
	} else if (ret) {
		perror("Error");
		exit(1);
	}
	fd = open_by_handle(fh, O_RDONLY);
	if (fd < 0) {	/* 0 is a valid file descriptor */
		perror("Error");
		exit(1);
	}
	fstat(fd, &bufstat);
	ret = S_ISLNK(bufstat.st_mode);
	if (ret) {
		memset(buf, 0, BUFSZ);
		freadlink(fd, buf, BUFSZ - 1);	/* keep the buffer NUL-terminated */
		printf("%s is a symlink pointing to %s\n", argv[1], buf);
	}
	memset(buf, 0, BUFSZ);
	while (1) {
		ret = read(fd, buf, BUFSZ - 1);
		if (ret <= 0)
			break;
		buf[ret] = '\0';
		printf("%s", buf);
		memset(buf, 0, BUFSZ);
	}
	return 0;
}
-aneesh
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH -V7 4/9] vfs: Add open by file handle support
2010-05-12 15:50 [PATCH -V7 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
@ 2010-05-12 15:50 ` Aneesh Kumar K.V
2010-05-12 23:44 ` Neil Brown
0 siblings, 1 reply; 7+ messages in thread
From: Aneesh Kumar K.V @ 2010-05-12 15:50 UTC (permalink / raw)
To: hch, viro, adilger, corbet, serue, neilb
Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel,
Aneesh Kumar K.V
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/namei.c | 24 ---------
fs/open.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/namei.h | 24 +++++++++
3 files changed, 160 insertions(+), 24 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..a18711e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1521,30 +1521,6 @@ out_unlock:
return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
}
-/*
- * Note that while the flag value (low two bits) for sys_open means:
- * 00 - read-only
- * 01 - write-only
- * 10 - read-write
- * 11 - special
- * it is changed into
- * 00 - no permissions needed
- * 01 - read-permission
- * 10 - write-permission
- * 11 - read-write
- * for the internal routines (ie open_namei()/follow_link() etc)
- * This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
- *
-*/
-static inline int open_to_namei_flags(int flag)
-{
- if ((flag+1) & O_ACCMODE)
- flag++;
- return flag;
-}
-
static int open_will_truncate(int flag, struct inode *inode)
{
/*
diff --git a/fs/open.c b/fs/open.c
index 9a34b81..348a1b9 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1315,3 +1315,139 @@ err_out:
asmlinkage_protect(4, ret, dfd, name, handle, flag);
return ret;
}
+
+static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
+{
+ return 1;
+}
+
+static struct dentry *handle_to_dentry(struct vfsmount *mnt,
+ struct file_handle *handle)
+{
+ int handle_size;
+ struct dentry *dentry;
+
+ /* change the handle size to multiple of sizeof(u32) */
+ handle_size = handle->handle_size >> 2;
+ dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
+ handle_size, handle->handle_type,
+ vfs_dentry_acceptable, NULL);
+ return dentry;
+}
+
+static long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
+{
+ int fd;
+ int retval = 0;
+ int d_flags = flags;
+ struct file *filp;
+ struct vfsmount *mnt;
+ struct inode *inode;
+ struct dentry *dentry;
+ struct file_handle f_handle;
+ struct file_handle *handle = NULL;
+
+ if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
+ retval = -EFAULT;
+ goto out_err;
+ }
+ if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
+ (f_handle.handle_size <= 0)) {
+ retval = -EINVAL;
+ goto out_err;
+ }
+ if (!capable(CAP_DAC_OVERRIDE)) {
+ retval = -EPERM;
+ goto out_err;
+ }
+ /*
+ * Find the vfsmount for this uuid in the
+ * current namespace
+ */
+ mnt = fs_get_vfsmount(current, &f_handle.fsid);
+ if (!mnt) {
+ retval = -ESTALE;
+ goto out_err;
+ }
+
+ handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+ GFP_KERNEL);
+ if (!handle) {
+ retval = -ENOMEM;
+ goto out_mnt;
+ }
+ /* copy the full handle */
+ if (copy_from_user(handle, ufh,
+ sizeof(struct file_handle) +
+ f_handle.handle_size)) {
+ retval = -EFAULT;
+ goto out_mnt;
+ }
+ dentry = handle_to_dentry(mnt, handle);
+ if (IS_ERR(dentry)) {
+ retval = PTR_ERR(dentry);
+ goto out_mnt;
+ }
+ inode = dentry->d_inode;
+ flags = open_to_namei_flags(flags);
+ /* O_TRUNC implies we need access checks for write permissions */
+ if (flags & O_TRUNC)
+ flags |= MAY_WRITE;
+
+ if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
+ (flags & FMODE_WRITE) && IS_APPEND(inode)) {
+ retval = -EPERM;
+ goto out_dentry;
+ }
+ if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+ retval = -EACCES;
+ goto out_dentry;
+ }
+ /* Can't write directories. */
+ if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
+ retval = -EISDIR;
+ goto out_dentry;
+ }
+ fd = get_unused_fd_flags(d_flags);
+ if (fd < 0) {
+ retval = fd;
+ goto out_dentry;
+ }
+ filp = dentry_open(dget(dentry), mntget(mnt),
+ d_flags, current_cred());
+ if (IS_ERR(filp)) {
+ put_unused_fd(fd);
+ retval = PTR_ERR(filp);
+ goto out_dentry;
+ }
+ if (inode->i_mode & S_IFREG) {
+ filp->f_flags |= O_NOATIME;
+ filp->f_mode |= FMODE_NOCMTIME;
+ }
+ fsnotify_open(filp->f_path.dentry);
+ fd_install(fd, filp);
+ retval = fd;
+
+out_dentry:
+ dput(dentry);
+out_mnt:
+ kfree(handle);
+ mntput(mnt);
+out_err:
+ return retval;
+}
+
+SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
+ int, flags)
+{
+ long ret;
+
+ if (force_o_largefile())
+ flags |= O_LARGEFILE;
+
+ ret = do_sys_open_by_handle(handle, flags);
+
+ /* avoid REGPARM breakage on x86: */
+ asmlinkage_protect(2, ret, handle, flags);
+ return ret;
+}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 05b441d..a853aa0 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -4,6 +4,7 @@
#include <linux/dcache.h>
#include <linux/linkage.h>
#include <linux/path.h>
+#include <asm-generic/fcntl.h>
struct vfsmount;
@@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
((char *) name)[min(len, maxlen)] = '\0';
}
+/*
+ * Note that while the flag value (low two bits) for sys_open means:
+ * 00 - read-only
+ * 01 - write-only
+ * 10 - read-write
+ * 11 - special
+ * it is changed into
+ * 00 - no permissions needed
+ * 01 - read-permission
+ * 10 - write-permission
+ * 11 - read-write
+ * for the internal routines (ie open_namei()/follow_link() etc)
+ * This is more logical, and also allows the 00 "no perm needed"
+ * to be used for symlinks (where the permissions are checked
+ * later).
+ *
+*/
+static inline int open_to_namei_flags(int flag)
+{
+ if ((flag+1) & O_ACCMODE)
+ flag++;
+ return flag;
+}
#endif /* _LINUX_NAMEI_H */
--
1.7.1.78.g212f0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
2010-05-12 15:50 ` [PATCH -V7 4/9] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-05-12 23:44 ` Neil Brown
2010-05-13 6:09 ` Dave Chinner
0 siblings, 1 reply; 7+ messages in thread
From: Neil Brown @ 2010-05-12 23:44 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
philippe.deniel, linux-kernel
On Wed, 12 May 2010 21:20:39 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> fs/namei.c | 24 ---------
> fs/open.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/namei.h | 24 +++++++++
> 3 files changed, 160 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index a7dce91..a18711e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1521,30 +1521,6 @@ out_unlock:
> return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
> }
>
> -/*
> - * Note that while the flag value (low two bits) for sys_open means:
> - * 00 - read-only
> - * 01 - write-only
> - * 10 - read-write
> - * 11 - special
> - * it is changed into
> - * 00 - no permissions needed
> - * 01 - read-permission
> - * 10 - write-permission
> - * 11 - read-write
> - * for the internal routines (ie open_namei()/follow_link() etc)
> - * This is more logical, and also allows the 00 "no perm needed"
> - * to be used for symlinks (where the permissions are checked
> - * later).
> - *
> -*/
> -static inline int open_to_namei_flags(int flag)
> -{
> - if ((flag+1) & O_ACCMODE)
> - flag++;
> - return flag;
> -}
> -
> static int open_will_truncate(int flag, struct inode *inode)
> {
> /*
> diff --git a/fs/open.c b/fs/open.c
> index 9a34b81..348a1b9 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1315,3 +1315,139 @@ err_out:
> asmlinkage_protect(4, ret, dfd, name, handle, flag);
> return ret;
> }
> +
> +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> +{
> + return 1;
> +}
> +
> +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> + struct file_handle *handle)
> +{
> + int handle_size;
> + struct dentry *dentry;
> +
> + /* change the handle size to multiple of sizeof(u32) */
> + handle_size = handle->handle_size >> 2;
> + dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> + handle_size, handle->handle_type,
> + vfs_dentry_acceptable, NULL);
> + return dentry;
> +}
> +
> +static long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> +{
> + int fd;
> + int retval = 0;
> + int d_flags = flags;
> + struct file *filp;
> + struct vfsmount *mnt;
> + struct inode *inode;
> + struct dentry *dentry;
> + struct file_handle f_handle;
> + struct file_handle *handle = NULL;
> +
> + if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
> + retval = -EFAULT;
> + goto out_err;
> + }
> + if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> + (f_handle.handle_size <= 0)) {
> + retval = -EINVAL;
> + goto out_err;
> + }
> + if (!capable(CAP_DAC_OVERRIDE)) {
> + retval = -EPERM;
> + goto out_err;
> + }
> + /*
> + * Find the vfsmount for this uuid in the
> + * current namespace
> + */
> + mnt = fs_get_vfsmount(current, &f_handle.fsid);
> + if (!mnt) {
> + retval = -ESTALE;
> + goto out_err;
> + }
> +
> + handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> + GFP_KERNEL);
> + if (!handle) {
> + retval = -ENOMEM;
> + goto out_mnt;
> + }
> + /* copy the full handle */
> + if (copy_from_user(handle, ufh,
> + sizeof(struct file_handle) +
> + f_handle.handle_size)) {
> + retval = -EFAULT;
> + goto out_mnt;
> + }
> + dentry = handle_to_dentry(mnt, handle);
> + if (IS_ERR(dentry)) {
> + retval = PTR_ERR(dentry);
> + goto out_mnt;
> + }
> + inode = dentry->d_inode;
> + flags = open_to_namei_flags(flags);
> + /* O_TRUNC implies we need access checks for write permissions */
> + if (flags & O_TRUNC)
> + flags |= MAY_WRITE;
> +
> + if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> + (flags & FMODE_WRITE) && IS_APPEND(inode)) {
> + retval = -EPERM;
> + goto out_dentry;
> + }
> + if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> + retval = -EACCES;
> + goto out_dentry;
> + }
> + /* Can't write directories. */
> + if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> + retval = -EISDIR;
> + goto out_dentry;
> + }
Including all these checks inline here seems error prone. Can you not just
use finish_open()? It might do more than you need, but it would be more
obvious that you didn't forget anything.
> + fd = get_unused_fd_flags(d_flags);
> + if (fd < 0) {
> + retval = fd;
> + goto out_dentry;
> + }
> + filp = dentry_open(dget(dentry), mntget(mnt),
> + d_flags, current_cred());
> + if (IS_ERR(filp)) {
> + put_unused_fd(fd);
> + retval = PTR_ERR(filp);
> + goto out_dentry;
> + }
> + if (inode->i_mode & S_IFREG) {
I suspect this is not the test you want. It tests for IFREG or IFLNK or
IFSOCK.
> + filp->f_flags |= O_NOATIME;
> + filp->f_mode |= FMODE_NOCMTIME;
> + }
I think you need a comment here explaining the rationale for these settings.
Why is O_NOATIME important for IFREG but not for IFDIR?
Why is it not sufficient to honour O_NOATIME that is passed in?
How can you ever justify setting FMODE_NOCMTIME?
I guess you are just copying from xfs code, but it still needs justification.
NeilBrown
> + fsnotify_open(filp->f_path.dentry);
> + fd_install(fd, filp);
> + retval = fd;
> +
> +out_dentry:
> + dput(dentry);
> +out_mnt:
> + kfree(handle);
> + mntput(mnt);
> +out_err:
> + return retval;
> +}
> +
> +SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
> + int, flags)
> +{
> + long ret;
> +
> + if (force_o_largefile())
> + flags |= O_LARGEFILE;
> +
> + ret = do_sys_open_by_handle(handle, flags);
> +
> + /* avoid REGPARM breakage on x86: */
> + asmlinkage_protect(2, ret, handle, flags);
> + return ret;
> +}
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 05b441d..a853aa0 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -4,6 +4,7 @@
> #include <linux/dcache.h>
> #include <linux/linkage.h>
> #include <linux/path.h>
> +#include <asm-generic/fcntl.h>
>
> struct vfsmount;
>
> @@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> ((char *) name)[min(len, maxlen)] = '\0';
> }
>
> +/*
> + * Note that while the flag value (low two bits) for sys_open means:
> + * 00 - read-only
> + * 01 - write-only
> + * 10 - read-write
> + * 11 - special
> + * it is changed into
> + * 00 - no permissions needed
> + * 01 - read-permission
> + * 10 - write-permission
> + * 11 - read-write
> + * for the internal routines (ie open_namei()/follow_link() etc)
> + * This is more logical, and also allows the 00 "no perm needed"
> + * to be used for symlinks (where the permissions are checked
> + * later).
> + *
> +*/
> +static inline int open_to_namei_flags(int flag)
> +{
> + if ((flag+1) & O_ACCMODE)
> + flag++;
> + return flag;
> +}
> #endif /* _LINUX_NAMEI_H */
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
2010-05-12 23:44 ` Neil Brown
@ 2010-05-13 6:09 ` Dave Chinner
2010-05-13 6:37 ` Aneesh Kumar K. V
0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-05-13 6:09 UTC (permalink / raw)
To: Neil Brown
Cc: Aneesh Kumar K.V, hch, viro, adilger, corbet, serue,
linux-fsdevel, sfrench, philippe.deniel, linux-kernel
On Thu, May 13, 2010 at 09:44:22AM +1000, Neil Brown wrote:
> On Wed, 12 May 2010 21:20:39 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > + filp->f_flags |= O_NOATIME;
> > + filp->f_mode |= FMODE_NOCMTIME;
> > + }
>
> I think you need a comment here explaining the rationale for these settings.
If you've never seen how applications use the XFS handle interface
in conjunction with other XFS functionality, then I guess it would
seem like bad voodoo.
> Why is O_NOATIME important for IFREG but not for IFDIR?
No application has ever required directory access or modification
via the handle interface to be invisible to the rest of the system.
> Why is it not sufficient to honour O_NOATIME that is passed in?
Because the XFS handle library is cross platform and predates
O_NOATIME on Linux. Hence the library has never set that flag and
has always relied on the kernel implementation of the API to ensure
atime was never updated on fds derived from handles.
> How can you ever justify setting FMODE_NOCMTIME?
Quite easily. ;)
The XFS handle interface was designed specifically to allow
applications to execute silent/invisible movement of data in, out
and around the filesystem without leaving user visible traces in
file metadata. Backup or filesystem utilities that operate on
active filesystems need to be able to access or modify inodes and
data without affecting running applications. It's a feature of the
handle interface, and is used by xfs_dump, xfs_fsr, SGI's HSM, etc.
to do stuff that isn't otherwise possible.
FWIW, if you are curious, here's the initial commit of the XFS
handle code into the Irix tree from 3 Sep 1994, showing that the initial
XFS open_by_handle() implementation sets the FINVIS flag to trigger
invisible IO semantics:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=575b66fae833429a51fcadb204d45521c2dfc26f
> I guess you are just copying from xfs code, but it still needs justification.
"They are intended for use by a limited set of system
utilities such as backup programs."
- open_by_handle(3) man page
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
2010-05-13 6:09 ` Dave Chinner
@ 2010-05-13 6:37 ` Aneesh Kumar K. V
2010-05-14 10:41 ` Dave Chinner
0 siblings, 1 reply; 7+ messages in thread
From: Aneesh Kumar K. V @ 2010-05-13 6:37 UTC (permalink / raw)
To: Dave Chinner, Neil Brown
Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench,
philippe.deniel, linux-kernel
On Thu, 13 May 2010 16:09:55 +1000, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 13, 2010 at 09:44:22AM +1000, Neil Brown wrote:
> > On Wed, 12 May 2010 21:20:39 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> > > + filp->f_flags |= O_NOATIME;
> > > + filp->f_mode |= FMODE_NOCMTIME;
> > > + }
> >
> > I think you need a comment here explaining the rationale for these settings.
>
> If you've never seen how applications use the XFS handle interface
> in conjunction with other XFS functionality, then I guess it would
> seem like bad voodoo.
>
> > Why is O_NOATIME important for IFREG but not for IFDIR?
>
> No application has ever required directory access or modification
> via the handle interface to be invisible to the rest of the system.
>
> > Why is it not sufficient to honour O_NOATIME that is passed in?
>
> Because the XFS handle library is cross platform and predates
> O_NOATIME on Linux. Hence the library has never set that flag and
> has always relied on the kernel implementation of the API to ensure
> atime was never updated on fds derived from handles.
>
> > How can you ever justify setting FMODE_NOCMTIME?
>
> Quite easily. ;)
>
> The XFS handle interface was designed specifically to allow
> applications to execute silent/invisible movement of data in, out
> and around the filesystem without leaving user visible traces in
> file metadata. Backup or filesystem utilities that operate on
> active filesystems need to be able to access or modify inodes and
> data without affecting running applications. It's a feature of the
> handle interface, and is used by xfs_dump, xfs_fsr, SGI's HSM, etc.
> to do stuff that isn't otherwise possible.
>
> FWIW, if you are curious, here's the initial commit of the XFS
> handle code into the Irix tree from 3 Sep 1994, showing that the initial
> XFS open_by_handle() implementation sets the FINVIS flag to trigger
> invisible IO semantics:
>
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=575b66fae833429a51fcadb204d45521c2dfc26f
Thanks for sharing this. I haven't looked at the details you mentioned here.
>
> > I guess you are just copying from xfs code, but it still needs justification.
>
> "They are intended for use by a limited set of system
> utilities such as backup programs."
>
> - open_by_handle(3) man page
>
Should we retain all of the above behaviour in the new syscall, or just
do what a normal open(2) call does?
-aneesh
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -V7 4/9] vfs: Add open by file handle support
2010-05-13 6:37 ` Aneesh Kumar K. V
@ 2010-05-14 10:41 ` Dave Chinner
0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2010-05-14 10:41 UTC (permalink / raw)
To: Aneesh Kumar K. V
Cc: Neil Brown, hch, viro, adilger, corbet, serue, linux-fsdevel,
sfrench, philippe.deniel, linux-kernel
On Thu, May 13, 2010 at 12:07:02PM +0530, Aneesh Kumar K. V wrote:
> On Thu, 13 May 2010 16:09:55 +1000, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, May 13, 2010 at 09:44:22AM +1000, Neil Brown wrote:
> > > On Wed, 12 May 2010 21:20:39 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > >
> > > > + filp->f_flags |= O_NOATIME;
> > > > + filp->f_mode |= FMODE_NOCMTIME;
> > > > + }
> > >
> > > I think you need a comment here explaining the rationale for these settings.
> >
> > If you've never seen how applications use the XFS handle interface
> > in conjunction with other XFS functionality, then I guess it would
> > seem like bad voodoo.
> >
> > > Why is O_NOATIME important for IFREG but not for IFDIR?
> >
> > No application has ever required directory access or modification
> > via the handle interface to be invisible to the rest of the system.
> >
> > > Why is it not sufficient to honour O_NOATIME that is passed in?
> >
> > Because the XFS handle library is cross platform and predates
> > O_NOATIME on Linux. Hence the library has never set that flag and
> > has always relied on the kernel implementation of the API to ensure
> > atime was never updated on fds derived from handles.
> >
> > > How can you ever justify setting FMODE_NOCMTIME?
> >
> > Quite easily. ;)
> >
> > The XFS handle interface was designed specifically to allow
> > applications to execute silent/invisible movement of data in, out
> > and around the filesystem without leaving user visible traces in
> > file metadata. Backup or filesystem utilities that operate on
> > active filesystems need to be able to access or modify inodes and
> > data without affecting running applications. It's a feature of the
> > handle interface, and is used by xfs_dump, xfs_fsr, SGI's HSM, etc.
> > to do stuff that isn't otherwise possible.
> >
> > FWIW, if you are curious, here's the initial commit of the XFS
> > handle code into the Irix tree from 3 Sep 1994, showing that the initial
> > XFS open_by_handle() implementation sets the FINVIS flag to trigger
> > invisible IO semantics:
> >
> > http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=575b66fae833429a51fcadb204d45521c2dfc26f
>
> Thanks for sharing this. I haven't looked at the details you mentioned here.
>
> >
> > > I guess you are just copying from xfs code, but it still needs justification.
> >
> > "They are intended for use by a limited set of system
> > utilities such as backup programs."
> >
> > - open_by_handle(3) man page
> >
>
> Should we retain all of the above behaviour in the new syscall, or just
> do what a normal open(2) call does?
I'm not sure that FMODE_NOCMTIME can be set from userspace at the
moment. In fs.h:
/*
 * Don't update ctime and mtime.
 *
 * Currently a special hack for the XFS open_by_handle ioctl, but we'll
 * hopefully graduate it to a proper O_CMTIME flag supported by open(2) soon.
 */
#define FMODE_NOCMTIME ((__force fmode_t)0x800)
Perhaps we need to introduce O_NOCMTIME as the comment suggests, and
then the new handle code doesn't need to automatically set it. If
libhandle is converted, then it could set the open flags as
necessary...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 7+ messages in thread