* [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
@ 2009-04-07 8:00 Mark Fasheh
2009-04-07 10:23 ` Andreas Dilger
2009-04-07 17:42 ` Jamie Lokier
0 siblings, 2 replies; 15+ messages in thread
From: Mark Fasheh @ 2009-04-07 8:00 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Sage Weil, Trond Myklebust, Andreas Dilger
This very, very rough patch set adds three flags to fstatat - AT_NO_SIZE,
AT_NO_TIMES, and AT_STRICT.
The first two flags (AT_NO_SIZE, AT_NO_TIMES) allow userspace to notify the
file system layer that certain stat fields are not required to be accurate.
Some file systems want this information in order to optimize away expensive
operations associated with stat. In particular, NFS can avoid some syncing
to the server (if userspace doesn't want atime, ctime or mtime) and Lustre
can avoid some expensive locking by avoiding an update of various size
fields (st_size, st_blocks).
AT_STRICT allows userspace to indicate that it wants the most up to date
version of a files status, regardless of performance impact. A distributed
file system which has a non-coherent inode cache would know then to send a
direct query to it's server.
As noted previously, these patches are really rough. Mostly I'd like to get
some feedback on the interface and general direction of implementation. Some
glaring issues which we want to resolve:
- This patch set doesn't actually wire up any client file systems yet :)
- There's the question of whether we wire up [fl]stat(2) variants instead of
using fstatat. I went with the former as the implementation was more
straight forward.
- I'm not sure whether we want to force zeroing of the optional fields, or
return whatever's in the inode (which may be stale, or just junk).
- Testing has been very light (compiles and boots on x86_64). There really
isn't a whole lot to test yet anyway.
- There's probably other issues I'm missing :)
Finally, credit should be given to Sage Weil, Ted Tso, Andreas Dilger,
Russell Cattelan, Trond Myklebust and others at LSF who participated in a
lively discussion on this topic :)
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
arch/arm/kernel/sys_oabi-compat.c | 4 +-
arch/s390/kernel/compat_linux.c | 4 +-
arch/sparc/kernel/sys_sparc32.c | 4 +-
arch/x86/ia32/sys_ia32.c | 4 +-
drivers/block/loop.c | 2 +-
fs/compat.c | 8 +++---
fs/libfs.c | 2 +-
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfs3xdr.c | 4 +-
fs/nfsd/nfs4xdr.c | 5 ++-
fs/nfsd/nfsproc.c | 6 ++--
fs/nfsd/nfsxdr.c | 2 +-
fs/stat.c | 44 +++++++++++++++++++++---------------
include/linux/fcntl.h | 6 +++++
include/linux/fs.h | 12 ++++++----
15 files changed, 63 insertions(+), 46 deletions(-)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 42623db..b48a9ac 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -182,9 +182,9 @@ asmlinkage long sys_oabi_fstatat64(int dfd,
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_oldabi_stat64(&stat, statbuf);
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 6cc87d8..de65f40 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -708,9 +708,9 @@ asmlinkage long sys32_fstatat64(unsigned int dfd, char __user *filename,
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_stat64(statbuf, &stat);
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index e800503..387823e 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -212,9 +212,9 @@ asmlinkage long compat_sys_fstatat64(unsigned int dfd, char __user *filename,
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_compat_stat64(&stat, statbuf);
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index 6c0d7f6..79eb683 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -135,9 +135,9 @@ asmlinkage long sys32_fstatat(unsigned int dfd, char __user *filename,
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_stat64(statbuf, &stat);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bf03455..02d3b47 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1027,7 +1027,7 @@ loop_get_status(struct loop_device *lo, struct loop_info64 *info)
if (lo->lo_state != Lo_bound)
return -ENXIO;
- error = vfs_getattr(file->f_path.mnt, file->f_path.dentry, &stat);
+ error = vfs_getattr(file->f_path.mnt, file->f_path.dentry, &stat, 0);
if (error)
return error;
memset(info, 0, sizeof(*info));
diff --git a/fs/compat.c b/fs/compat.c
index d0145ca..7549b3f 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -180,7 +180,7 @@ asmlinkage long compat_sys_newstat(char __user * filename,
struct compat_stat __user *statbuf)
{
struct kstat stat;
- int error = vfs_stat_fd(AT_FDCWD, filename, &stat);
+ int error = vfs_stat_fd(AT_FDCWD, filename, &stat, 0);
if (!error)
error = cp_compat_stat(&stat, statbuf);
@@ -191,7 +191,7 @@ asmlinkage long compat_sys_newlstat(char __user * filename,
struct compat_stat __user *statbuf)
{
struct kstat stat;
- int error = vfs_lstat_fd(AT_FDCWD, filename, &stat);
+ int error = vfs_lstat_fd(AT_FDCWD, filename, &stat, 0);
if (!error)
error = cp_compat_stat(&stat, statbuf);
@@ -209,9 +209,9 @@ asmlinkage long compat_sys_newfstatat(unsigned int dfd, char __user *filename,
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_compat_stat(&stat, statbuf);
diff --git a/fs/libfs.c b/fs/libfs.c
index 49b4409..cbde1fb 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -13,7 +13,7 @@
#include <asm/uaccess.h>
int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
- struct kstat *stat)
+ struct kstat *stat, int flags)
{
struct inode *inode = dentry->d_inode;
generic_fillattr(inode, stat);
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 9dbd2eb..9d56561 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -69,7 +69,7 @@ nfsd3_proc_getattr(struct svc_rqst *rqstp, struct nfsd_fhandle *argp,
RETURN_STATUS(nfserr);
err = vfs_getattr(resp->fh.fh_export->ex_path.mnt,
- resp->fh.fh_dentry, &resp->stat);
+ resp->fh.fh_dentry, &resp->stat, 0);
nfserr = nfserrno(err);
RETURN_STATUS(nfserr);
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 17d0dd9..b19d320 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -218,7 +218,7 @@ encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
int err;
struct kstat stat;
- err = vfs_getattr(fhp->fh_export->ex_path.mnt, dentry, &stat);
+ err = vfs_getattr(fhp->fh_export->ex_path.mnt, dentry, &stat, 0);
if (!err) {
*p++ = xdr_one; /* attributes follow */
lease_get_mtime(dentry->d_inode, &stat.mtime);
@@ -271,7 +271,7 @@ void fill_post_wcc(struct svc_fh *fhp)
printk("nfsd: inode locked twice during operation.\n");
err = vfs_getattr(fhp->fh_export->ex_path.mnt, fhp->fh_dentry,
- &fhp->fh_post_attr);
+ &fhp->fh_post_attr, 0);
if (err)
fhp->fh_post_saved = 0;
else
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 9250067..0a29e85 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1436,7 +1436,7 @@ nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
goto out;
}
- err = vfs_getattr(exp->ex_path.mnt, dentry, &stat);
+ err = vfs_getattr(exp->ex_path.mnt, dentry, &stat, 0);
if (err)
goto out_nfserr;
if ((bmval0 & (FATTR4_WORD0_FILES_FREE | FATTR4_WORD0_FILES_TOTAL |
@@ -1795,7 +1795,8 @@ out_acl:
if (ignore_crossmnt == 0 &&
exp->ex_path.mnt->mnt_root->d_inode == dentry->d_inode) {
err = vfs_getattr(exp->ex_path.mnt->mnt_parent,
- exp->ex_path.mnt->mnt_mountpoint, &stat);
+ exp->ex_path.mnt->mnt_mountpoint,
+ &stat, 0);
if (err)
goto out_nfserr;
}
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 6f7f263..2f7c86d 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -43,7 +43,7 @@ nfsd_return_attrs(__be32 err, struct nfsd_attrstat *resp)
if (err) return err;
return nfserrno(vfs_getattr(resp->fh.fh_export->ex_path.mnt,
resp->fh.fh_dentry,
- &resp->stat));
+ &resp->stat, 0));
}
static __be32
nfsd_return_dirop(__be32 err, struct nfsd_diropres *resp)
@@ -51,7 +51,7 @@ nfsd_return_dirop(__be32 err, struct nfsd_diropres *resp)
if (err) return err;
return nfserrno(vfs_getattr(resp->fh.fh_export->ex_path.mnt,
resp->fh.fh_dentry,
- &resp->stat));
+ &resp->stat, 0));
}
/*
* Get a file's attributes
@@ -167,7 +167,7 @@ nfsd_proc_read(struct svc_rqst *rqstp, struct nfsd_readargs *argp,
if (nfserr) return nfserr;
return nfserrno(vfs_getattr(resp->fh.fh_export->ex_path.mnt,
resp->fh.fh_dentry,
- &resp->stat));
+ &resp->stat, 0));
}
/*
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index afd08e2..25dc0d0 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -207,7 +207,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp,
__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
{
struct kstat stat;
- vfs_getattr(fhp->fh_export->ex_path.mnt, fhp->fh_dentry, &stat);
+ vfs_getattr(fhp->fh_export->ex_path.mnt, fhp->fh_dentry, &stat, 0);
return encode_fattr(rqstp, p, fhp, &stat);
}
diff --git a/fs/stat.c b/fs/stat.c
index 2db740a..624af05 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -37,17 +37,25 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
EXPORT_SYMBOL(generic_fillattr);
-int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat,
+ int flags)
{
struct inode *inode = dentry->d_inode;
int retval;
+ int attr_flags = ATTR_STAT_ALL;
retval = security_inode_getattr(mnt, dentry);
if (retval)
return retval;
+ if (flags & AT_NO_SIZE)
+ attr_flags &= ~ATTR_SIZE;
+
+ if (flags & AT_NO_TIMES)
+ attr_flags &= ~(ATTR_MTIME|ATTR_CTIME|ATTR_ATIME);
+
if (inode->i_op->getattr)
- return inode->i_op->getattr(mnt, dentry, stat);
+ return inode->i_op->getattr(mnt, dentry, stat, attr_flags);
generic_fillattr(inode, stat);
return 0;
@@ -55,14 +63,14 @@ int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
EXPORT_SYMBOL(vfs_getattr);
-int vfs_stat_fd(int dfd, char __user *name, struct kstat *stat)
+int vfs_stat_fd(int dfd, char __user *name, struct kstat *stat, int flags)
{
struct path path;
int error;
error = user_path_at(dfd, name, LOOKUP_FOLLOW, &path);
if (!error) {
- error = vfs_getattr(path.mnt, path.dentry, stat);
+ error = vfs_getattr(path.mnt, path.dentry, stat, flags);
path_put(&path);
}
return error;
@@ -70,19 +78,19 @@ int vfs_stat_fd(int dfd, char __user *name, struct kstat *stat)
int vfs_stat(char __user *name, struct kstat *stat)
{
- return vfs_stat_fd(AT_FDCWD, name, stat);
+ return vfs_stat_fd(AT_FDCWD, name, stat, 0);
}
EXPORT_SYMBOL(vfs_stat);
-int vfs_lstat_fd(int dfd, char __user *name, struct kstat *stat)
+int vfs_lstat_fd(int dfd, char __user *name, struct kstat *stat, int flags)
{
struct path path;
int error;
error = user_path_at(dfd, name, 0, &path);
if (!error) {
- error = vfs_getattr(path.mnt, path.dentry, stat);
+ error = vfs_getattr(path.mnt, path.dentry, stat, flags);
path_put(&path);
}
return error;
@@ -90,7 +98,7 @@ int vfs_lstat_fd(int dfd, char __user *name, struct kstat *stat)
int vfs_lstat(char __user *name, struct kstat *stat)
{
- return vfs_lstat_fd(AT_FDCWD, name, stat);
+ return vfs_lstat_fd(AT_FDCWD, name, stat, 0);
}
EXPORT_SYMBOL(vfs_lstat);
@@ -101,7 +109,7 @@ int vfs_fstat(unsigned int fd, struct kstat *stat)
int error = -EBADF;
if (f) {
- error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
+ error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat, 0);
fput(f);
}
return error;
@@ -155,7 +163,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
SYSCALL_DEFINE2(stat, char __user *, filename, struct __old_kernel_stat __user *, statbuf)
{
struct kstat stat;
- int error = vfs_stat_fd(AT_FDCWD, filename, &stat);
+ int error = vfs_stat_fd(AT_FDCWD, filename, &stat, 0);
if (!error)
error = cp_old_stat(&stat, statbuf);
@@ -166,7 +174,7 @@ SYSCALL_DEFINE2(stat, char __user *, filename, struct __old_kernel_stat __user *
SYSCALL_DEFINE2(lstat, char __user *, filename, struct __old_kernel_stat __user *, statbuf)
{
struct kstat stat;
- int error = vfs_lstat_fd(AT_FDCWD, filename, &stat);
+ int error = vfs_lstat_fd(AT_FDCWD, filename, &stat, 0);
if (!error)
error = cp_old_stat(&stat, statbuf);
@@ -240,7 +248,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
SYSCALL_DEFINE2(newstat, char __user *, filename, struct stat __user *, statbuf)
{
struct kstat stat;
- int error = vfs_stat_fd(AT_FDCWD, filename, &stat);
+ int error = vfs_stat_fd(AT_FDCWD, filename, &stat, 0);
if (!error)
error = cp_new_stat(&stat, statbuf);
@@ -251,7 +259,7 @@ SYSCALL_DEFINE2(newstat, char __user *, filename, struct stat __user *, statbuf)
SYSCALL_DEFINE2(newlstat, char __user *, filename, struct stat __user *, statbuf)
{
struct kstat stat;
- int error = vfs_lstat_fd(AT_FDCWD, filename, &stat);
+ int error = vfs_lstat_fd(AT_FDCWD, filename, &stat, 0);
if (!error)
error = cp_new_stat(&stat, statbuf);
@@ -266,13 +274,13 @@ SYSCALL_DEFINE4(newfstatat, int, dfd, char __user *, filename,
struct kstat stat;
int error = -EINVAL;
- if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
+ if ((flag & ~AT_MASK) != 0)
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_new_stat(&stat, statbuf);
@@ -410,9 +418,9 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, char __user *, filename,
goto out;
if (flag & AT_SYMLINK_NOFOLLOW)
- error = vfs_lstat_fd(dfd, filename, &stat);
+ error = vfs_lstat_fd(dfd, filename, &stat, flag);
else
- error = vfs_stat_fd(dfd, filename, &stat);
+ error = vfs_stat_fd(dfd, filename, &stat, flag);
if (!error)
error = cp_new_stat64(&stat, statbuf);
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index 8603740..b02127a 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -39,6 +39,12 @@
#define AT_REMOVEDIR 0x200 /* Remove directory instead of
unlinking file. */
#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
+#define AT_NO_SIZE 0x800 /* Do not return size or block count */
+#define AT_NO_TIMES 0x1000 /* Do not return [amc]time */
+#define AT_STRICT 0x2000 /* Guarantee correctness of field
+ * values */
+
+#define AT_MASK (AT_SYMLINK_NOFOLLOW|AT_NO_SIZE|AT_NO_TIMES)
#ifdef __KERNEL__
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92734c0..2e099dd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -366,6 +366,8 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
#define ATTR_OPEN (1 << 15) /* Truncating from open(O_TRUNC) */
#define ATTR_TIMES_SET (1 << 16)
+#define ATTR_STAT_ALL (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_SIZE|ATTR_ATIME|ATTR_MTIME|ATTR_CTIME)
+
/*
* This is the Inode Attributes structure, used for notify_change(). It
* uses the above definitions as flags, to know which values have changed.
@@ -1353,7 +1355,7 @@ struct inode_operations {
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
- int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
+ int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *, int);
int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
@@ -2062,7 +2064,7 @@ extern int page_symlink(struct inode *inode, const char *symname, int len);
extern const struct inode_operations page_symlink_inode_operations;
extern int generic_readlink(struct dentry *, char __user *, int);
extern void generic_fillattr(struct inode *, struct kstat *);
-extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *, int);
void inode_add_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
loff_t inode_get_bytes(struct inode *inode);
@@ -2072,8 +2074,8 @@ extern int vfs_readdir(struct file *, filldir_t, void *);
extern int vfs_stat(char __user *, struct kstat *);
extern int vfs_lstat(char __user *, struct kstat *);
-extern int vfs_stat_fd(int dfd, char __user *, struct kstat *);
-extern int vfs_lstat_fd(int dfd, char __user *, struct kstat *);
+extern int vfs_stat_fd(int dfd, char __user *, struct kstat *, int);
+extern int vfs_lstat_fd(int dfd, char __user *, struct kstat *, int);
extern int vfs_fstat(unsigned int, struct kstat *);
extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
@@ -2096,7 +2098,7 @@ extern int dcache_dir_open(struct inode *, struct file *);
extern int dcache_dir_close(struct inode *, struct file *);
extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
extern int dcache_readdir(struct file *, void *, filldir_t);
-extern int simple_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int simple_getattr(struct vfsmount *, struct dentry *, struct kstat *, int);
extern int simple_statfs(struct dentry *, struct kstatfs *);
extern int simple_link(struct dentry *, struct inode *, struct dentry *);
extern int simple_unlink(struct inode *, struct dentry *);
--
1.5.6
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 8:00 [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags Mark Fasheh
@ 2009-04-07 10:23 ` Andreas Dilger
2009-04-07 14:59 ` Oleg Drokin
` (3 more replies)
2009-04-07 17:42 ` Jamie Lokier
1 sibling, 4 replies; 15+ messages in thread
From: Andreas Dilger @ 2009-04-07 10:23 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-fsdevel, Sage Weil, Trond Myklebust
On Apr 07, 2009 01:00 -0700, Mark Fasheh wrote:
> This very, very rough patch set adds three flags to fstatat - AT_NO_SIZE,
> AT_NO_TIMES, and AT_STRICT.
It seems you and Oleg have two patches crossing in the night...
> The first two flags (AT_NO_SIZE, AT_NO_TIMES) allow userspace to notify the
> file system layer that certain stat fields are not required to be accurate.
The problem with "AT_NO_*" is that old applications which couldn't possibly
know about or use a new stat field couldn't possibly know not to ask for
it. Instead, as was proposed in the last generation of this thread, there
should be AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
flags that the application actually wants. If none of them are specified,
then the current behaviour of "get all attributes" is kept.
> Some file systems want this information in order to optimize away expensive
> operations associated with stat. In particular, NFS can avoid some syncing
> to the server (if userspace doesn't want atime, ctime or mtime) and Lustre
> can avoid some expensive locking by avoiding an update of various size
> fields (st_size, st_blocks).
Actually, despite what was said today, Lustre doesn't revoke the writing
client's locks when getting the file size, unlike block-based filesystems.
That said, it is still relatively a lot of work to query the size for
a widely-striped file since there may be a hundred servers holding the
file data and any one of them might have the end-of-file.
> AT_STRICT allows userspace to indicate that it wants the most up to date
> version of a files status, regardless of performance impact. A distributed
> file system which has a non-coherent inode cache would know then to send a
> direct query to it's server.
The issue with AT_STRICT is that applications TODAY are expecting the
proper file status. I'd be more inclined to have an AT_LAZY flag for
applications that do NOT require the most up-to-date information.
> As noted previously, these patches are really rough. Mostly I'd like to get
> some feedback on the interface and general direction of implementation. Some
> glaring issues which we want to resolve:
>
> - This patch set doesn't actually wire up any client file systems yet :)
>
> - There's the question of whether we wire up [fl]stat(2) variants instead of
> using fstatat. I went with the former as the implementation was more
> straight forward.
It appears that fstatat(2) fits the need for "statlite" cleanly. I'd
thought today that this would require opening each file, but it is
only necessary to open the parent directory and call fstatat() for
each filename in the directory.
> - I'm not sure whether we want to force zeroing of the optional fields, or
> return whatever's in the inode (which may be stale, or just junk).
I think yes, since there have been security bugs filed in rare cases when
other bits of kernel data are exposed to user space, even if just a byte
or two.
> -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat,
> + int flags)
> {
> if (inode->i_op->getattr)
> - return inode->i_op->getattr(mnt, dentry, stat);
> + return inode->i_op->getattr(mnt, dentry, stat, attr_flags);
Oleg's version added a second ->getattr_lite() call that passed the flags,
which avoids changing all of the filesystems, though I guess it isn't a
huge deal either way.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 10:23 ` Andreas Dilger
@ 2009-04-07 14:59 ` Oleg Drokin
2009-04-07 15:18 ` Sage Weil
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Oleg Drokin @ 2009-04-07 14:59 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Mark Fasheh, linux-fsdevel, Sage Weil, Trond Myklebust
Hello!
On Apr 7, 2009, at 6:23 AM, Andreas Dilger wrote:
>> - I'm not sure whether we want to force zeroing of the optional
>> fields, or
>> return whatever's in the inode (which may be stale, or just junk).
> I think yes, since there have been security bugs filed in rare cases
> when
> other bits of kernel data are exposed to user space, even if just a
> byte
> or two.
Well, this is not just some random stale data from the stack, this is
just
stale information in inode, I presume.
>> -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>> struct kstat *stat)
>> +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>> struct kstat *stat,
>> + int flags)
>> {
>> if (inode->i_op->getattr)
>> - return inode->i_op->getattr(mnt, dentry, stat);
>> + return inode->i_op->getattr(mnt, dentry, stat, attr_flags);
> Oleg's version added a second ->getattr_lite() call that passed the
> flags,
> which avoids changing all of the filesystems, though I guess it
> isn't a
> huge deal either way.
Actually just changing the getattr for all filesystems is better
approach, I just
was too lazy to update all of the filesystems myself. Filesystems that
do not care
about this would just ignore the flags and return everything
regardless, so
there is absolutely no need for 2 functions that do the same thing.
Bye,
Oleg
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 10:23 ` Andreas Dilger
2009-04-07 14:59 ` Oleg Drokin
@ 2009-04-07 15:18 ` Sage Weil
2009-04-07 15:29 ` Mark Fasheh
2009-04-07 23:13 ` Russell Cattelan
3 siblings, 0 replies; 15+ messages in thread
From: Sage Weil @ 2009-04-07 15:18 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Mark Fasheh, linux-fsdevel, Trond Myklebust
> > - There's the question of whether we wire up [fl]stat(2) variants instead of
> > using fstatat. I went with the former as the implementation was more
> > straight forward.
>
> It appears that fstatat(2) fits the need for "statlite" cleanly. I'd
> thought today that this would require opening each file, but it is
> only necessary to open the parent directory and call fstatat() for
> each filename in the directory.
It's better than that... fstatat(2) actually captures all of stat(2),
lstat(2), and fstat(2), without opening any extra file handles. AT_FDCWD
for the fd makes things relative to the working directory, and
AT_SYMLINK_NOFOLLOW gives you the 'l' variant. That's how they're all
wired up in fs/stat.c anyway.
sage
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 10:23 ` Andreas Dilger
2009-04-07 14:59 ` Oleg Drokin
2009-04-07 15:18 ` Sage Weil
@ 2009-04-07 15:29 ` Mark Fasheh
2009-04-07 17:52 ` Andreas Dilger
2009-04-07 23:13 ` Russell Cattelan
3 siblings, 1 reply; 15+ messages in thread
From: Mark Fasheh @ 2009-04-07 15:29 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-fsdevel, Sage Weil, Trond Myklebust
On Tue, Apr 07, 2009 at 03:23:32AM -0700, Andreas Dilger wrote:
> On Apr 07, 2009 01:00 -0700, Mark Fasheh wrote:
> > This very, very rough patch set adds three flags to fstatat - AT_NO_SIZE,
> > AT_NO_TIMES, and AT_STRICT.
>
> It seems you and Oleg have two patches crossing in the night...
Heh, it seems so. At least we went from having zero interest in implementing
this to two patches! At any rate, thanks for taking the time to review them.
> > The first two flags (AT_NO_SIZE, AT_NO_TIMES) allow userspace to notify the
> > file system layer that certain stat fields are not required to be accurate.
>
> The problem with "AT_NO_*" is that old applications which couldn't possibly
> know about or use a new stat field couldn't possibly know not to ask for
> it.
Well, if we add stat fields, we need new system calls _anyway_ because
struct stat has to undergo a change. At that point, adding a new flag to the
modified fstatat, or just changing the behavior of an existing flag in a
backwards compatible manner should be quite easy in comparison.
> Instead, as was proposed in the last generation of this thread, there
> should be AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
> flags that the application actually wants.
Well, I respectfully disagree. In my opinion, this adds unncessary
complexity to an already targetted case, where only a handful of file
systems will even care to optimize away a couple of fields. In that light, I
really think we want simplicity here. The good news is that we still have a
very good amount of space in the flags field. We could always go back if we
_really_ find the need to micro-optimize to the point of per stat field
flags. Again though, at this point I don't really think starting out with a
slew of flags is a good idea.
> If none of them are specified, then the current behaviour of "get all
> attributes" is kept.
This of course, is how it works today - if none of the new flags are
specified you get the exact old behavior. In fact, most file systems will
wind up just happily ignoring the new flags.
>
> > Some file systems want this information in order to optimize away expensive
> > operations associated with stat. In particular, NFS can avoid some syncing
> > to the server (if userspace doesn't want atime, ctime or mtime) and Lustre
> > can avoid some expensive locking by avoiding an update of various size
> > fields (st_size, st_blocks).
>
> Actually, despite what was said today, Lustre doesn't revoke the writing
> client's locks when getting the file size, unlike block-based filesystems.
> That said, it is still relatively a lot of work to query the size for
> a widely-striped file since there may be a hundred servers holding the
> file data and any one of them might have the end-of-file.
Ahh ok, but the point still stands, right? It costs a significant amount of
performance to query each file stripe for the purposes of a size calculation
during a simple 'ls'. Is the behavior as implemented by AT_SIZE sufficient
for Lustre to avoid such a performance loss?
> > AT_STRICT allows userspace to indicate that it wants the most up to date
> > version of a files status, regardless of performance impact. A distributed
> > file system which has a non-coherent inode cache would know then to send a
> > direct query to it's server.
>
> The issue with AT_STRICT is that applications TODAY are expecting the
> proper file status. I'd be more inclined to have an AT_LAZY flag for
> applications that do NOT require the most up-to-date information.
The reason I put that in there is because I believe there may be some
file systems which implement the 'lazy' behavior for regular stat so this
seemed a good way to short circuit that caching for those apps that really
care.
That said, I'm not 100% sure either AT_STRICT or AT_LAZY are really required
for a first pass - it just seemed convenient to throw AT_STRICT in there.
> > As noted previously, these patches are really rough. Mostly I'd like to get
> > some feedback on the interface and general direction of implementation. Some
> > glaring issues which we want to resolve:
> >
> > - This patch set doesn't actually wire up any client file systems yet :)
> >
> > - There's the question of whether we wire up [fl]stat(2) variants instead of
> > using fstatat. I went with the former as the implementation was more
> > straight forward.
>
> It appears that fstatat(2) fits the need for "statlite" cleanly. I'd
> thought today that this would require opening each file, but it is
> only necessary to open the parent directory and call fstatat() for
> each filename in the directory.
>
> > - I'm not sure whether we want to force zeroing of the optional fields, or
> > return whatever's in the inode (which may be stale, or just junk).
>
> I think yes, since there have been security bugs filed in rare cases when
> other bits of kernel data are exposed to user space, even if just a byte
> or two.
Yes, good idea. I'll implement that zeroing via the client fs ->setattr
methods in my next pass. By the way, I think that adds to my argument that
we don't really want to micro-optimize the flags yet - now all file systems
will have to switch on a ton of attr flags during their zeroing... Granted
that can be turned into an exported function but hopefully you see my
point.
> > -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> > +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat,
> > + int flags)
> > {
> > if (inode->i_op->getattr)
> > - return inode->i_op->getattr(mnt, dentry, stat);
> > + return inode->i_op->getattr(mnt, dentry, stat, attr_flags);
>
> Oleg's version added a second ->getattr_lite() call that passed the flags,
> which avoids changing all of the filesystems, though I guess it isn't a
> huge deal either way.
Yeah, I'll look at how he did it. I just didn't want this to return ENOSYS
or whatnot if a file system doesn't have ->getattr_lite() but there's other
ways around that.
--Mark
--
Mark Fasheh
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 8:00 [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags Mark Fasheh
2009-04-07 10:23 ` Andreas Dilger
@ 2009-04-07 17:42 ` Jamie Lokier
2009-04-07 17:48 ` Oleg Drokin
1 sibling, 1 reply; 15+ messages in thread
From: Jamie Lokier @ 2009-04-07 17:42 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-fsdevel, Sage Weil, Trond Myklebust, Andreas Dilger
Mark Fasheh wrote:
> AT_STRICT allows userspace to indicate that it wants the most up to date
> version of a files status, regardless of performance impact. A distributed
> file system which has a non-coherent inode cache would know then to send a
> direct query to it's server.
Good idea! Sort out some NFS pain.
If a filesystem doesn't honour AT_STRICT, can we have the function
return an error instead of stale values?
I think AT_SYNC, AT_UPTODATE, AT_CURRENT or AT_NOW are clearer names
than AT_STRICT, but it doesn't matter.
Actually, all the names with AT_ are a bit silly, just because it
happens to use the same flag space at the *at() syscalls. They don't
have anything to do with "atness". What about calling these flags
STAT_*?
-- JAmie
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 17:42 ` Jamie Lokier
@ 2009-04-07 17:48 ` Oleg Drokin
2009-04-07 18:16 ` Jamie Lokier
0 siblings, 1 reply; 15+ messages in thread
From: Oleg Drokin @ 2009-04-07 17:48 UTC (permalink / raw)
To: Jamie Lokier
Cc: Mark Fasheh, linux-fsdevel, Sage Weil, Trond Myklebust,
Andreas Dilger
Hello!
On Apr 7, 2009, at 1:42 PM, Jamie Lokier wrote:
> Mark Fasheh wrote:
>> AT_STRICT allows userspace to indicate that it wants the most up to
>> date
>> version of a files status, regardless of performance impact. A
>> distributed
>> file system which has a non-coherent inode cache would know then to
>> send a
>> direct query to it's server.
> Good idea! Sort out some NFS pain.
> If a filesystem doesn't honour AT_STRICT, can we have the function
> return an error instead of stale values?
Supposedly the existing stat() is the way to do this?
Bye,
Oleg
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 15:29 ` Mark Fasheh
@ 2009-04-07 17:52 ` Andreas Dilger
2009-04-07 18:13 ` Jamie Lokier
0 siblings, 1 reply; 15+ messages in thread
From: Andreas Dilger @ 2009-04-07 17:52 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-fsdevel, Sage Weil, Trond Myklebust
On Apr 07, 2009 08:29 -0700, Mark Fasheh wrote:
> On Tue, Apr 07, 2009 at 03:23:32AM -0700, Andreas Dilger wrote:
> > The problem with "AT_NO_*" is that old applications which couldn't possibly
> > know about or use a new stat field couldn't possibly know not to ask for
> > it.
>
> Well, if we add stat fields, we need new system calls _anyway_ because
> struct stat has to undergo a change. At that point, adding a new flag to the
> modified fstatat, or just changing the behavior of an existing flag in a
> backwards compatible manner should be quite easy in comparison.
Actually, there are some reserved fields in the current stat struct,
and in the past these fields have been used in a compatible manner (e.g.
nanosecond timestamps) and applications that were written before these
fields were defined will never use them so filling them in is pointless
if there is a cost in doing so.
> > Instead, as was proposed in the last generation of this thread, there
> > should be AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
> > flags that the application actually wants.
>
> Well, I respectfully disagree. In my opinion, this adds unncessary
> complexity to an already targetted case, where only a handful of file
> systems will even care to optimize away a couple of fields. In that light, I
> really think we want simplicity here.
The difficult part is that the application knows which fields it is going
to use, and the filesystem knows which fields are expensive to fill in,
and by merging multiple fields together you are deciding arbitrarily that
you know better than either. The main users will be common tools like
"ls --color" and "find", and they know exactly which fields are needed,
but they need to do a LOT of stats so making it efficient is worthwhile.
> The good news is that we still have a
> very good amount of space in the flags field. We could always go back if we
> _really_ find the need to micro-optimize to the point of per stat field
> flags. Again though, at this point I don't really think starting out with a
> slew of flags is a good idea.
Similar to the helper macros for S_IRWXU = (S_IRUSR|S_IWUSR|S_IXUSR)
applications won't necessarily need to specify all flags explicitly.
> > If none of them are specified, then the current behaviour of "get all
> > attributes" is kept.
>
> This of course, is how it works today - if none of the new flags are
> specified you get the exact old behavior. In fact, most file systems will
> wind up just happily ignoring the new flags.
Sure, but applications that DO care about these fields will want to
be able to specify the ones they want, instead of specifying the ones
they don't want (which is impossible to do for flags that the application
doesn't know about).
> > Actually, despite what was said today, Lustre doesn't revoke the writing
> > client's locks when getting the file size, unlike block-based filesystems.
> > That said, it is still relatively a lot of work to query the size for
> > a widely-striped file since there may be a hundred servers holding the
> > file data and any one of them might have the end-of-file.
>
> Ahh ok, but the point still stands, right? It costs a significant amount of
> performance to query each file stripe for the purposes of a size calculation
> during a simple 'ls'.
Correct. I was just clarifying that this wasn't related to lock revocation
(i.e. the node doing the writing is not affected by the ls).
> Is the behavior as implemented by AT_SIZE sufficient
> for Lustre to avoid such a performance loss?
If you mean applications not specifying AT_STAT_SIZE, then yes.
> > I think yes, since there have been security bugs filed in rare cases when
> > other bits of kernel data are exposed to user space, even if just a byte
> > or two.
>
> Yes, good idea. I'll implement that zeroing via the client fs ->setattr
> methods in my next pass. By the way, I think that adds to my argument that
> we don't really want to micro-optimize the flags yet - now all file systems
> will have to switch on a ton of attr flags during their zeroing... Granted
> that can be turned into an exported function but hopefully you see my
> point.
Actually, it appears that the kernel is already doing memset() for
the stat struct returned to userspace, so that the filesystems don't
have to do it themselves.
> Yeah, I'll look at how he did it. I just didn't want this to return ENOSYS
> or whatnot if a file system doesn't have ->getattr_lite() but there's other
> ways around that.
Oleg agrees that changing the ->getattr() prototype is better.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 17:52 ` Andreas Dilger
@ 2009-04-07 18:13 ` Jamie Lokier
0 siblings, 0 replies; 15+ messages in thread
From: Jamie Lokier @ 2009-04-07 18:13 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Mark Fasheh, linux-fsdevel, Sage Weil, Trond Myklebust
Andreas Dilger wrote:
> > > Instead, as was proposed in the last generation of this thread,
> > > there should be
> > > AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
> > > flags that the application actually wants.
> >
> > Well, I respectfully disagree. In my opinion, this adds unncessary
> > complexity to an already targetted case, where only a handful of
> > file systems will even care to optimize away a couple of
> > fields. In that light, I really think we want simplicity here.
>
> The difficult part is that the application knows which fields it is going
> to use, and the filesystem knows which fields are expensive to fill in,
> and by merging multiple fields together you are deciding arbitrarily that
> you know better than either. The main users will be common tools like
> "ls --color" and "find", and they know exactly which fields are needed,
> but they need to do a LOT of stats so making it efficient is worthwhile.
I agree with Andreas. The issue isn't "what is efficient and simple (now)",
but what makes a good future-compatible interface.
> > The good news is that we still have a
> > very good amount of space in the flags field. We could always go back if we
> > _really_ find the need to micro-optimize to the point of per stat field
> > flags. Again though, at this point I don't really think starting out with a
> > slew of flags is a good idea.
>
> Similar to the helper macros for S_IRWXU = (S_IRUSR|S_IWUSR|S_IXUSR)
> applications won't necessarily need to specify all flags explicitly.
Yes.
If it's intended to re-use the fstatat system call for this, the
conversion from "no stat field flags" to AT_STAT_ALL should be done in
the general fstatat code.
fstatat() is for doing stat() or lstat() relative to an open
directory. (Anyone else thing the name is stupid - why isn't it
statat like openat, etc.?)
So how do you do a selective fstat() - attributes of an open file?
Don't waste a flag, and just specify AT_SELF as the pathname value to
mean return the attributes of the fd argument itself. AT_SELF would
be NULL, or perhaps (const char *) -1 to catch application errors.
-- Jamie
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 17:48 ` Oleg Drokin
@ 2009-04-07 18:16 ` Jamie Lokier
2009-04-07 18:24 ` Oleg Drokin
0 siblings, 1 reply; 15+ messages in thread
From: Jamie Lokier @ 2009-04-07 18:16 UTC (permalink / raw)
To: Oleg Drokin
Cc: Mark Fasheh, linux-fsdevel, Sage Weil, Trond Myklebust,
Andreas Dilger
Oleg Drokin wrote:
> >>AT_STRICT allows userspace to indicate that it wants the most up to
> >>date
> >>version of a files status, regardless of performance impact. A
> >>distributed
> >>file system which has a non-coherent inode cache would know then to
> >>send a
> >>direct query to it's server.
> >Good idea! Sort out some NFS pain.
> >If a filesystem doesn't honour AT_STRICT, can we have the function
> >return an error instead of stale values?
>
> Supposedly the existing stat() is the way to do this?
I don't understand your response. If an application wants to be sure
it has non-stale attributes, how does stat() help? Are you saying
stat() does this? (Afaik it doesn't on NFS).
-- Jamie
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 18:16 ` Jamie Lokier
@ 2009-04-07 18:24 ` Oleg Drokin
2009-04-07 20:37 ` Trond Myklebust
0 siblings, 1 reply; 15+ messages in thread
From: Oleg Drokin @ 2009-04-07 18:24 UTC (permalink / raw)
To: Jamie Lokier
Cc: Mark Fasheh, linux-fsdevel, Sage Weil, Trond Myklebust,
Andreas Dilger
Hello!
On Apr 7, 2009, at 2:16 PM, Jamie Lokier wrote:
> Oleg Drokin wrote:
>>>> AT_STRICT allows userspace to indicate that it wants the most up to
>>>> date
>>>> version of a files status, regardless of performance impact. A
>>>> distributed
>>>> file system which has a non-coherent inode cache would know then to
>>>> send a
>>>> direct query to it's server.
>>> Good idea! Sort out some NFS pain.
>>> If a filesystem doesn't honour AT_STRICT, can we have the function
>>> return an error instead of stale values?
>> Supposedly the existing stat() is the way to do this?
> I don't understand your response. If an application wants to be sure
> it has non-stale attributes, how does stat() help? Are you saying
> stat() does this? (Afaik it doesn't on NFS).
Well, what the stat actually meant to do is "give me the file
information
as it is now". By the time it returns, the data is stale anyway, and the
longer your path from the user app to the actual file storage, the
more potentially out of date the information is.
NFS just takes it to an extreme case and introduces an assumed validity
timeout.
While I do not directly oppose such a flag, I really do not see huge
value
in it. I wonder what is the specific usecase do you have in mind?
Bye,
Oleg
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 18:24 ` Oleg Drokin
@ 2009-04-07 20:37 ` Trond Myklebust
2009-04-08 18:48 ` Jamie Lokier
0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2009-04-07 20:37 UTC (permalink / raw)
To: Oleg Drokin
Cc: Jamie Lokier, Mark Fasheh, linux-fsdevel, Sage Weil,
Andreas Dilger
On Tue, 2009-04-07 at 14:24 -0400, Oleg Drokin wrote:
> Well, what the stat actually meant to do is "give me the file
> information
> as it is now". By the time it returns, the data is stale anyway, and the
> longer your path from the user app to the actual file storage, the
> more potentially out of date the information is.
> NFS just takes it to an extreme case and introduces an assumed validity
> timeout.
> While I do not directly oppose such a flag, I really do not see huge
> value
> in it. I wonder what is the specific usecase do you have in mind?
The default behaviour of stat() on NFS is to do a revalidation of the
cached data (by which I mean that we issue an RPC call if and only if
the cache has timed out, or if it is known to be invalid).
The AT_STRICT would be used by the application to tell NFS that it must
retrieve the cached data from the server. One instance where this is
useful would be the case where you're doing a distributed compile: the
application knows that the file may have changed on the server, and
wants to force the kernel to check mtime.
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 10:23 ` Andreas Dilger
` (2 preceding siblings ...)
2009-04-07 15:29 ` Mark Fasheh
@ 2009-04-07 23:13 ` Russell Cattelan
3 siblings, 0 replies; 15+ messages in thread
From: Russell Cattelan @ 2009-04-07 23:13 UTC (permalink / raw)
To: Andreas Dilger, linux-fsdevel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andreas Dilger wrote:
> On Apr 07, 2009 01:00 -0700, Mark Fasheh wrote:
>> This very, very rough patch set adds three flags to fstatat -
>> AT_NO_SIZE, AT_NO_TIMES, and AT_STRICT.
>
> It seems you and Oleg have two patches crossing in the night...
>
>> The first two flags (AT_NO_SIZE, AT_NO_TIMES) allow userspace to
>> notify the file system layer that certain stat fields are not
>> required to be accurate.
>
> The problem with "AT_NO_*" is that old applications which couldn't
> possibly know about or use a new stat field couldn't possibly know
> not to ask for it. Instead, as was proposed in the last generation
> of this thread, there should be
> AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
> flags that the application actually wants. If none of them are
> specified, then the current behaviour of "get all attributes" is
> kept.
>
I agree with implementing positive flags "I want" vs negative flags "I
don't want", any application that is going to
actually go to the effort to selectively pick which stat fields it
wants is only expecting that particular
information to be returned. The use of negative flags could get
confusing in the future if stat info
is added or adjusted, the file system may then need to make guesses as
to weather or not it should
return this new info, especially if the new info has performance
implications.
The other thing that occurs to me that might be useful is a system
wide stat mask or a shell based stat mask?
I mean lets face it will be very difficult to get all applications
updated to use the stat call such that the "expensive"
stat call that potentially kills your cluster file system performance.
So adding the ability for an administrator to carefully
change the stat mask on critical systems might be a way to actually
realize the benefits of stat lite.
- -Russell Cattelan
Digital Elves Inc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFJ294fNRmM+OaGhBgRAs9tAJ9uMnjQW/VEnpJCPDNIQX9hQsvhJgCfWFhZ
uyh+xkYmp5Sxa2T9T4IyEpc=
=7qdU
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-07 20:37 ` Trond Myklebust
@ 2009-04-08 18:48 ` Jamie Lokier
2009-04-09 15:13 ` Trond Myklebust
0 siblings, 1 reply; 15+ messages in thread
From: Jamie Lokier @ 2009-04-08 18:48 UTC (permalink / raw)
To: Trond Myklebust
Cc: Oleg Drokin, Mark Fasheh, linux-fsdevel, Sage Weil,
Andreas Dilger
Trond Myklebust wrote:
> On Tue, 2009-04-07 at 14:24 -0400, Oleg Drokin wrote:
> > Well, what the stat actually meant to do is "give me the file
> > information as it is now". By the time it returns, the data is
> > stale anyway, and the longer your path from the user app to the
> > actual file storage, the more potentially out of date the
> > information is. NFS just takes it to an extreme case and
> > introduces an assumed validity timeout. While I do not directly
> > oppose such a flag, I really do not see huge value in it. I wonder
> > what is the specific usecase do you have in mind?
>
> The default behaviour of stat() on NFS is to do a revalidation of the
> cached data (by which I mean that we issue an RPC call if and only if
> the cache has timed out, or if it is known to be invalid).
>
> The AT_STRICT would be used by the application to tell NFS that it must
> retrieve the cached data from the server. One instance where this is
> useful would be the case where you're doing a distributed compile: the
> application knows that the file may have changed on the server, and
> wants to force the kernel to check mtime.
Yes, that's exactly the sort of thing I had in mind.
The amount of time taken isn't relevant, neither is the fact that the
file may change after calling stat() before using the result.
What matters is the order of dependent events.
program on host A does something which may affect files,
then sends message to host B - "I've finished whatever may affect files"
in response to message,
program on host B calls stat() to check for changes to files
The stat() call must get attributes which are no older than when host
A modified those attributes.
This implies host A must have written the attribute changes too :-)
NFS does this on close(); other network filesystems use other mechanisms.
As Trond says, the default behaviour of stat() does not get
sufficiently recent attributes, which means things like distributed
compiles sometimes fail on NFS with default settings.
I was thinking of another application: #!/usr/bin/perlcachedcompiler
which behaves like a normal script interpreter, except it compiles the
script first (slow optimising compile...) and runs the compiled
version. Next time it's run, it checks the persistent cache of
compiled versions against the attributes of the script file, and if
there's a match, no need to compile: just run the compiled version
quickly.
This sort of cache validation needs AT_STRICT to be reliable with NFS,
when a script is edited on one host, then run on another.
(Actually I'd like dnotify/inotify/fsnotify so I can omit the stat()
round trip entirely please on each application cache
revalidation... but only if "check for any file change events, answer
= no events" is synchronous in the same way that AT_STRICT is...)
-- Jamie
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
2009-04-08 18:48 ` Jamie Lokier
@ 2009-04-09 15:13 ` Trond Myklebust
0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2009-04-09 15:13 UTC (permalink / raw)
To: Jamie Lokier
Cc: Oleg Drokin, Mark Fasheh, linux-fsdevel, Sage Weil,
Andreas Dilger
On Wed, 2009-04-08 at 19:48 +0100, Jamie Lokier wrote:
> (Actually I'd like dnotify/inotify/fsnotify so I can omit the stat()
> round trip entirely please on each application cache
> revalidation... but only if "check for any file change events, answer
> = no events" is synchronous in the same way that AT_STRICT is...)
There is no protocol support for inotify or fsnotify.
The NFSv4.1 protocol does support a dnotify-ish callback, but that isn't
synchronous due to scalability concerns. It should be quite sufficient
for the likes of konqueror or nautilus, though...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2009-04-09 15:13 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-07 8:00 [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags Mark Fasheh
2009-04-07 10:23 ` Andreas Dilger
2009-04-07 14:59 ` Oleg Drokin
2009-04-07 15:18 ` Sage Weil
2009-04-07 15:29 ` Mark Fasheh
2009-04-07 17:52 ` Andreas Dilger
2009-04-07 18:13 ` Jamie Lokier
2009-04-07 23:13 ` Russell Cattelan
2009-04-07 17:42 ` Jamie Lokier
2009-04-07 17:48 ` Oleg Drokin
2009-04-07 18:16 ` Jamie Lokier
2009-04-07 18:24 ` Oleg Drokin
2009-04-07 20:37 ` Trond Myklebust
2009-04-08 18:48 ` Jamie Lokier
2009-04-09 15:13 ` Trond Myklebust
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).