* [patch 1/2] VFS: new fgetattr() file operation
@ 2007-09-24 12:24 Miklos Szeredi
2007-09-24 12:25 ` [patch 2/2] VFS: allow filesystem to override mknod capability checks Miklos Szeredi
2007-09-24 12:36 ` [patch 1/2] VFS: new fgetattr() file operation Christoph Hellwig
0 siblings, 2 replies; 17+ messages in thread
From: Miklos Szeredi @ 2007-09-24 12:24 UTC (permalink / raw)
To: hch; +Cc: trond.myklebust, adilger, linux-kernel, linux-fsdevel
Thanks to everyone for the feedback. Here's two of the VFS patches
reworked according to comments. I also plan to rework the setattr()
patch accordingly and perhaps the xattr patch, altough that is the
lowest priority.
Christoph, are these OK with you in this form?
----
From: Miklos Szeredi <mszeredi@suse.cz>
Add a new file operation: f_op->fgetattr(), that is invoked by
fstat(). Fall back to i_op->getattr() if it is not defined.
This is useful for filesystems such as sshfs, which don't have a state
associated with inodes, but do have a state associated with open file
handles, and can perform a getattr operation using these handles.
In these cases there are basically two ways to correctly implement
open-unlink-fstat semantics:
1) use "sillyrenaming"
2) keep track of open files for each inode within the filesystem, and
randomly choose one for each getattr() request on an inode with
i_nlink == 0
3) VFS passes the open file to the filesystem, which can be used to
perform a getattr operation on the file handle
No 3. is by far the simplest solution, but it does require this
interface change in the VFS. It is also the only one that takes care
of the case, when a regular file is unlinked or renamed on the remote
host, while it is still open locally.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
Index: linux/fs/stat.c
===================================================================
--- linux.orig/fs/stat.c 2007-09-24 13:04:37.000000000 +0200
+++ linux/fs/stat.c 2007-09-24 13:06:10.000000000 +0200
@@ -55,6 +55,27 @@ int vfs_getattr(struct vfsmount *mnt, st
EXPORT_SYMBOL(vfs_getattr);
+static int vfs_fgetattr(struct file *file, struct kstat *stat)
+{
+ struct vfsmount *mnt = file->f_path.mnt;
+ struct dentry *dentry = file->f_path.dentry;
+ struct inode *inode = dentry->d_inode;
+ int retval;
+
+ retval = security_inode_getattr(mnt, dentry);
+ if (retval)
+ return retval;
+
+ if (file->f_op && file->f_op->fgetattr) {
+ return file->f_op->fgetattr(file, stat);
+ } else if (inode->i_op->getattr) {
+ return inode->i_op->getattr(mnt, dentry, stat);
+ } else {
+ generic_fillattr(inode, stat);
+ return 0;
+ }
+}
+
int vfs_stat_fd(int dfd, char __user *name, struct kstat *stat)
{
struct nameidata nd;
@@ -101,7 +122,7 @@ int vfs_fstat(unsigned int fd, struct ks
int error = -EBADF;
if (f) {
- error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
+ error = vfs_fgetattr(f, stat);
fput(f);
}
return error;
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h 2007-09-24 13:04:37.000000000 +0200
+++ linux/include/linux/fs.h 2007-09-24 13:06:10.000000000 +0200
@@ -1193,6 +1193,7 @@ struct file_operations {
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
int (*setlease)(struct file *, long, struct file_lock **);
int (*revoke)(struct file *, struct address_space *);
+ int (*fgetattr)(struct file *, struct kstat *);
};
struct inode_operations {
^ permalink raw reply [flat|nested] 17+ messages in thread* [patch 2/2] VFS: allow filesystem to override mknod capability checks 2007-09-24 12:24 [patch 1/2] VFS: new fgetattr() file operation Miklos Szeredi @ 2007-09-24 12:25 ` Miklos Szeredi 2007-09-24 12:38 ` Christoph Hellwig 2007-09-28 2:40 ` Neil Brown 2007-09-24 12:36 ` [patch 1/2] VFS: new fgetattr() file operation Christoph Hellwig 1 sibling, 2 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 12:25 UTC (permalink / raw) To: hch; +Cc: trond.myklebust, adilger, linux-kernel, linux-fsdevel From: Miklos Szeredi <mszeredi@suse.cz> Add a new super block flag, that results in the VFS not checking if the current process has enough privileges to do an mknod(). If this flag is set, all mounts for this super block will have the "nodev" flag implied. This is needed on filesystems, where an unprivileged user may be able to create a device node, without causing security problems. One such example is "mountlo" a loopback mount utility implemented with fuse and UML, which runs as an unprivileged userspace process. In this case the user does in fact have the right to create device nodes within the filesystem image, as long as the user has write access to the image. Since the filesystem is mounted with "nodev", adding device nodes is not a security concern. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> --- Index: linux/fs/namei.c =================================================================== --- linux.orig/fs/namei.c 2007-09-24 13:52:17.000000000 +0200 +++ linux/fs/namei.c 2007-09-24 13:54:57.000000000 +0200 @@ -1617,7 +1617,7 @@ int may_open(struct nameidata *nd, int a if (S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) { flag &= ~O_TRUNC; } else if (S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode)) { - if (nd->mnt->mnt_flags & MNT_NODEV) + if (IS_MNT_NODEV(nd->mnt)) return -EACCES; flag &= ~O_TRUNC; @@ -1920,7 +1920,8 @@ int vfs_mknod(struct inode *dir, struct if (error) return error; - if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) + if (!(dir->i_sb->s_flags & MS_MKNOD_NOCAP) && + (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) return -EPERM; if (!dir->i_op || !dir->i_op->mknod) Index: linux/include/linux/fs.h =================================================================== --- linux.orig/include/linux/fs.h 2007-09-24 13:52:17.000000000 +0200 +++ linux/include/linux/fs.h 2007-09-24 13:54:57.000000000 +0200 @@ -130,6 +130,8 @@ extern int dir_notify_enable; #define MS_SETUSER (1<<23) /* set mnt_uid to current user */ #define MS_NOMNT (1<<24) /* don't allow unprivileged submounts */ #define MS_KERNMOUNT (1<<25) /* this is a kern_mount call */ +#define MS_MKNOD_NOCAP (1<<26) /* no capability check in mknod, + implies "nodev" */ #define MS_ACTIVE (1<<30) #define MS_NOUSER (1<<31) @@ -190,6 +192,10 @@ extern int dir_notify_enable; #define IS_SWAPFILE(inode) ((inode)->i_flags & S_SWAPFILE) #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE) +#define IS_MNT_NODEV(mnt) (((mnt)->mnt_flags & MNT_NODEV) || \ + ((mnt)->mnt_sb->s_flags & MS_MKNOD_NOCAP)) + + /* the read-only stuff doesn't really belong here, but any other place is probably as bad and I don't want to create yet another include file. */ Index: linux/drivers/mtd/mtdsuper.c =================================================================== --- linux.orig/drivers/mtd/mtdsuper.c 2007-09-24 13:52:17.000000000 +0200 +++ linux/drivers/mtd/mtdsuper.c 2007-09-24 13:54:57.000000000 +0200 @@ -194,7 +194,7 @@ int get_sb_mtd(struct file_system_type * if (!S_ISBLK(nd.dentry->d_inode->i_mode)) goto out; - if (nd.mnt->mnt_flags & MNT_NODEV) { + if (IS_MNT_NODEV(nd.mnt)) { ret = -EACCES; goto out; } Index: linux/fs/block_dev.c =================================================================== --- linux.orig/fs/block_dev.c 2007-09-24 13:52:17.000000000 +0200 +++ linux/fs/block_dev.c 2007-09-24 13:54:57.000000000 +0200 @@ -1408,7 +1408,7 @@ struct block_device *lookup_bdev(const c if (!S_ISBLK(inode->i_mode)) goto fail; error = -EACCES; - if (nd.mnt->mnt_flags & MNT_NODEV) + if (IS_MNT_NODEV(nd.mnt)) goto fail; error = -ENOMEM; bdev = bd_acquire(inode); Index: linux/fs/namespace.c =================================================================== --- linux.orig/fs/namespace.c 2007-09-24 13:52:17.000000000 +0200 +++ linux/fs/namespace.c 2007-09-24 13:54:57.000000000 +0200 @@ -431,7 +431,6 @@ static int show_vfsmnt(struct seq_file * }; static struct proc_fs_info mnt_info[] = { { MNT_NOSUID, ",nosuid" }, - { MNT_NODEV, ",nodev" }, { MNT_NOEXEC, ",noexec" }, { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, @@ -459,6 +458,8 @@ static int show_vfsmnt(struct seq_file * if (mnt->mnt_flags & fs_infop->flag) seq_puts(m, fs_infop->str); } + if (IS_MNT_NODEV(mnt)) + seq_puts(m, ",nodev"); if (mnt->mnt_flags & MNT_USER) seq_printf(m, ",user=%i", mnt->mnt_uid); if (mnt->mnt_sb->s_op->show_options) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 2/2] VFS: allow filesystem to override mknod capability checks 2007-09-24 12:25 ` [patch 2/2] VFS: allow filesystem to override mknod capability checks Miklos Szeredi @ 2007-09-24 12:38 ` Christoph Hellwig 2007-09-28 2:40 ` Neil Brown 1 sibling, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2007-09-24 12:38 UTC (permalink / raw) To: Miklos Szeredi; +Cc: hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Mon, Sep 24, 2007 at 02:25:54PM +0200, Miklos Szeredi wrote: > From: Miklos Szeredi <mszeredi@suse.cz> > > Add a new super block flag, that results in the VFS not checking if > the current process has enough privileges to do an mknod(). > > If this flag is set, all mounts for this super block will have the > "nodev" flag implied. > > This is needed on filesystems, where an unprivileged user may be able > to create a device node, without causing security problems. > > One such example is "mountlo" a loopback mount utility implemented > with fuse and UML, which runs as an unprivileged userspace process. > In this case the user does in fact have the right to create device > nodes within the filesystem image, as long as the user has write > access to the image. Since the filesystem is mounted with "nodev", > adding device nodes is not a security concern. This one looks okay, but I'd prefer to not put it in until we actually have proper non-privilegued mounts. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 2/2] VFS: allow filesystem to override mknod capability checks 2007-09-24 12:25 ` [patch 2/2] VFS: allow filesystem to override mknod capability checks Miklos Szeredi 2007-09-24 12:38 ` Christoph Hellwig @ 2007-09-28 2:40 ` Neil Brown 2007-10-01 18:06 ` Miklos Szeredi 1 sibling, 1 reply; 17+ messages in thread From: Neil Brown @ 2007-09-28 2:40 UTC (permalink / raw) To: Miklos Szeredi; +Cc: hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Monday September 24, miklos@szeredi.hu wrote: > From: Miklos Szeredi <mszeredi@suse.cz> > > Add a new super block flag, that results in the VFS not checking if > the current process has enough privileges to do an mknod(). > > If this flag is set, all mounts for this super block will have the > "nodev" flag implied. > > This is needed on filesystems, where an unprivileged user may be able > to create a device node, without causing security problems. > > One such example is "mountlo" a loopback mount utility implemented > with fuse and UML, which runs as an unprivileged userspace process. > In this case the user does in fact have the right to create device > nodes within the filesystem image, as long as the user has write > access to the image. Since the filesystem is mounted with "nodev", > adding device nodes is not a security concern. I must admit that I don't feel very comfortable about this. I wonder how many more flags we might be tempted to add to allow user-controlled filesystems to do interesting things. Somehow I doubt this will be the last, so we should be very careful allowing it to be the first (or is it the second already?) A more concrete comment on the patch: Is it really necessary to introduce IS_MNT_NODEV?? Why not simply test both the flags (MS_MKNOD_NOCAP and MNT_NODEV) before allowing the mknod? That would localise the change to where is it really relevant. Do we actually need a new flag? Would not a combination of MS_NODEV and MS_SETUSER achieve the same thing (near enough)? Do you imagine this flag being set as a mount option (-o unprivmknod) or does the filesystem set it itself? If the latter, maybe this test should be moved down into the filesystems ->mknod operation. Most filesystems get if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) return -EPERM; at the top of ->mknod. fuse can do whatever it likes without bothering common code. According to fs.h, we only support 32 fs-independent mount-flags, and over half are in use. I'm not convinced we should spend one on such a narrow requirement. NeilBrown > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > --- > > Index: linux/fs/namei.c > =================================================================== > --- linux.orig/fs/namei.c 2007-09-24 13:52:17.000000000 +0200 > +++ linux/fs/namei.c 2007-09-24 13:54:57.000000000 +0200 > @@ -1617,7 +1617,7 @@ int may_open(struct nameidata *nd, int a > if (S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) { > flag &= ~O_TRUNC; > } else if (S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode)) { > - if (nd->mnt->mnt_flags & MNT_NODEV) > + if (IS_MNT_NODEV(nd->mnt)) > return -EACCES; > > flag &= ~O_TRUNC; > @@ -1920,7 +1920,8 @@ int vfs_mknod(struct inode *dir, struct > if (error) > return error; > > - if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) > + if (!(dir->i_sb->s_flags & MS_MKNOD_NOCAP) && > + (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) > return -EPERM; > > if (!dir->i_op || !dir->i_op->mknod) > Index: linux/include/linux/fs.h > =================================================================== > --- linux.orig/include/linux/fs.h 2007-09-24 13:52:17.000000000 +0200 > +++ linux/include/linux/fs.h 2007-09-24 13:54:57.000000000 +0200 > @@ -130,6 +130,8 @@ extern int dir_notify_enable; > #define MS_SETUSER (1<<23) /* set mnt_uid to current user */ > #define MS_NOMNT (1<<24) /* don't allow unprivileged submounts */ > #define MS_KERNMOUNT (1<<25) /* this is a kern_mount call */ > +#define MS_MKNOD_NOCAP (1<<26) /* no capability check in mknod, > + implies "nodev" */ > #define MS_ACTIVE (1<<30) > #define MS_NOUSER (1<<31) > > @@ -190,6 +192,10 @@ extern int dir_notify_enable; > #define IS_SWAPFILE(inode) ((inode)->i_flags & S_SWAPFILE) > #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE) > > +#define IS_MNT_NODEV(mnt) (((mnt)->mnt_flags & MNT_NODEV) || \ > + ((mnt)->mnt_sb->s_flags & MS_MKNOD_NOCAP)) > + > + > /* the read-only stuff doesn't really belong here, but any other place is > probably as bad and I don't want to create yet another include file. */ > > Index: linux/drivers/mtd/mtdsuper.c > =================================================================== > --- linux.orig/drivers/mtd/mtdsuper.c 2007-09-24 13:52:17.000000000 +0200 > +++ linux/drivers/mtd/mtdsuper.c 2007-09-24 13:54:57.000000000 +0200 > @@ -194,7 +194,7 @@ int get_sb_mtd(struct file_system_type * > if (!S_ISBLK(nd.dentry->d_inode->i_mode)) > goto out; > > - if (nd.mnt->mnt_flags & MNT_NODEV) { > + if (IS_MNT_NODEV(nd.mnt)) { > ret = -EACCES; > goto out; > } > Index: linux/fs/block_dev.c > =================================================================== > --- linux.orig/fs/block_dev.c 2007-09-24 13:52:17.000000000 +0200 > +++ linux/fs/block_dev.c 2007-09-24 13:54:57.000000000 +0200 > @@ -1408,7 +1408,7 @@ struct block_device *lookup_bdev(const c > if (!S_ISBLK(inode->i_mode)) > goto fail; > error = -EACCES; > - if (nd.mnt->mnt_flags & MNT_NODEV) > + if (IS_MNT_NODEV(nd.mnt)) > goto fail; > error = -ENOMEM; > bdev = bd_acquire(inode); > Index: linux/fs/namespace.c > =================================================================== > --- linux.orig/fs/namespace.c 2007-09-24 13:52:17.000000000 +0200 > +++ linux/fs/namespace.c 2007-09-24 13:54:57.000000000 +0200 > @@ -431,7 +431,6 @@ static int show_vfsmnt(struct seq_file * > }; > static struct proc_fs_info mnt_info[] = { > { MNT_NOSUID, ",nosuid" }, > - { MNT_NODEV, ",nodev" }, > { MNT_NOEXEC, ",noexec" }, > { MNT_NOATIME, ",noatime" }, > { MNT_NODIRATIME, ",nodiratime" }, > @@ -459,6 +458,8 @@ static int show_vfsmnt(struct seq_file * > if (mnt->mnt_flags & fs_infop->flag) > seq_puts(m, fs_infop->str); > } > + if (IS_MNT_NODEV(mnt)) > + seq_puts(m, ",nodev"); > if (mnt->mnt_flags & MNT_USER) > seq_printf(m, ",user=%i", mnt->mnt_uid); > if (mnt->mnt_sb->s_op->show_options) > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 2/2] VFS: allow filesystem to override mknod capability checks 2007-09-28 2:40 ` Neil Brown @ 2007-10-01 18:06 ` Miklos Szeredi 0 siblings, 0 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-10-01 18:06 UTC (permalink / raw) To: neilb; +Cc: hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel > On Monday September 24, miklos@szeredi.hu wrote: > > From: Miklos Szeredi <mszeredi@suse.cz> > > > > Add a new super block flag, that results in the VFS not checking if > > the current process has enough privileges to do an mknod(). > > > > If this flag is set, all mounts for this super block will have the > > "nodev" flag implied. > > > > This is needed on filesystems, where an unprivileged user may be able > > to create a device node, without causing security problems. > > > > One such example is "mountlo" a loopback mount utility implemented > > with fuse and UML, which runs as an unprivileged userspace process. > > In this case the user does in fact have the right to create device > > nodes within the filesystem image, as long as the user has write > > access to the image. Since the filesystem is mounted with "nodev", > > adding device nodes is not a security concern. > > I must admit that I don't feel very comfortable about this. I wonder > how many more flags we might be tempted to add to allow > user-controlled filesystems to do interesting things. Somehow I doubt > this will be the last, so we should be very careful allowing it to be > the first (or is it the second already?) Or third ;) Yes, I've always argued, that permission checking needs to be pushed to the filesystem, since the VFS can't always do a good enough job. My usual example is sshfs, where it is just impossible to know the uid/gid mapping between the server and the client. So any permission checking based on uid or gid on the client simply can't work, the only sane thing to do is to delegate the permission checking to the server. Which works beautifully, since the server is an unprivileged process, and everything automatically works out. All the fuse kernel module has to do is to basically define ->permission() to always return success, and let the userspace filesystem do the permission checking. This works fine, except a couple of things, like checking the sticky bit on a directory, and mknod(). > A more concrete comment on the patch: Is it really necessary to > introduce IS_MNT_NODEV?? Why not simply test both the flags > (MS_MKNOD_NOCAP and MNT_NODEV) before allowing the mknod? Because we need MNT_NODEV on _all_ mounts belonging to a superblock, not just the one on which mknod was performed on, which would get really ugly. This way it's simple: if the MS_MKNOD_NOCAP is specified for the super block, that implies, that devices can't be opened. > Do we actually need a new flag? Would not a combination of MS_NODEV > and MS_SETUSER achieve the same thing (near enough)? See above. > Do you imagine this flag being set as a mount option (-o unprivmknod) > or does the filesystem set it itself? I imagine this flag to usually be set by the filesystem itself. But it could be a separate mount option. I guess it could make sense in some non-fuse cases as well. > If the latter, maybe this test should be moved down into the > filesystems ->mknod operation. Most filesystems get > > if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) > return -EPERM; > > at the top of ->mknod. fuse can do whatever it likes without > bothering common code. Yes, that's one of the options, but it would be a huge change, with nasty security implications for past and future filesystems. > According to fs.h, we only support 32 fs-independent mount-flags, and > over half are in use. I'm not convinced we should spend one on such a > narrow requirement. I think there's consensus on the need for a new mount API. Not just because the 32 bits for the flags will be exhausted sooner or later anyway, but the current API is limited in lots of other ways. Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 12:24 [patch 1/2] VFS: new fgetattr() file operation Miklos Szeredi 2007-09-24 12:25 ` [patch 2/2] VFS: allow filesystem to override mknod capability checks Miklos Szeredi @ 2007-09-24 12:36 ` Christoph Hellwig 2007-09-24 12:48 ` Miklos Szeredi 1 sibling, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2007-09-24 12:36 UTC (permalink / raw) To: Miklos Szeredi; +Cc: hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Mon, Sep 24, 2007 at 02:24:54PM +0200, Miklos Szeredi wrote: > Thanks to everyone for the feedback. Here's two of the VFS patches > reworked according to comments. I also plan to rework the setattr() > patch accordingly and perhaps the xattr patch, altough that is the > lowest priority. > > Christoph, are these OK with you in this form? Not at all. Attribute operations like this have no business at all looking at the struct file. Please fix your dreaded filesystem to implement proper unix semantics intead, and if that means adding silly rename support so be it. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 12:36 ` [patch 1/2] VFS: new fgetattr() file operation Christoph Hellwig @ 2007-09-24 12:48 ` Miklos Szeredi 2007-09-24 12:59 ` Matthew Wilcox 2007-09-24 13:42 ` Alan Cox 0 siblings, 2 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 12:48 UTC (permalink / raw) To: hch; +Cc: miklos, hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel > On Mon, Sep 24, 2007 at 02:24:54PM +0200, Miklos Szeredi wrote: > > Thanks to everyone for the feedback. Here's two of the VFS patches > > reworked according to comments. I also plan to rework the setattr() > > patch accordingly and perhaps the xattr patch, altough that is the > > lowest priority. > > > > Christoph, are these OK with you in this form? > > Not at all. Attribute operations like this have no business at all > looking at the struct file. Please fix your dreaded filesystem to > implement proper unix semantics intead, It's a fixed protocol, with servers installed on millions of servers. The protocol is called SFTP and the server is part of the OpenSSH package. There's nothing I can change there. And even if I could change the protocol, it's impossible to implement full UNIX semantics with a userspace server. Please think a bit about it. > and if that means adding silly rename support so be it. That's what is done currently. But it's has various dawbacks, like rmdir doesn't work if there are open files within an otherwise empty directory. I'd happily accept suggestions on how to deal with this differenty. Thanks, Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 12:48 ` Miklos Szeredi @ 2007-09-24 12:59 ` Matthew Wilcox 2007-09-24 13:06 ` Miklos Szeredi 2007-09-24 13:42 ` Alan Cox 1 sibling, 1 reply; 17+ messages in thread From: Matthew Wilcox @ 2007-09-24 12:59 UTC (permalink / raw) To: Miklos Szeredi; +Cc: hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Mon, Sep 24, 2007 at 02:48:08PM +0200, Miklos Szeredi wrote: > > and if that means adding silly rename support so be it. > > That's what is done currently. > > But it's has various dawbacks, like rmdir doesn't work if there are > open files within an otherwise empty directory. > > I'd happily accept suggestions on how to deal with this differenty. Only sillyrename files with nlink > 1? I don't see how attributes can change anything for a deleted file. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 12:59 ` Matthew Wilcox @ 2007-09-24 13:06 ` Miklos Szeredi 2007-09-24 13:10 ` Christoph Hellwig 2007-09-24 13:16 ` Matthew Wilcox 0 siblings, 2 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 13:06 UTC (permalink / raw) To: matthew; +Cc: miklos, hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel > > > and if that means adding silly rename support so be it. > > > > That's what is done currently. > > > > But it's has various dawbacks, like rmdir doesn't work if there are > > open files within an otherwise empty directory. > > > > I'd happily accept suggestions on how to deal with this differenty. > > Only sillyrename files with nlink > 1? I don't see how attributes can > change anything for a deleted file. I don't quite understand your suggestion. A file isn't deleted while there are still links or open files refering to it. So getting the attributes for a file with nlink==0 is perfectly valid while the file is still open. If a network filesystem protocol can't handle operations (be it data or metadata) on an unlinked file, we must do sillirenaming, so that the file is not actually unlinked. Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:06 ` Miklos Szeredi @ 2007-09-24 13:10 ` Christoph Hellwig 2007-09-24 13:18 ` Miklos Szeredi 2007-09-24 13:16 ` Matthew Wilcox 1 sibling, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2007-09-24 13:10 UTC (permalink / raw) To: Miklos Szeredi Cc: matthew, hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Mon, Sep 24, 2007 at 03:06:06PM +0200, Miklos Szeredi wrote: > If a network filesystem protocol can't handle operations (be it data > or metadata) on an unlinked file, we must do sillirenaming, so that > the file is not actually unlinked. Or not support such a broken protocol at all. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:10 ` Christoph Hellwig @ 2007-09-24 13:18 ` Miklos Szeredi 2007-09-24 13:27 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 13:18 UTC (permalink / raw) To: hch Cc: miklos, matthew, hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel > > If a network filesystem protocol can't handle operations (be it data > > or metadata) on an unlinked file, we must do sillirenaming, so that > > the file is not actually unlinked. > > Or not support such a broken protocol at all. Wonder what people would say if we removed support for NFSv[23]. Just because a protocol does not support "perfect" UNIX semantics, it doesn't mean it's broken. By that standard almost all network filesystem protocols are severely broken. Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:18 ` Miklos Szeredi @ 2007-09-24 13:27 ` Christoph Hellwig 2007-09-24 13:38 ` Miklos Szeredi 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2007-09-24 13:27 UTC (permalink / raw) To: Miklos Szeredi Cc: hch, matthew, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Mon, Sep 24, 2007 at 03:18:10PM +0200, Miklos Szeredi wrote: > > Or not support such a broken protocol at all. > > Wonder what people would say if we removed support for NFSv[23]. > > Just because a protocol does not support "perfect" UNIX semantics, it > doesn't mean it's broken. By that standard almost all network > filesystem protocols are severely broken. Well, they are broken by these and other standards. At least nfs and cifs maintainers do the workarounds for this brokeness where they belong. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:27 ` Christoph Hellwig @ 2007-09-24 13:38 ` Miklos Szeredi 0 siblings, 0 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 13:38 UTC (permalink / raw) To: hch Cc: miklos, hch, matthew, trond.myklebust, adilger, linux-kernel, linux-fsdevel > On Mon, Sep 24, 2007 at 03:18:10PM +0200, Miklos Szeredi wrote: > > > Or not support such a broken protocol at all. > > > > Wonder what people would say if we removed support for NFSv[23]. > > > > Just because a protocol does not support "perfect" UNIX semantics, it > > doesn't mean it's broken. By that standard almost all network > > filesystem protocols are severely broken. > > Well, they are broken by these and other standards. At least nfs and > cifs maintainers do the workarounds for this brokeness where they belong. And my patch is not working around a problem, rather solving a problem in a correct way. Let me summarise it: There are valid reasons we have fstat() in addition to stat/lstat. For example we want to protect agains races involving unlink/rename on an open file. Say I want to implement a network filesystem with a pure unprivileged userspace sever (this is basically what sshfs does). I want my filesystem client implementation to keep all these advantages of fstat(). So how can I do that? There's a simple way: implement this operation with fstat() on the server. I get all the advantages on the remote system automatically. But for that the filesystem needs to have the open file that the fstat() on the client was performed on. It's that simple. There's really no ugly hacks going on behind the scenes. It's just that we do want to delegate some properties of this operation onto the server, and the simplest and best implementation is to just let the filesystem have the information it needs. Why is that such a big problem? Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:06 ` Miklos Szeredi 2007-09-24 13:10 ` Christoph Hellwig @ 2007-09-24 13:16 ` Matthew Wilcox 2007-09-24 13:21 ` Miklos Szeredi 1 sibling, 1 reply; 17+ messages in thread From: Matthew Wilcox @ 2007-09-24 13:16 UTC (permalink / raw) To: Miklos Szeredi; +Cc: hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel On Mon, Sep 24, 2007 at 03:06:06PM +0200, Miklos Szeredi wrote: > A file isn't deleted while there are still links or open files > refering to it. So getting the attributes for a file with nlink==0 is > perfectly valid while the file is still open. Is it? Why not just pretend that the attributes are wiped when the file is deleted. Effectively, they are, since they can't affect anything. > If a network filesystem protocol can't handle operations (be it data > or metadata) on an unlinked file, we must do sillirenaming, so that > the file is not actually unlinked. Or you could call getattr right before you unlink and cache the result in the client. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:16 ` Matthew Wilcox @ 2007-09-24 13:21 ` Miklos Szeredi 0 siblings, 0 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 13:21 UTC (permalink / raw) To: matthew; +Cc: miklos, hch, trond.myklebust, adilger, linux-kernel, linux-fsdevel > On Mon, Sep 24, 2007 at 03:06:06PM +0200, Miklos Szeredi wrote: > > A file isn't deleted while there are still links or open files > > refering to it. So getting the attributes for a file with nlink==0 is > > perfectly valid while the file is still open. > > Is it? Why not just pretend that the attributes are wiped when the file > is deleted. You mean "when finally unlinked"? Delete happens when the file is closed. > Effectively, they are, since they can't affect anything. Sure it can. It may be open on the server as well. > > If a network filesystem protocol can't handle operations (be it data > > or metadata) on an unlinked file, we must do sillirenaming, so that > > the file is not actually unlinked. > > Or you could call getattr right before you unlink and cache the result > in the client. The file can still be modified after being unlinked. And even if we did this caching thing and modify the attributes when the file is modified, it would not deal with access on the remote end, and would be much more complex than the other alternatives. Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 12:48 ` Miklos Szeredi 2007-09-24 12:59 ` Matthew Wilcox @ 2007-09-24 13:42 ` Alan Cox 2007-09-24 13:40 ` Miklos Szeredi 1 sibling, 1 reply; 17+ messages in thread From: Alan Cox @ 2007-09-24 13:42 UTC (permalink / raw) To: Miklos Szeredi Cc: hch, miklos, trond.myklebust, adilger, linux-kernel, linux-fsdevel > But it's has various dawbacks, like rmdir doesn't work if there are > open files within an otherwise empty directory. > > I'd happily accept suggestions on how to deal with this differenty. NFS has that problem because it really has to sillyrename into the same directory. I don't see that ssh/sftp needs to do that. Instead it can sillyrename anywhere in the filesystem. Alan ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 1/2] VFS: new fgetattr() file operation 2007-09-24 13:42 ` Alan Cox @ 2007-09-24 13:40 ` Miklos Szeredi 0 siblings, 0 replies; 17+ messages in thread From: Miklos Szeredi @ 2007-09-24 13:40 UTC (permalink / raw) To: alan Cc: miklos, hch, miklos, trond.myklebust, adilger, linux-kernel, linux-fsdevel > > But it's has various dawbacks, like rmdir doesn't work if there are > > open files within an otherwise empty directory. > > > > I'd happily accept suggestions on how to deal with this differenty. > > NFS has that problem because it really has to sillyrename into the same > directory. I don't see that ssh/sftp needs to do that. Instead it can > sillyrename anywhere in the filesystem. I don't think it can. How can we find in a reliable way another directory, which is writable by the user? Miklos ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2007-10-01 18:07 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-09-24 12:24 [patch 1/2] VFS: new fgetattr() file operation Miklos Szeredi 2007-09-24 12:25 ` [patch 2/2] VFS: allow filesystem to override mknod capability checks Miklos Szeredi 2007-09-24 12:38 ` Christoph Hellwig 2007-09-28 2:40 ` Neil Brown 2007-10-01 18:06 ` Miklos Szeredi 2007-09-24 12:36 ` [patch 1/2] VFS: new fgetattr() file operation Christoph Hellwig 2007-09-24 12:48 ` Miklos Szeredi 2007-09-24 12:59 ` Matthew Wilcox 2007-09-24 13:06 ` Miklos Szeredi 2007-09-24 13:10 ` Christoph Hellwig 2007-09-24 13:18 ` Miklos Szeredi 2007-09-24 13:27 ` Christoph Hellwig 2007-09-24 13:38 ` Miklos Szeredi 2007-09-24 13:16 ` Matthew Wilcox 2007-09-24 13:21 ` Miklos Szeredi 2007-09-24 13:42 ` Alan Cox 2007-09-24 13:40 ` Miklos Szeredi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).