[patch 00/13] vfs: add helpers to check r/o bind mounts

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch 00/13] vfs: add helpers to check r/o bind mounts
@ 2008-04-24 11:39 Miklos Szeredi
  2008-04-24 11:39 ` [patch 01/13] ecryptfs: add missing lock around notify_change Miklos Szeredi
                   ` (15 more replies)
  0 siblings, 16 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

Patches 1-3 are cleanups/bugfixes preceding the main series.  The
description is in 4/13.  This is against latest git.

<rant>

R/O bind mounts patches have been reviewed numerous times.  Still a
relatively large number of places remained where checking for R/O
mounts was missed.

Then I did this series, which basically guarantees, that that cannot
happen.  Al rejected it, and rather fixed some of the remaining
places.  He still missed several, which sort of proves my point.

I think it's totally pointless to continue trying to fix the symptoms
instead of getting at the root of the problem.

I know that VFS interfaces are a sensitive question, but it would be
nice it we could have some sanity back in this discussion.

</rant>

Thanks,
Miklos

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 01/13] ecryptfs: add missing lock around notify_change
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 16:56   ` Erez Zadok
  2008-04-24 11:39 ` [patch 02/13] ecryptfs: clean up (un)lock_parent Miklos Szeredi
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: ecryptfs_locking_fix.patch --]
[-- Type: text/plain, Size: 834 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Caller of notify_change() need to hold i_mutex.

Compile tested only.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c |    2 ++
 1 file changed, 2 insertions(+)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-02 18:47:17.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-02 18:49:18.000000000 +0200
@@ -908,7 +908,9 @@ static int ecryptfs_setattr(struct dentr
 	if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
 		ia->ia_valid &= ~ATTR_MODE;
 
+	mutex_lock(&lower_dentry->d_inode->i_mutex);
 	rc = notify_change(lower_dentry, ia);
+	mutex_unlock(&lower_dentry->d_inode->i_mutex);
 out:
 	fsstack_copy_attr_all(inode, lower_inode, NULL);
 	return rc;

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 02/13] ecryptfs: clean up (un)lock_parent
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
  2008-04-24 11:39 ` [patch 01/13] ecryptfs: add missing lock around notify_change Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 03/13] nfsd: clean up mnt_want_write calls Miklos Szeredi
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: ecryptfs_cleanup.patch --]
[-- Type: text/plain, Size: 1747 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

dget(dentry->d_parent) --> dget_parent(dentry)

unlock_parent() is racy and unnecessary.  Replace single caller with
unlock_dir().

There are several other suspect uses of ->d_parent in ecryptfs...

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 18:50:29.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 19:00:19.000000000 +0200
@@ -37,17 +37,11 @@ static struct dentry *lock_parent(struct
 {
 	struct dentry *dir;
 
-	dir = dget(dentry->d_parent);
+	dir = dget_parent(dentry);
 	mutex_lock_nested(&(dir->d_inode->i_mutex), I_MUTEX_PARENT);
 	return dir;
 }
 
-static void unlock_parent(struct dentry *dentry)
-{
-	mutex_unlock(&(dentry->d_parent->d_inode->i_mutex));
-	dput(dentry->d_parent);
-}
-
 static void unlock_dir(struct dentry *dir)
 {
 	mutex_unlock(&dir->d_inode->i_mutex);
@@ -426,8 +420,9 @@ static int ecryptfs_unlink(struct inode 
 	int rc = 0;
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	struct inode *lower_dir_inode = ecryptfs_inode_to_lower(dir);
+	struct dentry *lower_dir_dentry;
 
-	lock_parent(lower_dentry);
+	lower_dir_dentry = lock_parent(lower_dentry);
 	rc = vfs_unlink(lower_dir_inode, lower_dentry);
 	if (rc) {
 		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
@@ -439,7 +434,7 @@ static int ecryptfs_unlink(struct inode 
 	dentry->d_inode->i_ctime = dir->i_ctime;
 	d_drop(dentry);
 out_unlock:
-	unlock_parent(lower_dentry);
+	unlock_dir(lower_dir_dentry);
 	return rc;
 }
 

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 03/13] nfsd: clean up mnt_want_write calls
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
  2008-04-24 11:39 ` [patch 01/13] ecryptfs: add missing lock around notify_change Miklos Szeredi
  2008-04-24 11:39 ` [patch 02/13] ecryptfs: clean up (un)lock_parent Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 04/13] vfs: add path_create() and path_mknod() Miklos Szeredi
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: nfsd_create_cleanup.patch --]
[-- Type: text/plain, Size: 1952 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Multiple mnt_want_write() calls in the switch statement looks really
ugly.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/nfsd/vfs.c |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 18:14:14.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 18:50:29.000000000 +0200
@@ -1249,36 +1249,33 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		iap->ia_mode = 0;
 	iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
 
+	err = nfserr_inval;
+	if (!S_ISREG(type) && !S_ISDIR(type) && !special_file(type)) {
+		printk("nfsd: bad file type %o in nfsd_create\n", type);
+		goto out;
+	}
+
+	host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
+	if (host_err)
+		goto out_nfserr;
+
 	/*
 	 * Get the dir op function pointer.
 	 */
 	err = 0;
 	switch (type) {
 	case S_IFREG:
-		host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
-		if (host_err)
-			goto out_nfserr;
 		host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
 		break;
 	case S_IFDIR:
-		host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
-		if (host_err)
-			goto out_nfserr;
 		host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
 		break;
 	case S_IFCHR:
 	case S_IFBLK:
 	case S_IFIFO:
 	case S_IFSOCK:
-		host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
-		if (host_err)
-			goto out_nfserr;
 		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
 		break;
-	default:
-	        printk("nfsd: bad file type %o in nfsd_create\n", type);
-		host_err = -EINVAL;
-		goto out_nfserr;
 	}
 	if (host_err < 0) {
 		mnt_drop_write(fhp->fh_export->ex_path.mnt);
@@ -1290,7 +1287,6 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		write_inode_now(dchild->d_inode, 1);
 	}
 
-
 	err2 = nfsd_create_setattr(rqstp, resfhp, iap);
 	if (err2)
 		err = err2;

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 04/13] vfs: add path_create() and path_mknod()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (2 preceding siblings ...)
  2008-04-24 11:39 ` [patch 03/13] nfsd: clean up mnt_want_write calls Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 05/13] vfs: add path_mkdir() Miklos Szeredi
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_create.patch --]
[-- Type: text/plain, Size: 12913 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

R/O bind mounts require operations which modify the filesystem to be
wrapped in mnt_want_write()/mnt_drop_write().  Create helpers which do
this, so callers won't need to bother, and more importantly, cannot
forget!  Call these path_*, analogous to vfs_*.  Where there are no
callers of vfs_* left, make them static.

Overall this patchset is just 15 lines in the red, but at the same
time it fixes several places in nfsd and the whole of ecryptfs, where
the mnt_want_write/drop_write() calls were mssing.  So it's definitely
a cleanup.

It will also help with merging certain security modules, which need to
know the path within the namespace, and not just within the
filesystem.  These helpers will allow the security hooks to be in a
common place, and need not be repeated in all callers.

As I understood Al Viro's main objection is that this is not a good
interface, because callers should not be forced to have a vfsmount
available.  This is bogus for two reasons:

 1) The dentry_open() interface already requires a vfsmount, so any
 callers operating on the filesystem and wanting to do file I/O would
 necessarily have the vfsmount available.  Theoretically it would be
 possible that the caller doesn't want to do file I/O only
 create/remove/*attr, but this is unlikely.

 2) Many benefits of the R/O bind mounts would disappear if we'd allow
 callers to bypass the per-mount R/O checks.  Which means, that
 callers do have to have the vfsmount available to do these checks
 anyway.

Note, that callers might want to do mnt_want_write/drop_write() around
multiple operation to make them atomic wrt. remounting read-only.  In
that case checking within the VFS is superfluous, but it should not
have any noticable performance impact.


This patch:

Introduce path_create() and path_mknod().  Make vfs_create() and
vfs_mknod() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c |   33 ++++++++++------------
 fs/namei.c          |   75 +++++++++++++++++++++++++++-------------------------
 fs/nfsd/vfs.c       |   19 +++++++++----
 include/linux/fs.h  |    4 +-
 ipc/mqueue.c        |    6 +++-
 net/unix/af_unix.c  |    6 ----
 6 files changed, 76 insertions(+), 67 deletions(-)

Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 21:30:25.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-24 12:00:28.000000000 +0200
@@ -1574,7 +1574,7 @@ void unlock_rename(struct dentry *p1, st
 	}
 }
 
-int vfs_create(struct inode *dir, struct dentry *dentry, int mode,
+static int vfs_create(struct inode *dir, struct dentry *dentry, int mode,
 		struct nameidata *nd)
 {
 	int error = may_create(dir, dentry, nd);
@@ -1596,6 +1596,20 @@ int vfs_create(struct inode *dir, struct
 	return error;
 }
 
+int path_create(struct path *dir_path, struct dentry *dentry, int mode,
+		struct nameidata *nd)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_create(dir_path->dentry->d_inode, dentry, mode, nd);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_create);
+
 int may_open(struct nameidata *nd, int acc_mode, int flag)
 {
 	struct dentry *dentry = nd->path.dentry;
@@ -2015,7 +2029,7 @@ fail:
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
-int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+static int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
 	int error = may_create(dir, dentry, NULL);
 
@@ -2039,22 +2053,19 @@ int vfs_mknod(struct inode *dir, struct 
 	return error;
 }
 
-static int may_mknod(mode_t mode)
+int path_mknod(struct path *dir_path, struct dentry *dentry, int mode,
+	       dev_t dev)
 {
-	switch (mode & S_IFMT) {
-	case S_IFREG:
-	case S_IFCHR:
-	case S_IFBLK:
-	case S_IFIFO:
-	case S_IFSOCK:
-	case 0: /* zero mode translates to S_IFREG */
-		return 0;
-	case S_IFDIR:
-		return -EPERM;
-	default:
-		return -EINVAL;
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_mknod(dir_path->dentry->d_inode, dentry, mode, dev);
+		mnt_drop_write(dir_path->mnt);
 	}
+
+	return error;
 }
+EXPORT_SYMBOL(path_mknod);
 
 asmlinkage long sys_mknodat(int dfd, const char __user *filename, int mode,
 				unsigned dev)
@@ -2080,26 +2091,22 @@ asmlinkage long sys_mknodat(int dfd, con
 	}
 	if (!IS_POSIXACL(nd.path.dentry->d_inode))
 		mode &= ~current->fs->umask;
-	error = may_mknod(mode);
-	if (error)
-		goto out_dput;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_dput;
 	switch (mode & S_IFMT) {
-		case 0: case S_IFREG:
-			error = vfs_create(nd.path.dentry->d_inode,dentry,mode,&nd);
-			break;
-		case S_IFCHR: case S_IFBLK:
-			error = vfs_mknod(nd.path.dentry->d_inode,dentry,mode,
-					new_decode_dev(dev));
-			break;
-		case S_IFIFO: case S_IFSOCK:
-			error = vfs_mknod(nd.path.dentry->d_inode,dentry,mode,0);
-			break;
+	case 0: case S_IFREG:
+		error = path_create(&nd.path, dentry, mode, &nd);
+		break;
+	case S_IFCHR: case S_IFBLK:
+		error = path_mknod(&nd.path, dentry, mode, new_decode_dev(dev));
+		break;
+	case S_IFIFO: case S_IFSOCK:
+		error = path_mknod(&nd.path, dentry, mode, 0);
+		break;
+	case S_IFDIR:
+		error = -EPERM;
+		break;
+	default:
+		error = -EINVAL;
 	}
-	mnt_drop_write(nd.path.mnt);
-out_dput:
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
@@ -2966,11 +2973,9 @@ EXPORT_SYMBOL(permission);
 EXPORT_SYMBOL(vfs_permission);
 EXPORT_SYMBOL(file_permission);
 EXPORT_SYMBOL(unlock_rename);
-EXPORT_SYMBOL(vfs_create);
 EXPORT_SYMBOL(vfs_follow_link);
 EXPORT_SYMBOL(vfs_link);
 EXPORT_SYMBOL(vfs_mkdir);
-EXPORT_SYMBOL(vfs_mknod);
 EXPORT_SYMBOL(generic_permission);
 EXPORT_SYMBOL(vfs_readlink);
 EXPORT_SYMBOL(vfs_rename);
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 21:30:25.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-24 12:00:28.000000000 +0200
@@ -1123,9 +1123,9 @@ extern void unlock_super(struct super_bl
  * VFS helper functions..
  */
 extern int vfs_permission(struct nameidata *, int);
-extern int vfs_create(struct inode *, struct dentry *, int, struct nameidata *);
+extern int path_create(struct path *, struct dentry *, int, struct nameidata *);
 extern int vfs_mkdir(struct inode *, struct dentry *, int);
-extern int vfs_mknod(struct inode *, struct dentry *, int, dev_t);
+extern int path_mknod(struct path *, struct dentry *, int, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *, int);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 21:30:28.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-24 12:00:28.000000000 +0200
@@ -61,23 +61,19 @@ static void unlock_dir(struct dentry *di
  * Returns zero on success; non-zero on error condition
  */
 static int
-ecryptfs_create_underlying_file(struct inode *lower_dir_inode,
+ecryptfs_create_underlying_file(struct dentry *lower_dir_dentry,
 				struct dentry *dentry, int mode,
 				struct nameidata *nd)
 {
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
-	struct vfsmount *lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
-	struct dentry *dentry_save;
-	struct vfsmount *vfsmount_save;
+	struct path save;
 	int rc;
 
-	dentry_save = nd->path.dentry;
-	vfsmount_save = nd->path.mnt;
-	nd->path.dentry = lower_dentry;
-	nd->path.mnt = lower_mnt;
-	rc = vfs_create(lower_dir_inode, lower_dentry, mode, nd);
-	nd->path.dentry = dentry_save;
-	nd->path.mnt = vfsmount_save;
+	save = nd->path;
+	nd->path.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	nd->path.dentry = lower_dir_dentry;
+	rc = path_create(&nd->path, lower_dentry, mode, nd);
+	nd->path = save;
 	return rc;
 }
 
@@ -111,7 +107,7 @@ ecryptfs_do_create(struct inode *directo
 		rc = PTR_ERR(lower_dir_dentry);
 		goto out;
 	}
-	rc = ecryptfs_create_underlying_file(lower_dir_dentry->d_inode,
+	rc = ecryptfs_create_underlying_file(lower_dir_dentry,
 					     ecryptfs_dentry, mode, nd);
 	if (rc) {
 		printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
@@ -530,20 +526,21 @@ ecryptfs_mknod(struct inode *dir, struct
 {
 	int rc;
 	struct dentry *lower_dentry;
-	struct dentry *lower_dir_dentry;
+	struct path lower_dir;
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
-	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_mknod(lower_dir_dentry->d_inode, lower_dentry, mode, dev);
+	lower_dir.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	lower_dir.dentry = lock_parent(lower_dentry);
+	rc = path_mknod(&lower_dir, lower_dentry, mode, dev);
 	if (rc || !lower_dentry->d_inode)
 		goto out;
 	rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0);
 	if (rc)
 		goto out;
-	fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
-	fsstack_copy_inode_size(dir, lower_dir_dentry->d_inode);
+	fsstack_copy_attr_times(dir, lower_dir.dentry->d_inode);
+	fsstack_copy_inode_size(dir, lower_dir.dentry->d_inode);
 out:
-	unlock_dir(lower_dir_dentry);
+	unlock_dir(lower_dir.dentry);
 	if (!dentry->d_inode)
 		d_drop(dentry);
 	return rc;
Index: vfs-2.6/ipc/mqueue.c
===================================================================
--- vfs-2.6.orig/ipc/mqueue.c	2008-04-23 21:30:25.000000000 +0200
+++ vfs-2.6/ipc/mqueue.c	2008-04-24 12:00:28.000000000 +0200
@@ -599,6 +599,7 @@ static struct file *do_create(struct den
 {
 	struct mq_attr attr;
 	struct file *result;
+	struct path dir_path;
 	int ret;
 
 	if (u_attr) {
@@ -616,7 +617,10 @@ static struct file *do_create(struct den
 	ret = mnt_want_write(mqueue_mnt);
 	if (ret)
 		goto out;
-	ret = vfs_create(dir->d_inode, dentry, mode, NULL);
+
+	dir_path.mnt = mqueue_mnt;
+	dir_path.dentry = dir;
+	ret = path_create(&dir_path, dentry, mode, NULL);
 	dentry->d_fsdata = NULL;
 	if (ret)
 		goto out_drop_write;
Index: vfs-2.6/net/unix/af_unix.c
===================================================================
--- vfs-2.6.orig/net/unix/af_unix.c	2008-04-23 21:30:25.000000000 +0200
+++ vfs-2.6/net/unix/af_unix.c	2008-04-23 21:30:34.000000000 +0200
@@ -819,11 +819,7 @@ static int unix_bind(struct socket *sock
 		 */
 		mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current->fs->umask);
-		err = mnt_want_write(nd.path.mnt);
-		if (err)
-			goto out_mknod_dput;
-		err = vfs_mknod(nd.path.dentry->d_inode, dentry, mode, 0);
-		mnt_drop_write(nd.path.mnt);
+		err = path_mknod(&nd.path, dentry, mode, 0);
 		if (err)
 			goto out_mknod_dput;
 		mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 21:30:28.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-24 12:00:28.000000000 +0200
@@ -130,6 +130,12 @@ out:
 	return err;
 }
 
+static void fh_to_path(struct svc_fh *fhp, struct path *path)
+{
+	path->dentry = fhp->fh_dentry;
+	path->mnt = fhp->fh_export->ex_path.mnt;
+}
+
 __be32
 nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   const char *name, unsigned int len,
@@ -1185,6 +1191,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		char *fname, int flen, struct iattr *iap,
 		int type, dev_t rdev, struct svc_fh *resfhp)
 {
+	struct path	dir_path;
 	struct dentry	*dentry, *dchild = NULL;
 	struct inode	*dirp;
 	__be32		err;
@@ -1259,13 +1266,11 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	if (host_err)
 		goto out_nfserr;
 
-	/*
-	 * Get the dir op function pointer.
-	 */
+	fh_to_path(fhp, &dir_path);
 	err = 0;
 	switch (type) {
 	case S_IFREG:
-		host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
+		host_err = path_create(&dir_path, dchild, iap->ia_mode, NULL);
 		break;
 	case S_IFDIR:
 		host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
@@ -1274,7 +1279,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	case S_IFBLK:
 	case S_IFIFO:
 	case S_IFSOCK:
-		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
+		host_err = path_mknod(&dir_path, dchild, iap->ia_mode, rdev);
 		break;
 	}
 	if (host_err < 0) {
@@ -1316,6 +1321,7 @@ nfsd_create_v3(struct svc_rqst *rqstp, s
 		struct svc_fh *resfhp, int createmode, u32 *verifier,
 	        int *truncp, int *created)
 {
+	struct path 	dir_path;
 	struct dentry	*dentry, *dchild = NULL;
 	struct inode	*dirp;
 	__be32		err;
@@ -1406,7 +1412,8 @@ nfsd_create_v3(struct svc_rqst *rqstp, s
 		goto out;
 	}
 
-	host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
+	fh_to_path(fhp, &dir_path);
+	host_err = path_create(&dir_path, dchild, iap->ia_mode, NULL);
 	if (host_err < 0) {
 		mnt_drop_write(fhp->fh_export->ex_path.mnt);
 		goto out_nfserr;

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 05/13] vfs: add path_mkdir()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (3 preceding siblings ...)
  2008-04-24 11:39 ` [patch 04/13] vfs: add path_create() and path_mknod() Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 06/13] vfs: add path_rmdir() Miklos Szeredi
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_mkdir.patch --]
[-- Type: text/plain, Size: 5192 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_mkdir().  Unexport vfs_mkdir() to modules.

The only internal user of vfs_mkdir() remaining is cgroup_clone().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c   |   15 ++++++++-------
 fs/namei.c            |   21 ++++++++++++++-------
 fs/nfsd/nfs4recover.c |    6 +-----
 fs/nfsd/vfs.c         |    2 +-
 include/linux/fs.h    |    1 +
 5 files changed, 25 insertions(+), 20 deletions(-)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 19:10:13.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 19:49:52.000000000 +0200
@@ -478,21 +478,22 @@ static int ecryptfs_mkdir(struct inode *
 {
 	int rc;
 	struct dentry *lower_dentry;
-	struct dentry *lower_dir_dentry;
+	struct path lower_dir;
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
-	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_mkdir(lower_dir_dentry->d_inode, lower_dentry, mode);
+	lower_dir.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	lower_dir.dentry = lock_parent(lower_dentry);
+	rc = path_mkdir(&lower_dir, lower_dentry, mode);
 	if (rc || !lower_dentry->d_inode)
 		goto out;
 	rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0);
 	if (rc)
 		goto out;
-	fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
-	fsstack_copy_inode_size(dir, lower_dir_dentry->d_inode);
-	dir->i_nlink = lower_dir_dentry->d_inode->i_nlink;
+	fsstack_copy_attr_times(dir, lower_dir.dentry->d_inode);
+	fsstack_copy_inode_size(dir, lower_dir.dentry->d_inode);
+	dir->i_nlink = lower_dir.dentry->d_inode->i_nlink;
 out:
-	unlock_dir(lower_dir_dentry);
+	unlock_dir(lower_dir.dentry);
 	if (!dentry->d_inode)
 		d_drop(dentry);
 	return rc;
Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 19:10:13.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-23 19:10:17.000000000 +0200
@@ -2144,6 +2144,19 @@ int vfs_mkdir(struct inode *dir, struct 
 	return error;
 }
 
+int path_mkdir(struct path *dir_path, struct dentry *dentry, int mode)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_mkdir(dir_path->dentry->d_inode, dentry, mode);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_mkdir);
+
 asmlinkage long sys_mkdirat(int dfd, const char __user *pathname, int mode)
 {
 	int error = 0;
@@ -2166,12 +2179,7 @@ asmlinkage long sys_mkdirat(int dfd, con
 
 	if (!IS_POSIXACL(nd.path.dentry->d_inode))
 		mode &= ~current->fs->umask;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_dput;
-	error = vfs_mkdir(nd.path.dentry->d_inode, dentry, mode);
-	mnt_drop_write(nd.path.mnt);
-out_dput:
+	error = path_mkdir(&nd.path, dentry, mode);
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
@@ -2975,7 +2983,6 @@ EXPORT_SYMBOL(file_permission);
 EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_follow_link);
 EXPORT_SYMBOL(vfs_link);
-EXPORT_SYMBOL(vfs_mkdir);
 EXPORT_SYMBOL(generic_permission);
 EXPORT_SYMBOL(vfs_readlink);
 EXPORT_SYMBOL(vfs_rename);
Index: vfs-2.6/fs/nfsd/nfs4recover.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/nfs4recover.c	2008-04-23 18:58:33.000000000 +0200
+++ vfs-2.6/fs/nfsd/nfs4recover.c	2008-04-23 19:10:17.000000000 +0200
@@ -155,11 +155,7 @@ nfsd4_create_clid_dir(struct nfs4_client
 		dprintk("NFSD: nfsd4_create_clid_dir: DIRECTORY EXISTS\n");
 		goto out_put;
 	}
-	status = mnt_want_write(rec_dir.path.mnt);
-	if (status)
-		goto out_put;
-	status = vfs_mkdir(rec_dir.path.dentry->d_inode, dentry, S_IRWXU);
-	mnt_drop_write(rec_dir.path.mnt);
+	status = path_mkdir(&rec_dir.path, dentry, S_IRWXU);
 out_put:
 	dput(dentry);
 out_unlock:
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 19:10:13.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 19:10:17.000000000 +0200
@@ -1273,7 +1273,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		host_err = path_create(&dir_path, dchild, iap->ia_mode, NULL);
 		break;
 	case S_IFDIR:
-		host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
+		host_err = path_mkdir(&dir_path, dchild, iap->ia_mode);
 		break;
 	case S_IFCHR:
 	case S_IFBLK:
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 19:10:13.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 19:10:17.000000000 +0200
@@ -1125,6 +1125,7 @@ extern void unlock_super(struct super_bl
 extern int vfs_permission(struct nameidata *, int);
 extern int path_create(struct path *, struct dentry *, int, struct nameidata *);
 extern int vfs_mkdir(struct inode *, struct dentry *, int);
+extern int path_mkdir(struct path *, struct dentry *, int);
 extern int path_mknod(struct path *, struct dentry *, int, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *, int);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 06/13] vfs: add path_rmdir()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (4 preceding siblings ...)
  2008-04-24 11:39 ` [patch 05/13] vfs: add path_mkdir() Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 07/13] vfs: add path_unlink() Miklos Szeredi
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_rmdir.patch --]
[-- Type: text/plain, Size: 5245 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_rmdir().

The only user of vfs_rmdir() remaining is reiserfs_delete_xattrs().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c   |   13 +++++++------
 fs/namei.c            |   20 ++++++++++++++------
 fs/nfsd/nfs4recover.c |    3 ++-
 fs/nfsd/vfs.c         |    4 +++-
 include/linux/fs.h    |    1 +
 5 files changed, 27 insertions(+), 14 deletions(-)

Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 20:04:28.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-23 20:04:42.000000000 +0200
@@ -2255,6 +2255,19 @@ int vfs_rmdir(struct inode *dir, struct 
 	return error;
 }
 
+int path_rmdir(struct path *dir_path, struct dentry *dentry)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_rmdir(dir_path->dentry->d_inode, dentry);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_rmdir);
+
 static long do_rmdir(int dfd, const char __user *pathname)
 {
 	int error = 0;
@@ -2286,12 +2299,7 @@ static long do_rmdir(int dfd, const char
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit2;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto exit3;
-	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
-	mnt_drop_write(nd.path.mnt);
-exit3:
+	error = path_rmdir(&nd.path, dentry);
 	dput(dentry);
 exit2:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 20:04:28.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 20:04:42.000000000 +0200
@@ -502,20 +502,21 @@ out:
 static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	struct dentry *lower_dentry;
-	struct dentry *lower_dir_dentry;
+	struct path lower_dir;
 	int rc;
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	dget(dentry);
-	lower_dir_dentry = lock_parent(lower_dentry);
+	lower_dir.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	lower_dir.dentry = lock_parent(lower_dentry);
 	dget(lower_dentry);
-	rc = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
+	rc = path_rmdir(&lower_dir, lower_dentry);
 	dput(lower_dentry);
 	if (!rc)
 		d_delete(lower_dentry);
-	fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
-	dir->i_nlink = lower_dir_dentry->d_inode->i_nlink;
-	unlock_dir(lower_dir_dentry);
+	fsstack_copy_attr_times(dir, lower_dir.dentry->d_inode);
+	dir->i_nlink = lower_dir.dentry->d_inode->i_nlink;
+	unlock_dir(lower_dir.dentry);
 	if (!rc)
 		d_drop(dentry);
 	dput(dentry);
Index: vfs-2.6/fs/nfsd/nfs4recover.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/nfs4recover.c	2008-04-23 20:04:28.000000000 +0200
+++ vfs-2.6/fs/nfsd/nfs4recover.c	2008-04-23 20:04:42.000000000 +0200
@@ -267,6 +267,7 @@ nfsd4_remove_clid_file(struct dentry *di
 static int
 nfsd4_clear_clid_dir(struct dentry *dir, struct dentry *dentry)
 {
+	struct path dir_path = { .dentry = dir, .mnt = rec_dir.path.mnt };
 	int status;
 
 	/* For now this directory should already be empty, but we empty it of
@@ -274,7 +275,7 @@ nfsd4_clear_clid_dir(struct dentry *dir,
 	 * a kernel from the future.... */
 	nfsd4_list_rec_dir(dentry, nfsd4_remove_clid_file);
 	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-	status = vfs_rmdir(dir->d_inode, dentry);
+	status = path_rmdir(&dir_path, dentry);
 	mutex_unlock(&dir->d_inode->i_mutex);
 	return status;
 }
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 20:04:28.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 20:05:26.000000000 +0200
@@ -1764,6 +1764,7 @@ __be32
 nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 				char *fname, int flen)
 {
+	struct path 	dir_path;
 	struct dentry	*dentry, *rdentry;
 	struct inode	*dirp;
 	__be32		err;
@@ -1798,6 +1799,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 	if (host_err)
 		goto out_nfserr;
 
+	fh_to_path(fhp, &dir_path);
 	if (type != S_IFDIR) { /* It's UNLINK */
 #ifdef MSNFS
 		if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
@@ -1807,7 +1809,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 #endif
 		host_err = vfs_unlink(dirp, rdentry);
 	} else { /* It's RMDIR */
-		host_err = vfs_rmdir(dirp, rdentry);
+		host_err = path_rmdir(&dir_path, rdentry);
 	}
 
 	dput(rdentry);
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 20:04:28.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 20:04:42.000000000 +0200
@@ -1130,6 +1130,7 @@ extern int path_mknod(struct path *, str
 extern int vfs_symlink(struct inode *, struct dentry *, const char *, int);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
+extern int path_rmdir(struct path *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
 

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 07/13] vfs: add path_unlink()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (5 preceding siblings ...)
  2008-04-24 11:39 ` [patch 06/13] vfs: add path_rmdir() Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 08/13] vfs: add path_symlink() Miklos Szeredi
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_unlink.patch --]
[-- Type: text/plain, Size: 7126 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_unlink().  Make vfs_unlink() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c   |   12 ++++++------
 fs/namei.c            |   22 +++++++++++++++-------
 fs/nfsd/nfs4recover.c |    3 ++-
 fs/nfsd/vfs.c         |   12 ++----------
 include/linux/fs.h    |    2 +-
 ipc/mqueue.c          |   10 +++++-----
 6 files changed, 31 insertions(+), 30 deletions(-)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 20:04:42.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 21:30:17.000000000 +0200
@@ -415,22 +415,22 @@ static int ecryptfs_unlink(struct inode 
 {
 	int rc = 0;
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
-	struct inode *lower_dir_inode = ecryptfs_inode_to_lower(dir);
-	struct dentry *lower_dir_dentry;
+	struct path lower_dir;
 
-	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_unlink(lower_dir_inode, lower_dentry);
+	lower_dir.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	lower_dir.dentry = lock_parent(lower_dentry);
+	rc = path_unlink(&lower_dir, lower_dentry);
 	if (rc) {
 		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
 		goto out_unlock;
 	}
-	fsstack_copy_attr_times(dir, lower_dir_inode);
+	fsstack_copy_attr_times(dir, lower_dir.dentry->d_inode);
 	dentry->d_inode->i_nlink =
 		ecryptfs_inode_to_lower(dentry->d_inode)->i_nlink;
 	dentry->d_inode->i_ctime = dir->i_ctime;
 	d_drop(dentry);
 out_unlock:
-	unlock_dir(lower_dir_dentry);
+	unlock_dir(lower_dir.dentry);
 	return rc;
 }
 
Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 20:04:42.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-23 21:30:17.000000000 +0200
@@ -2315,7 +2315,7 @@ asmlinkage long sys_rmdir(const char __u
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
-int vfs_unlink(struct inode *dir, struct dentry *dentry)
+static int vfs_unlink(struct inode *dir, struct dentry *dentry)
 {
 	int error = may_delete(dir, dentry, 0);
 
@@ -2346,6 +2346,19 @@ int vfs_unlink(struct inode *dir, struct
 	return error;
 }
 
+int path_unlink(struct path *dir_path, struct dentry *dentry)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		error = vfs_unlink(dir_path->dentry->d_inode, dentry);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_unlink);
+
 /*
  * Make sure that the actual truncation of the file will occur outside its
  * directory's i_mutex.  Truncate can take a long time if there is a lot of
@@ -2380,11 +2393,7 @@ static long do_unlinkat(int dfd, const c
 		inode = dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
-		error = mnt_want_write(nd.path.mnt);
-		if (error)
-			goto exit2;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
-		mnt_drop_write(nd.path.mnt);
+		error = path_unlink(&nd.path, dentry);
 	exit2:
 		dput(dentry);
 	}
@@ -2996,6 +3005,5 @@ EXPORT_SYMBOL(vfs_readlink);
 EXPORT_SYMBOL(vfs_rename);
 EXPORT_SYMBOL(vfs_rmdir);
 EXPORT_SYMBOL(vfs_symlink);
-EXPORT_SYMBOL(vfs_unlink);
 EXPORT_SYMBOL(dentry_unhash);
 EXPORT_SYMBOL(generic_readlink);
Index: vfs-2.6/fs/nfsd/nfs4recover.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/nfs4recover.c	2008-04-23 20:04:42.000000000 +0200
+++ vfs-2.6/fs/nfsd/nfs4recover.c	2008-04-23 20:08:08.000000000 +0200
@@ -252,6 +252,7 @@ out:
 static int
 nfsd4_remove_clid_file(struct dentry *dir, struct dentry *dentry)
 {
+	struct path dir_path = { .dentry = dir, .mnt = rec_dir.path.mnt };
 	int status;
 
 	if (!S_ISREG(dir->d_inode->i_mode)) {
@@ -259,7 +260,7 @@ nfsd4_remove_clid_file(struct dentry *di
 		return -EINVAL;
 	}
 	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-	status = vfs_unlink(dir->d_inode, dentry);
+	status = path_unlink(&dir_path, dentry);
 	mutex_unlock(&dir->d_inode->i_mutex);
 	return status;
 }
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 20:05:26.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 21:30:17.000000000 +0200
@@ -1766,7 +1766,6 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 {
 	struct path 	dir_path;
 	struct dentry	*dentry, *rdentry;
-	struct inode	*dirp;
 	__be32		err;
 	int		host_err;
 
@@ -1779,7 +1778,6 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 
 	fh_lock_nested(fhp, I_MUTEX_PARENT);
 	dentry = fhp->fh_dentry;
-	dirp = dentry->d_inode;
 
 	rdentry = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(rdentry);
@@ -1795,10 +1793,6 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 	if (!type)
 		type = rdentry->d_inode->i_mode & S_IFMT;
 
-	host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
-	if (host_err)
-		goto out_nfserr;
-
 	fh_to_path(fhp, &dir_path);
 	if (type != S_IFDIR) { /* It's UNLINK */
 #ifdef MSNFS
@@ -1807,7 +1801,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 			host_err = -EPERM;
 		} else
 #endif
-		host_err = vfs_unlink(dirp, rdentry);
+		host_err = path_unlink(&dir_path, rdentry);
 	} else { /* It's RMDIR */
 		host_err = path_rmdir(&dir_path, rdentry);
 	}
@@ -1815,12 +1809,10 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 	dput(rdentry);
 
 	if (host_err)
-		goto out_drop;
+		goto out_nfserr;
 	if (EX_ISSYNC(fhp->fh_export))
 		host_err = nfsd_sync_dir(dentry);
 
-out_drop:
-	mnt_drop_write(fhp->fh_export->ex_path.mnt);
 out_nfserr:
 	err = nfserrno(host_err);
 out:
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 20:04:42.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 21:30:17.000000000 +0200
@@ -1131,7 +1131,7 @@ extern int vfs_symlink(struct inode *, s
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int path_rmdir(struct path *, struct dentry *);
-extern int vfs_unlink(struct inode *, struct dentry *);
+extern int path_unlink(struct path *, struct dentry *);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
 
 /*
Index: vfs-2.6/ipc/mqueue.c
===================================================================
--- vfs-2.6.orig/ipc/mqueue.c	2008-04-23 19:10:13.000000000 +0200
+++ vfs-2.6/ipc/mqueue.c	2008-04-23 20:08:08.000000000 +0200
@@ -737,6 +737,7 @@ asmlinkage long sys_mq_unlink(const char
 	char *name;
 	struct dentry *dentry;
 	struct inode *inode = NULL;
+	struct path dir_path;
 
 	name = getname(u_name);
 	if (IS_ERR(name))
@@ -758,11 +759,10 @@ asmlinkage long sys_mq_unlink(const char
 	inode = dentry->d_inode;
 	if (inode)
 		atomic_inc(&inode->i_count);
-	err = mnt_want_write(mqueue_mnt);
-	if (err)
-		goto out_err;
-	err = vfs_unlink(dentry->d_parent->d_inode, dentry);
-	mnt_drop_write(mqueue_mnt);
+
+	dir_path.mnt = mqueue_mnt;
+	dir_path.dentry = dentry->d_parent;
+	err = path_unlink(&dir_path, dentry);
 out_err:
 	dput(dentry);
 

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 08/13] vfs: add path_symlink()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (6 preceding siblings ...)
  2008-04-24 11:39 ` [patch 07/13] vfs: add path_unlink() Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:39 ` [patch 09/13] vfs: add path_link() Miklos Szeredi
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_symlink.patch --]
[-- Type: text/plain, Size: 6257 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_symlink().  Make vfs_symlink() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c |   14 +++++++-------
 fs/namei.c          |   26 ++++++++++++++++++--------
 fs/nfsd/vfs.c       |   12 ++++--------
 include/linux/fs.h  |    2 +-
 4 files changed, 30 insertions(+), 24 deletions(-)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 20:08:08.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 21:30:10.000000000 +0200
@@ -439,7 +439,7 @@ static int ecryptfs_symlink(struct inode
 {
 	int rc;
 	struct dentry *lower_dentry;
-	struct dentry *lower_dir_dentry;
+	struct path lower_dir;
 	umode_t mode;
 	char *encoded_symname;
 	int encoded_symlen;
@@ -447,7 +447,8 @@ static int ecryptfs_symlink(struct inode
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	dget(lower_dentry);
-	lower_dir_dentry = lock_parent(lower_dentry);
+	lower_dir.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	lower_dir.dentry = lock_parent(lower_dentry);
 	mode = S_IALLUGO;
 	encoded_symlen = ecryptfs_encode_filename(crypt_stat, symname,
 						  strlen(symname),
@@ -456,18 +457,17 @@ static int ecryptfs_symlink(struct inode
 		rc = encoded_symlen;
 		goto out_lock;
 	}
-	rc = vfs_symlink(lower_dir_dentry->d_inode, lower_dentry,
-			 encoded_symname, mode);
+	rc = path_symlink(&lower_dir, lower_dentry, encoded_symname, mode);
 	kfree(encoded_symname);
 	if (rc || !lower_dentry->d_inode)
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0);
 	if (rc)
 		goto out_lock;
-	fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
-	fsstack_copy_inode_size(dir, lower_dir_dentry->d_inode);
+	fsstack_copy_attr_times(dir, lower_dir.dentry->d_inode);
+	fsstack_copy_inode_size(dir, lower_dir.dentry->d_inode);
 out_lock:
-	unlock_dir(lower_dir_dentry);
+	unlock_dir(lower_dir.dentry);
 	dput(lower_dentry);
 	if (!dentry->d_inode)
 		d_drop(dentry);
Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 20:08:08.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-23 21:30:10.000000000 +0200
@@ -2428,7 +2428,7 @@ asmlinkage long sys_unlink(const char __
 	return do_unlinkat(AT_FDCWD, pathname);
 }
 
-int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, int mode)
+static int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, int mode)
 {
 	int error = may_create(dir, dentry, NULL);
 
@@ -2449,6 +2449,22 @@ int vfs_symlink(struct inode *dir, struc
 	return error;
 }
 
+int path_symlink(struct path *dir_path, struct dentry *dentry,
+		 const char *oldname, int mode)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		struct inode *dir = dir_path->dentry->d_inode;
+
+		error = vfs_symlink(dir, dentry, oldname, mode);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_symlink);
+
 asmlinkage long sys_symlinkat(const char __user *oldname,
 			      int newdfd, const char __user *newname)
 {
@@ -2474,12 +2490,7 @@ asmlinkage long sys_symlinkat(const char
 	if (IS_ERR(dentry))
 		goto out_unlock;
 
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_dput;
-	error = vfs_symlink(nd.path.dentry->d_inode, dentry, from, S_IALLUGO);
-	mnt_drop_write(nd.path.mnt);
-out_dput:
+	error = path_symlink(&nd.path, dentry, from, S_IALLUGO);
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
@@ -3004,6 +3015,5 @@ EXPORT_SYMBOL(generic_permission);
 EXPORT_SYMBOL(vfs_readlink);
 EXPORT_SYMBOL(vfs_rename);
 EXPORT_SYMBOL(vfs_rmdir);
-EXPORT_SYMBOL(vfs_symlink);
 EXPORT_SYMBOL(dentry_unhash);
 EXPORT_SYMBOL(generic_readlink);
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 20:20:43.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 21:30:10.000000000 +0200
@@ -1518,6 +1518,7 @@ nfsd_symlink(struct svc_rqst *rqstp, str
 				struct svc_fh *resfhp,
 				struct iattr *iap)
 {
+	struct path 	dir_path;
 	struct dentry	*dentry, *dnew;
 	__be32		err, cerr;
 	int		host_err;
@@ -1545,10 +1546,7 @@ nfsd_symlink(struct svc_rqst *rqstp, str
 	if (iap && (iap->ia_valid & ATTR_MODE))
 		mode = iap->ia_mode & S_IALLUGO;
 
-	host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
-	if (host_err)
-		goto out_nfserr;
-
+	fh_to_path(fhp, &dir_path);
 	if (unlikely(path[plen] != 0)) {
 		char *path_alloced = kmalloc(plen+1, GFP_KERNEL);
 		if (path_alloced == NULL)
@@ -1556,11 +1554,11 @@ nfsd_symlink(struct svc_rqst *rqstp, str
 		else {
 			strncpy(path_alloced, path, plen);
 			path_alloced[plen] = 0;
-			host_err = vfs_symlink(dentry->d_inode, dnew, path_alloced, mode);
+			host_err = path_symlink(&dir_path, dnew, path_alloced, mode);
 			kfree(path_alloced);
 		}
 	} else
-		host_err = vfs_symlink(dentry->d_inode, dnew, path, mode);
+		host_err = path_symlink(&dir_path, dnew, path, mode);
 
 	if (!host_err) {
 		if (EX_ISSYNC(fhp->fh_export))
@@ -1569,8 +1567,6 @@ nfsd_symlink(struct svc_rqst *rqstp, str
 	err = nfserrno(host_err);
 	fh_unlock(fhp);
 
-	mnt_drop_write(fhp->fh_export->ex_path.mnt);
-
 	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
 	dput(dnew);
 	if (err==0) err = cerr;
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 20:08:08.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 21:30:10.000000000 +0200
@@ -1127,7 +1127,7 @@ extern int path_create(struct path *, st
 extern int vfs_mkdir(struct inode *, struct dentry *, int);
 extern int path_mkdir(struct path *, struct dentry *, int);
 extern int path_mknod(struct path *, struct dentry *, int, dev_t);
-extern int vfs_symlink(struct inode *, struct dentry *, const char *, int);
+extern int path_symlink(struct path *, struct dentry *, const char *, int);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int path_rmdir(struct path *, struct dentry *);

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 09/13] vfs: add path_link()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (7 preceding siblings ...)
  2008-04-24 11:39 ` [patch 08/13] vfs: add path_symlink() Miklos Szeredi
@ 2008-04-24 11:39 ` Miklos Szeredi
  2008-04-24 11:40 ` [patch 10/13] vfs: add path_rename() Miklos Szeredi
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:39 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_link.patch --]
[-- Type: text/plain, Size: 5913 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_link().  Make vfs_link() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c |   10 +++++-----
 fs/namei.c          |   26 ++++++++++++++++++--------
 fs/nfsd/vfs.c       |   14 ++++----------
 include/linux/fs.h  |    2 +-
 4 files changed, 28 insertions(+), 24 deletions(-)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 20:32:59.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 20:44:38.000000000 +0200
@@ -379,7 +379,7 @@ static int ecryptfs_link(struct dentry *
 {
 	struct dentry *lower_old_dentry;
 	struct dentry *lower_new_dentry;
-	struct dentry *lower_dir_dentry;
+	struct path lower_dir;
 	u64 file_size_save;
 	int rc;
 
@@ -388,9 +388,9 @@ static int ecryptfs_link(struct dentry *
 	lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry);
 	dget(lower_old_dentry);
 	dget(lower_new_dentry);
-	lower_dir_dentry = lock_parent(lower_new_dentry);
-	rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
-		      lower_new_dentry);
+	lower_dir.mnt = ecryptfs_dentry_to_lower_mnt(new_dentry);
+	lower_dir.dentry = lock_parent(lower_new_dentry);
+	rc = path_link(lower_old_dentry, &lower_dir, lower_new_dentry);
 	if (rc || !lower_new_dentry->d_inode)
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb, 0);
@@ -402,7 +402,7 @@ static int ecryptfs_link(struct dentry *
 		ecryptfs_inode_to_lower(old_dentry->d_inode)->i_nlink;
 	i_size_write(new_dentry->d_inode, file_size_save);
 out_lock:
-	unlock_dir(lower_dir_dentry);
+	unlock_dir(lower_dir.dentry);
 	dput(lower_new_dentry);
 	dput(lower_old_dentry);
 	d_drop(lower_old_dentry);
Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 20:32:59.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-23 20:44:38.000000000 +0200
@@ -2507,7 +2507,7 @@ asmlinkage long sys_symlink(const char _
 	return sys_symlinkat(oldname, AT_FDCWD, newname);
 }
 
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
+static int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
 {
 	struct inode *inode = old_dentry->d_inode;
 	int error;
@@ -2545,6 +2545,22 @@ int vfs_link(struct dentry *old_dentry, 
 	return error;
 }
 
+int path_link(struct dentry *old_dentry, struct path *dir_path,
+	      struct dentry *new_dentry)
+{
+	int error = mnt_want_write(dir_path->mnt);
+
+	if (!error) {
+		struct inode *dir = dir_path->dentry->d_inode;
+
+		error = vfs_link(old_dentry, dir, new_dentry);
+		mnt_drop_write(dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_link);
+
 /*
  * Hardlinks are often used in delicate situations.  We avoid
  * security-related surprises by not following symlinks on the
@@ -2585,12 +2601,7 @@ asmlinkage long sys_linkat(int olddfd, c
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto out_unlock;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_dput;
-	error = vfs_link(old_nd.path.dentry, nd.path.dentry->d_inode, new_dentry);
-	mnt_drop_write(nd.path.mnt);
-out_dput:
+	error = path_link(old_nd.path.dentry, &nd.path, new_dentry);
 	dput(new_dentry);
 out_unlock:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
@@ -3010,7 +3021,6 @@ EXPORT_SYMBOL(vfs_permission);
 EXPORT_SYMBOL(file_permission);
 EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_follow_link);
-EXPORT_SYMBOL(vfs_link);
 EXPORT_SYMBOL(generic_permission);
 EXPORT_SYMBOL(vfs_readlink);
 EXPORT_SYMBOL(vfs_rename);
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 20:34:39.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 20:46:10.000000000 +0200
@@ -1586,8 +1586,9 @@ __be32
 nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 				char *name, int len, struct svc_fh *tfhp)
 {
+	struct path 	dir_path;
 	struct dentry	*ddir, *dnew, *dold;
-	struct inode	*dirp, *dest;
+	struct inode	*dest;
 	__be32		err;
 	int		host_err;
 
@@ -1607,7 +1608,6 @@ nfsd_link(struct svc_rqst *rqstp, struct
 
 	fh_lock_nested(ffhp, I_MUTEX_PARENT);
 	ddir = ffhp->fh_dentry;
-	dirp = ddir->d_inode;
 
 	dnew = lookup_one_len(name, ddir, len);
 	host_err = PTR_ERR(dnew);
@@ -1617,12 +1617,8 @@ nfsd_link(struct svc_rqst *rqstp, struct
 	dold = tfhp->fh_dentry;
 	dest = dold->d_inode;
 
-	host_err = mnt_want_write(tfhp->fh_export->ex_path.mnt);
-	if (host_err) {
-		err = nfserrno(host_err);
-		goto out_dput;
-	}
-	host_err = vfs_link(dold, dirp, dnew);
+	fh_to_path(ffhp, &dir_path);
+	host_err = path_link(dold, &dir_path, dnew);
 	if (!host_err) {
 		if (EX_ISSYNC(ffhp->fh_export)) {
 			err = nfserrno(nfsd_sync_dir(ddir));
@@ -1635,8 +1631,6 @@ nfsd_link(struct svc_rqst *rqstp, struct
 		else
 			err = nfserrno(host_err);
 	}
-	mnt_drop_write(tfhp->fh_export->ex_path.mnt);
-out_dput:
 	dput(dnew);
 out_unlock:
 	fh_unlock(ffhp);
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 20:32:59.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 20:44:38.000000000 +0200
@@ -1128,7 +1128,7 @@ extern int vfs_mkdir(struct inode *, str
 extern int path_mkdir(struct path *, struct dentry *, int);
 extern int path_mknod(struct path *, struct dentry *, int, dev_t);
 extern int path_symlink(struct path *, struct dentry *, const char *, int);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
+extern int path_link(struct dentry *, struct path *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int path_rmdir(struct path *, struct dentry *);
 extern int path_unlink(struct path *, struct dentry *);

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 10/13] vfs: add path_rename()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (8 preceding siblings ...)
  2008-04-24 11:39 ` [patch 09/13] vfs: add path_link() Miklos Szeredi
@ 2008-04-24 11:40 ` Miklos Szeredi
  2008-04-24 11:40 ` [patch 11/13] vfs: add path_setattr() Miklos Szeredi
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:40 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_rename.patch --]
[-- Type: text/plain, Size: 6103 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_rename().  Make vfs_rename() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/inode.c |   13 +++++++------
 fs/namei.c          |   27 +++++++++++++++++++--------
 fs/nfsd/vfs.c       |   16 ++++------------
 include/linux/fs.h  |    2 +-
 4 files changed, 31 insertions(+), 27 deletions(-)

Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 20:44:38.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 21:30:04.000000000 +0200
@@ -555,25 +555,26 @@ ecryptfs_rename(struct inode *old_dir, s
 	int rc;
 	struct dentry *lower_old_dentry;
 	struct dentry *lower_new_dentry;
-	struct dentry *lower_old_dir_dentry;
+	struct path lower_old_dir;
 	struct dentry *lower_new_dir_dentry;
 
 	lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
 	lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry);
 	dget(lower_old_dentry);
 	dget(lower_new_dentry);
-	lower_old_dir_dentry = dget_parent(lower_old_dentry);
+	lower_old_dir.mnt = ecryptfs_dentry_to_lower_mnt(old_dentry);
+	lower_old_dir.dentry = dget_parent(lower_old_dentry);
 	lower_new_dir_dentry = dget_parent(lower_new_dentry);
-	lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
-	rc = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
+	lock_rename(lower_old_dir.dentry, lower_new_dir_dentry);
+	rc = path_rename(&lower_old_dir, lower_old_dentry,
 			lower_new_dir_dentry->d_inode, lower_new_dentry);
 	if (rc)
 		goto out_lock;
 	fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode, NULL);
 	if (new_dir != old_dir)
-		fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode, NULL);
+		fsstack_copy_attr_all(old_dir, lower_old_dir.dentry->d_inode, NULL);
 out_lock:
-	unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+	unlock_rename(lower_old_dir.dentry, lower_new_dir_dentry);
 	dput(lower_new_dentry->d_parent);
 	dput(lower_old_dentry->d_parent);
 	dput(lower_new_dentry);
Index: vfs-2.6/fs/namei.c
===================================================================
--- vfs-2.6.orig/fs/namei.c	2008-04-23 20:44:38.000000000 +0200
+++ vfs-2.6/fs/namei.c	2008-04-23 20:49:49.000000000 +0200
@@ -2723,8 +2723,8 @@ static int vfs_rename_other(struct inode
 	return error;
 }
 
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry)
+static int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+		      struct inode *new_dir, struct dentry *new_dentry)
 {
 	int error;
 	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
@@ -2766,6 +2766,22 @@ int vfs_rename(struct inode *old_dir, st
 	return error;
 }
 
+int path_rename(struct path *old_dir_path, struct dentry *old_dentry,
+		struct inode *new_dir, struct dentry *new_dentry)
+{
+	int error = mnt_want_write(old_dir_path->mnt);
+
+	if (!error) {
+		struct inode *old_dir = old_dir_path->dentry->d_inode;
+
+		error = vfs_rename(old_dir, old_dentry, new_dir, new_dentry);
+		mnt_drop_write(old_dir_path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_rename);
+
 static int do_rename(int olddfd, const char *oldname,
 			int newdfd, const char *newname)
 {
@@ -2827,12 +2843,8 @@ static int do_rename(int olddfd, const c
 	if (new_dentry == trap)
 		goto exit5;
 
-	error = mnt_want_write(oldnd.path.mnt);
-	if (error)
-		goto exit5;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
+	error = path_rename(&oldnd.path, old_dentry,
 				   new_dir->d_inode, new_dentry);
-	mnt_drop_write(oldnd.path.mnt);
 exit5:
 	dput(new_dentry);
 exit4:
@@ -3023,7 +3035,6 @@ EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_follow_link);
 EXPORT_SYMBOL(generic_permission);
 EXPORT_SYMBOL(vfs_readlink);
-EXPORT_SYMBOL(vfs_rename);
 EXPORT_SYMBOL(vfs_rmdir);
 EXPORT_SYMBOL(dentry_unhash);
 EXPORT_SYMBOL(generic_readlink);
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 20:46:10.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 21:30:04.000000000 +0200
@@ -1650,8 +1650,9 @@ __be32
 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 			    struct svc_fh *tfhp, char *tname, int tlen)
 {
+	struct path 	old_dir_path;
 	struct dentry	*fdentry, *tdentry, *odentry, *ndentry, *trap;
-	struct inode	*fdir, *tdir;
+	struct inode	*tdir;
 	__be32		err;
 	int		host_err;
 
@@ -1663,7 +1664,6 @@ nfsd_rename(struct svc_rqst *rqstp, stru
 		goto out;
 
 	fdentry = ffhp->fh_dentry;
-	fdir = fdentry->d_inode;
 
 	tdentry = tfhp->fh_dentry;
 	tdir = tdentry->d_inode;
@@ -1710,22 +1710,14 @@ nfsd_rename(struct svc_rqst *rqstp, stru
 			goto out_dput_new;
 	}
 
-	host_err = -EXDEV;
-	if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
-		goto out_dput_new;
-	host_err = mnt_want_write(ffhp->fh_export->ex_path.mnt);
-	if (host_err)
-		goto out_dput_new;
-
-	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
+	fh_to_path(ffhp, &old_dir_path);
+	host_err = path_rename(&old_dir_path, odentry, tdir, ndentry);
 	if (!host_err && EX_ISSYNC(tfhp->fh_export)) {
 		host_err = nfsd_sync_dir(tdentry);
 		if (!host_err)
 			host_err = nfsd_sync_dir(fdentry);
 	}
 
-	mnt_drop_write(ffhp->fh_export->ex_path.mnt);
-
  out_dput_new:
 	dput(ndentry);
  out_dput_old:
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 20:44:38.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 21:30:04.000000000 +0200
@@ -1132,7 +1132,7 @@ extern int path_link(struct dentry *, st
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int path_rmdir(struct path *, struct dentry *);
 extern int path_unlink(struct path *, struct dentry *);
-extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+extern int path_rename(struct path *, struct dentry *, struct inode *, struct dentry *);
 
 /*
  * VFS dentry helper functions.

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 11/13] vfs: add path_setattr()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (9 preceding siblings ...)
  2008-04-24 11:40 ` [patch 10/13] vfs: add path_rename() Miklos Szeredi
@ 2008-04-24 11:40 ` Miklos Szeredi
  2008-04-24 11:40 ` [patch 12/13] vfs: add path_setxattr() Miklos Szeredi
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:40 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_setattr.patch --]
[-- Type: text/plain, Size: 9189 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_setattr().  Add checks to IS_IMMUTABLE(inode) and
IS_APPEND(inode) since all callers require it.  Maybe these should be
moved into notify_change()?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/attr.c           |   23 ++++++++++++++++-
 fs/ecryptfs/inode.c |   11 ++++----
 fs/nfsd/vfs.c       |   17 ++++++++-----
 fs/open.c           |   68 ++++++----------------------------------------------
 include/linux/fs.h  |    1 
 5 files changed, 49 insertions(+), 71 deletions(-)

Index: vfs-2.6/fs/open.c
===================================================================
--- vfs-2.6.orig/fs/open.c	2008-04-23 21:33:41.000000000 +0200
+++ vfs-2.6/fs/open.c	2008-04-23 21:33:42.000000000 +0200
@@ -578,23 +578,13 @@ asmlinkage long sys_fchmod(unsigned int 
 
 	audit_inode(NULL, dentry);
 
-	err = mnt_want_write(file->f_path.mnt);
-	if (err)
-		goto out_putf;
-	err = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out_drop_write;
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-	err = notify_change(dentry, &newattrs);
+	err = path_setattr(&file->f_path, &newattrs);
 	mutex_unlock(&inode->i_mutex);
-
-out_drop_write:
-	mnt_drop_write(file->f_path.mnt);
-out_putf:
 	fput(file);
 out:
 	return err;
@@ -613,25 +603,14 @@ asmlinkage long sys_fchmodat(int dfd, co
 		goto out;
 	inode = nd.path.dentry->d_inode;
 
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto dput_and_out;
-
-	error = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out_drop_write;
-
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-	error = notify_change(nd.path.dentry, &newattrs);
+	error = path_setattr(&nd.path, &newattrs);
 	mutex_unlock(&inode->i_mutex);
 
-out_drop_write:
-	mnt_drop_write(nd.path.mnt);
-dput_and_out:
 	path_put(&nd.path);
 out:
 	return error;
@@ -642,20 +621,12 @@ asmlinkage long sys_chmod(const char __u
 	return sys_fchmodat(AT_FDCWD, filename, mode);
 }
 
-static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
+static int chown_common(struct path *path, uid_t user, gid_t group)
 {
-	struct inode * inode;
+	struct inode *inode = path->dentry->d_inode;
 	int error;
 	struct iattr newattrs;
 
-	error = -ENOENT;
-	if (!(inode = dentry->d_inode)) {
-		printk(KERN_ERR "chown_common: NULL inode\n");
-		goto out;
-	}
-	error = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out;
 	newattrs.ia_valid =  ATTR_CTIME;
 	if (user != (uid_t) -1) {
 		newattrs.ia_valid |= ATTR_UID;
@@ -669,9 +640,8 @@ static int chown_common(struct dentry * 
 		newattrs.ia_valid |=
 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
 	mutex_lock(&inode->i_mutex);
-	error = notify_change(dentry, &newattrs);
+	error = path_setattr(path, &newattrs);
 	mutex_unlock(&inode->i_mutex);
-out:
 	return error;
 }
 
@@ -683,12 +653,7 @@ asmlinkage long sys_chown(const char __u
 	error = user_path_walk(filename, &nd);
 	if (error)
 		goto out;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_release;
-	error = chown_common(nd.path.dentry, user, group);
-	mnt_drop_write(nd.path.mnt);
-out_release:
+	error = chown_common(&nd.path, user, group);
 	path_put(&nd.path);
 out:
 	return error;
@@ -708,12 +673,7 @@ asmlinkage long sys_fchownat(int dfd, co
 	error = __user_walk_fd(dfd, filename, follow, &nd);
 	if (error)
 		goto out;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_release;
-	error = chown_common(nd.path.dentry, user, group);
-	mnt_drop_write(nd.path.mnt);
-out_release:
+	error = chown_common(&nd.path, user, group);
 	path_put(&nd.path);
 out:
 	return error;
@@ -727,12 +687,7 @@ asmlinkage long sys_lchown(const char __
 	error = user_path_walk_link(filename, &nd);
 	if (error)
 		goto out;
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto out_release;
-	error = chown_common(nd.path.dentry, user, group);
-	mnt_drop_write(nd.path.mnt);
-out_release:
+	error = chown_common(&nd.path, user, group);
 	path_put(&nd.path);
 out:
 	return error;
@@ -749,14 +704,9 @@ asmlinkage long sys_fchown(unsigned int 
 	if (!file)
 		goto out;
 
-	error = mnt_want_write(file->f_path.mnt);
-	if (error)
-		goto out_fput;
 	dentry = file->f_path.dentry;
 	audit_inode(NULL, dentry);
-	error = chown_common(dentry, user, group);
-	mnt_drop_write(file->f_path.mnt);
-out_fput:
+	error = chown_common(&file->f_path, user, group);
 	fput(file);
 out:
 	return error;
Index: vfs-2.6/fs/attr.c
===================================================================
--- vfs-2.6.orig/fs/attr.c	2008-04-23 21:33:41.000000000 +0200
+++ vfs-2.6/fs/attr.c	2008-04-23 21:33:42.000000000 +0200
@@ -14,6 +14,7 @@
 #include <linux/fcntl.h>
 #include <linux/quotaops.h>
 #include <linux/security.h>
+#include <linux/mount.h>
 
 /* Taken over from the old code... */
 
@@ -182,5 +183,25 @@ int notify_change(struct dentry * dentry
 
 	return error;
 }
-
 EXPORT_SYMBOL(notify_change);
+
+int path_setattr(struct path *path, struct iattr *attr)
+{
+	struct inode *inode = path->dentry->d_inode;
+	int error = mnt_want_write(path->mnt);
+
+	if (error)
+		goto out;
+
+	error = -EPERM;
+	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+		goto out_drop_write;
+
+	error = notify_change(path->dentry, attr);
+
+ out_drop_write:
+	mnt_drop_write(path->mnt);
+ out:
+	return error;
+}
+EXPORT_SYMBOL(path_setattr);
Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 21:33:41.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 21:33:42.000000000 +0200
@@ -393,8 +393,11 @@ nfsd_setattr(struct svc_rqst *rqstp, str
 
 	err = nfserr_notsync;
 	if (!check_guard || guardtime == inode->i_ctime.tv_sec) {
+		struct path path;
+
 		fh_lock(fhp);
-		host_err = notify_change(dentry, iap);
+		fh_to_path(fhp, &path);
+		host_err = path_setattr(&path, iap);
 		err = nfserrno(host_err);
 		fh_unlock(fhp);
 	}
@@ -950,14 +953,16 @@ out:
 	return err;
 }
 
-static void kill_suid(struct dentry *dentry)
+static void kill_suid(struct svc_fh *fhp)
 {
+	struct path	path;
 	struct iattr	ia;
 	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
 
-	mutex_lock(&dentry->d_inode->i_mutex);
-	notify_change(dentry, &ia);
-	mutex_unlock(&dentry->d_inode->i_mutex);
+	mutex_lock(&path.dentry->d_inode->i_mutex);
+	fh_to_path(fhp, &path);
+	path_setattr(&path, &ia);
+	mutex_unlock(&path.dentry->d_inode->i_mutex);
 }
 
 static __be32
@@ -1015,7 +1020,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
 
 	/* clear setuid/setgid flag after write */
 	if (host_err >= 0 && (inode->i_mode & (S_ISUID | S_ISGID)))
-		kill_suid(dentry);
+		kill_suid(fhp);
 
 	if (host_err >= 0 && stable) {
 		static ino_t	last_ino;
Index: vfs-2.6/include/linux/fs.h
===================================================================
--- vfs-2.6.orig/include/linux/fs.h	2008-04-23 21:33:41.000000000 +0200
+++ vfs-2.6/include/linux/fs.h	2008-04-23 21:33:42.000000000 +0200
@@ -1760,6 +1760,7 @@ extern int do_remount_sb(struct super_bl
 extern sector_t bmap(struct inode *, sector_t);
 #endif
 extern int notify_change(struct dentry *, struct iattr *);
+extern int path_setattr(struct path *, struct iattr *);
 extern int permission(struct inode *, int, struct nameidata *);
 extern int generic_permission(struct inode *, int,
 		int (*check_acl)(struct inode *, int));
Index: vfs-2.6/fs/ecryptfs/inode.c
===================================================================
--- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-23 21:33:41.000000000 +0200
+++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-23 21:33:42.000000000 +0200
@@ -842,7 +842,7 @@ ecryptfs_permission(struct inode *inode,
 static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
 {
 	int rc = 0;
-	struct dentry *lower_dentry;
+	struct path lower_path;
 	struct inode *inode;
 	struct inode *lower_inode;
 	struct ecryptfs_crypt_stat *crypt_stat;
@@ -852,7 +852,8 @@ static int ecryptfs_setattr(struct dentr
 		ecryptfs_init_crypt_stat(crypt_stat);
 	inode = dentry->d_inode;
 	lower_inode = ecryptfs_inode_to_lower(inode);
-	lower_dentry = ecryptfs_dentry_to_lower(dentry);
+	lower_path.mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+	lower_path.dentry = ecryptfs_dentry_to_lower(dentry);
 	mutex_lock(&crypt_stat->cs_mutex);
 	if (S_ISDIR(dentry->d_inode->i_mode))
 		crypt_stat->flags &= ~(ECRYPTFS_ENCRYPTED);
@@ -903,9 +904,9 @@ static int ecryptfs_setattr(struct dentr
 	if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
 		ia->ia_valid &= ~ATTR_MODE;
 
-	mutex_lock(&lower_dentry->d_inode->i_mutex);
-	rc = notify_change(lower_dentry, ia);
-	mutex_unlock(&lower_dentry->d_inode->i_mutex);
+	mutex_lock(&lower_path.dentry->d_inode->i_mutex);
+	rc = path_setattr(&lower_path, ia);
+	mutex_unlock(&lower_path.dentry->d_inode->i_mutex);
 out:
 	fsstack_copy_attr_all(inode, lower_inode, NULL);
 	return rc;

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 12/13] vfs: add path_setxattr()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (10 preceding siblings ...)
  2008-04-24 11:40 ` [patch 11/13] vfs: add path_setattr() Miklos Szeredi
@ 2008-04-24 11:40 ` Miklos Szeredi
  2008-04-24 11:40 ` [patch 13/13] vfs: add path_removexattr() Miklos Szeredi
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:40 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_setxattr.patch --]
[-- Type: text/plain, Size: 6275 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_setxattr().  Make vfs_setxattr() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/nfsd/vfs.c         |   18 ++++++++++--------
 fs/xattr.c            |   44 ++++++++++++++++++++++----------------------
 include/linux/xattr.h |    2 +-
 3 files changed, 33 insertions(+), 31 deletions(-)

Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 21:33:42.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 21:33:44.000000000 +0200
@@ -435,7 +435,7 @@ static ssize_t nfsd_getxattr(struct dent
 
 #if defined(CONFIG_NFSD_V4)
 static int
-set_nfsv4_acl_one(struct dentry *dentry, struct posix_acl *pacl, char *key)
+set_nfsv4_acl_one(struct path *path, struct posix_acl *pacl, char *key)
 {
 	int len;
 	size_t buflen;
@@ -454,7 +454,7 @@ set_nfsv4_acl_one(struct dentry *dentry,
 		goto out;
 	}
 
-	error = vfs_setxattr(dentry, key, buf, len, 0);
+	error = path_setxattr(path, key, buf, len, 0);
 out:
 	kfree(buf);
 	return error;
@@ -466,7 +466,7 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
 {
 	__be32 error;
 	int host_error;
-	struct dentry *dentry;
+	struct path path;
 	struct inode *inode;
 	struct posix_acl *pacl = NULL, *dpacl = NULL;
 	unsigned int flags = 0;
@@ -476,8 +476,8 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
 	if (error)
 		return error;
 
-	dentry = fhp->fh_dentry;
-	inode = dentry->d_inode;
+	fh_to_path(fhp, &path);
+	inode = path.dentry->d_inode;
 	if (S_ISDIR(inode->i_mode))
 		flags = NFS4_ACL_DIR;
 
@@ -487,12 +487,12 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
 	} else if (host_error < 0)
 		goto out_nfserr;
 
-	host_error = set_nfsv4_acl_one(dentry, pacl, POSIX_ACL_XATTR_ACCESS);
+	host_error = set_nfsv4_acl_one(&path, pacl, POSIX_ACL_XATTR_ACCESS);
 	if (host_error < 0)
 		goto out_release;
 
 	if (S_ISDIR(inode->i_mode))
-		host_error = set_nfsv4_acl_one(dentry, dpacl, POSIX_ACL_XATTR_DEFAULT);
+		host_error = set_nfsv4_acl_one(&path, dpacl, POSIX_ACL_XATTR_DEFAULT);
 
 out_release:
 	posix_acl_release(pacl);
@@ -2039,6 +2039,7 @@ nfsd_get_posix_acl(struct svc_fh *fhp, i
 int
 nfsd_set_posix_acl(struct svc_fh *fhp, int type, struct posix_acl *acl)
 {
+	struct path path;
 	struct inode *inode = fhp->fh_dentry->d_inode;
 	char *name;
 	void *value = NULL;
@@ -2074,8 +2075,9 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
 	error = mnt_want_write(fhp->fh_export->ex_path.mnt);
 	if (error)
 		goto getout;
+	fh_to_path(fhp, &path);
 	if (size)
-		error = vfs_setxattr(fhp->fh_dentry, name, value, size, 0);
+		error = path_setxattr(&path, name, value, size, 0);
 	else {
 		if (!S_ISDIR(inode->i_mode) && type == ACL_TYPE_DEFAULT)
 			error = 0;
Index: vfs-2.6/fs/xattr.c
===================================================================
--- vfs-2.6.orig/fs/xattr.c	2008-04-23 21:33:40.000000000 +0200
+++ vfs-2.6/fs/xattr.c	2008-04-23 21:33:44.000000000 +0200
@@ -66,7 +66,7 @@ xattr_permission(struct inode *inode, co
 	return permission(inode, mask, NULL);
 }
 
-int
+static int
 vfs_setxattr(struct dentry *dentry, char *name, void *value,
 		size_t size, int flags)
 {
@@ -101,7 +101,20 @@ out:
 	mutex_unlock(&inode->i_mutex);
 	return error;
 }
-EXPORT_SYMBOL_GPL(vfs_setxattr);
+
+int path_setxattr(struct path *path, char *name, void *value, size_t size,
+		  int flags)
+{
+	int error = mnt_want_write(path->mnt);
+
+	if (!error) {
+		error = vfs_setxattr(path->dentry, name, value, size, flags);
+		mnt_drop_write(path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_setxattr);
 
 ssize_t
 xattr_getsecurity(struct inode *inode, const char *name, void *value,
@@ -218,7 +231,7 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
  * Extended attribute SET operations
  */
 static long
-setxattr(struct dentry *d, char __user *name, void __user *value,
+setxattr(struct path *path, char __user *name, void __user *value,
 	 size_t size, int flags)
 {
 	int error;
@@ -246,7 +259,7 @@ setxattr(struct dentry *d, char __user *
 		}
 	}
 
-	error = vfs_setxattr(d, kname, kvalue, size, flags);
+	error = path_setxattr(path, kname, kvalue, size, flags);
 	kfree(kvalue);
 	return error;
 }
@@ -261,11 +274,8 @@ sys_setxattr(char __user *path, char __u
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
-	error = mnt_want_write(nd.path.mnt);
-	if (!error) {
-		error = setxattr(nd.path.dentry, name, value, size, flags);
-		mnt_drop_write(nd.path.mnt);
-	}
+
+	error = setxattr(&nd.path, name, value, size, flags);
 	path_put(&nd.path);
 	return error;
 }
@@ -280,11 +290,7 @@ sys_lsetxattr(char __user *path, char __
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
-	error = mnt_want_write(nd.path.mnt);
-	if (!error) {
-		error = setxattr(nd.path.dentry, name, value, size, flags);
-		mnt_drop_write(nd.path.mnt);
-	}
+	error = setxattr(&nd.path, name, value, size, flags);
 	path_put(&nd.path);
 	return error;
 }
@@ -294,19 +300,13 @@ sys_fsetxattr(int fd, char __user *name,
 	      size_t size, int flags)
 {
 	struct file *f;
-	struct dentry *dentry;
 	int error = -EBADF;
 
 	f = fget(fd);
 	if (!f)
 		return error;
-	dentry = f->f_path.dentry;
-	audit_inode(NULL, dentry);
-	error = mnt_want_write(f->f_path.mnt);
-	if (!error) {
-		error = setxattr(dentry, name, value, size, flags);
-		mnt_drop_write(f->f_path.mnt);
-	}
+	audit_inode(NULL, f->f_path.dentry);
+	error = setxattr(&f->f_path, name, value, size, flags);
 	fput(f);
 	return error;
 }
Index: vfs-2.6/include/linux/xattr.h
===================================================================
--- vfs-2.6.orig/include/linux/xattr.h	2008-04-23 21:33:40.000000000 +0200
+++ vfs-2.6/include/linux/xattr.h	2008-04-23 21:33:44.000000000 +0200
@@ -49,7 +49,7 @@ struct xattr_handler {
 ssize_t xattr_getsecurity(struct inode *, const char *, void *, size_t);
 ssize_t vfs_getxattr(struct dentry *, char *, void *, size_t);
 ssize_t vfs_listxattr(struct dentry *d, char *list, size_t size);
-int vfs_setxattr(struct dentry *, char *, void *, size_t, int);
+int path_setxattr(struct path *, char *, void *, size_t, int);
 int vfs_removexattr(struct dentry *, char *);
 
 ssize_t generic_getxattr(struct dentry *dentry, const char *name, void *buffer, size_t size);

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [patch 13/13] vfs: add path_removexattr()
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (11 preceding siblings ...)
  2008-04-24 11:40 ` [patch 12/13] vfs: add path_setxattr() Miklos Szeredi
@ 2008-04-24 11:40 ` Miklos Szeredi
  2008-04-24 12:42 ` [patch 00/13] vfs: add helpers to check r/o bind mounts Al Viro
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 11:40 UTC (permalink / raw)
  To: viro; +Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

[-- Attachment #1: path_removexattr.patch --]
[-- Type: text/plain, Size: 4529 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Introduce path_removexattr().  Make vfs_removexattr() static.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/nfsd/vfs.c         |    6 +-----
 fs/xattr.c            |   41 +++++++++++++++++++----------------------
 include/linux/xattr.h |    2 +-
 3 files changed, 21 insertions(+), 28 deletions(-)

Index: vfs-2.6/fs/nfsd/vfs.c
===================================================================
--- vfs-2.6.orig/fs/nfsd/vfs.c	2008-04-23 21:33:44.000000000 +0200
+++ vfs-2.6/fs/nfsd/vfs.c	2008-04-23 21:37:57.000000000 +0200
@@ -2072,9 +2072,6 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
 	} else
 		size = 0;
 
-	error = mnt_want_write(fhp->fh_export->ex_path.mnt);
-	if (error)
-		goto getout;
 	fh_to_path(fhp, &path);
 	if (size)
 		error = path_setxattr(&path, name, value, size, 0);
@@ -2082,12 +2079,11 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
 		if (!S_ISDIR(inode->i_mode) && type == ACL_TYPE_DEFAULT)
 			error = 0;
 		else {
-			error = vfs_removexattr(fhp->fh_dentry, name);
+			error = path_removexattr(&path, name);
 			if (error == -ENODATA)
 				error = 0;
 		}
 	}
-	mnt_drop_write(fhp->fh_export->ex_path.mnt);
 
 getout:
 	kfree(value);
Index: vfs-2.6/fs/xattr.c
===================================================================
--- vfs-2.6.orig/fs/xattr.c	2008-04-23 21:33:44.000000000 +0200
+++ vfs-2.6/fs/xattr.c	2008-04-23 21:37:20.000000000 +0200
@@ -199,7 +199,7 @@ vfs_listxattr(struct dentry *d, char *li
 }
 EXPORT_SYMBOL_GPL(vfs_listxattr);
 
-int
+static int
 vfs_removexattr(struct dentry *dentry, char *name)
 {
 	struct inode *inode = dentry->d_inode;
@@ -224,8 +224,19 @@ vfs_removexattr(struct dentry *dentry, c
 		fsnotify_xattr(dentry);
 	return error;
 }
-EXPORT_SYMBOL_GPL(vfs_removexattr);
 
+int path_removexattr(struct path *path, char *name)
+{
+	int error = mnt_want_write(path->mnt);
+
+	if (!error) {
+		error = vfs_removexattr(path->dentry, name);
+		mnt_drop_write(path->mnt);
+	}
+
+	return error;
+}
+EXPORT_SYMBOL(path_removexattr);
 
 /*
  * Extended attribute SET operations
@@ -470,7 +481,7 @@ sys_flistxattr(int fd, char __user *list
  * Extended attribute REMOVE operations
  */
 static long
-removexattr(struct dentry *d, char __user *name)
+removexattr(struct path *path, char __user *name)
 {
 	int error;
 	char kname[XATTR_NAME_MAX + 1];
@@ -481,7 +492,7 @@ removexattr(struct dentry *d, char __use
 	if (error < 0)
 		return error;
 
-	return vfs_removexattr(d, kname);
+	return path_removexattr(path, kname);
 }
 
 asmlinkage long
@@ -493,11 +504,7 @@ sys_removexattr(char __user *path, char 
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
-	error = mnt_want_write(nd.path.mnt);
-	if (!error) {
-		error = removexattr(nd.path.dentry, name);
-		mnt_drop_write(nd.path.mnt);
-	}
+	error = removexattr(&nd.path, name);
 	path_put(&nd.path);
 	return error;
 }
@@ -511,11 +518,7 @@ sys_lremovexattr(char __user *path, char
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
-	error = mnt_want_write(nd.path.mnt);
-	if (!error) {
-		error = removexattr(nd.path.dentry, name);
-		mnt_drop_write(nd.path.mnt);
-	}
+	error = removexattr(&nd.path, name);
 	path_put(&nd.path);
 	return error;
 }
@@ -524,19 +527,13 @@ asmlinkage long
 sys_fremovexattr(int fd, char __user *name)
 {
 	struct file *f;
-	struct dentry *dentry;
 	int error = -EBADF;
 
 	f = fget(fd);
 	if (!f)
 		return error;
-	dentry = f->f_path.dentry;
-	audit_inode(NULL, dentry);
-	error = mnt_want_write(f->f_path.mnt);
-	if (!error) {
-		error = removexattr(dentry, name);
-		mnt_drop_write(f->f_path.mnt);
-	}
+	audit_inode(NULL, f->f_path.dentry);
+	error = removexattr(&f->f_path, name);
 	fput(f);
 	return error;
 }
Index: vfs-2.6/include/linux/xattr.h
===================================================================
--- vfs-2.6.orig/include/linux/xattr.h	2008-04-23 21:33:44.000000000 +0200
+++ vfs-2.6/include/linux/xattr.h	2008-04-23 21:36:07.000000000 +0200
@@ -50,7 +50,7 @@ ssize_t xattr_getsecurity(struct inode *
 ssize_t vfs_getxattr(struct dentry *, char *, void *, size_t);
 ssize_t vfs_listxattr(struct dentry *d, char *list, size_t size);
 int path_setxattr(struct path *, char *, void *, size_t, int);
-int vfs_removexattr(struct dentry *, char *);
+int path_removexattr(struct path *, char *);
 
 ssize_t generic_getxattr(struct dentry *dentry, const char *name, void *buffer, size_t size);
 ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size);

--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (12 preceding siblings ...)
  2008-04-24 11:40 ` [patch 13/13] vfs: add path_removexattr() Miklos Szeredi
@ 2008-04-24 12:42 ` Al Viro
  2008-04-24 13:05   ` Miklos Szeredi
  2008-04-24 17:04   ` Erez Zadok
  2008-04-24 16:52 ` Erez Zadok
  2008-05-01  5:40 ` Dave Hansen
  15 siblings, 2 replies; 63+ messages in thread
From: Al Viro @ 2008-04-24 12:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 01:39:50PM +0200, Miklos Szeredi wrote:
> Then I did this series, which basically guarantees, that that cannot
> happen.  Al rejected it, and rather fixed some of the remaining
> places.  He still missed several, which sort of proves my point.

Which ones have I missed?

> I think it's totally pointless to continue trying to fix the symptoms
> instead of getting at the root of the problem.
> 
> I know that VFS interfaces are a sensitive question, but it would be
> nice it we could have some sanity back in this discussion.

Yes, it would.  How about that, for starters:

path_create() et.al. are *wrong* for nfsd; if nothing else, I'm not at
all convinced that even apparmour wants export path + relative there
_and_ r/o vs. r/w is decision that doesn't clearly map to ex_mnt flags.

Moreover, it's not at all obvious that we want to drop write access as
soon as vfs_...() is over in case of nfsd.  Some of the stuff done
immeidately afterwards might very well qualify for inclusion into
protected area; some of the stuff done immediately _prior_ very likely
needs that as well - look at fh_verify() and tell me why we don't want
that "I'll hold write access to vfsmount" to span the area including
that sucker.  If we want the r/o vs r/w policy directly vfsmount-based
for nfsd, that is.

For ecryptfs it's also bogus - at the very least we need to decide what
should happen when underlying vfsmount is remounted.  Again, I'm less
than convinced that we want the same way to express r/o vs. r/w policy.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 12:42 ` [patch 00/13] vfs: add helpers to check r/o bind mounts Al Viro
@ 2008-04-24 13:05   ` Miklos Szeredi
  2008-04-24 13:48     ` Al Viro
  2008-04-24 17:04   ` Erez Zadok
  1 sibling, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 13:05 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> On Thu, Apr 24, 2008 at 01:39:50PM +0200, Miklos Szeredi wrote:
> > Then I did this series, which basically guarantees, that that cannot
> > happen.  Al rejected it, and rather fixed some of the remaining
> > places.  He still missed several, which sort of proves my point.
> 
> Which ones have I missed?

Several calls to nfsd_setattr() for starters.  But I didn't do a full
audit of all vfs_* callers, there might well be others.

> > I think it's totally pointless to continue trying to fix the symptoms
> > instead of getting at the root of the problem.
> > 
> > I know that VFS interfaces are a sensitive question, but it would be
> > nice it we could have some sanity back in this discussion.
> 
> Yes, it would.  How about that, for starters:
> 
> path_create() et.al. are *wrong* for nfsd;

Why are they wrong?  The performance impact is negligible, the code is
not any more complicated.

> if nothing else, I'm not at
> all convinced that even apparmour wants export path + relative there
> _and_ r/o vs. r/w is decision that doesn't clearly map to ex_mnt flags.

I don't care what apparmor wants.  What I care about is consistency of
the thing.  If _anything_ calls into the filesystem, the same security
hooks should be called and the same mount flags should be checked.

Moving these into the callers would just result in a big chaos.

> Moreover, it's not at all obvious that we want to drop write access as
> soon as vfs_...() is over in case of nfsd.  Some of the stuff done
> immeidately afterwards might very well qualify for inclusion into
> protected area; some of the stuff done immediately _prior_ very likely
> needs that as well - look at fh_verify() and tell me why we don't want
> that "I'll hold write access to vfsmount" to span the area including
> that sucker.

I don't see anything in fh_verify() or after which modifies the
filesystem.  The purpose of the r/o checks is to prevent modification,
and nothing more.  Not even fsync() and friends are protected, because
they don't modify the filesystem in the logical sense, even if they do
flush caches to physical media.

> For ecryptfs it's also bogus - at the very least we need to decide what
> should happen when underlying vfsmount is remounted.  Again, I'm less
> than convinced that we want the same way to express r/o vs. r/w policy.

We can add whatever policy we want.  The path_ interface doesn't stop
you from having sane r/o semantics on ecryptfs.  What it does is to
make sure that the r/o rules are _always_ followed, regardless of any
policy or lack thereof in the callers.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 13:05   ` Miklos Szeredi
@ 2008-04-24 13:48     ` Al Viro
  2008-04-24 14:00       ` Al Viro
                         ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Al Viro @ 2008-04-24 13:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 03:05:21PM +0200, Miklos Szeredi wrote:
> Several calls to nfsd_setattr() for starters.  But I didn't do a full
> audit of all vfs_* callers, there might well be others.
> 
> > > I think it's totally pointless to continue trying to fix the symptoms
> > > instead of getting at the root of the problem.
> > > 
> > > I know that VFS interfaces are a sensitive question, but it would be
> > > nice it we could have some sanity back in this discussion.
> > 
> > Yes, it would.  How about that, for starters:
> > 
> > path_create() et.al. are *wrong* for nfsd;
> 
> Why are they wrong?  The performance impact is negligible, the code is
> not any more complicated.

Because you are mixing the "this sucker will be used for write access for
this interval" and "do what is needed to create a file".  The latter is
not guaranteed to coincide with the former and that in itself is enough.

> > if nothing else, I'm not at
> > all convinced that even apparmour wants export path + relative there
> > _and_ r/o vs. r/w is decision that doesn't clearly map to ex_mnt flags.
> 
> I don't care what apparmor wants.  What I care about is consistency of
> the thing.  If _anything_ calls into the filesystem, the same security
> hooks should be called and the same mount flags should be checked.

_IF_ they make sense for call in question.  At the level where they
are applied.

> > soon as vfs_...() is over in case of nfsd.  Some of the stuff done
> > immeidately afterwards might very well qualify for inclusion into
> > protected area; some of the stuff done immediately _prior_ very likely
> > needs that as well - look at fh_verify() and tell me why we don't want
> > that "I'll hold write access to vfsmount" to span the area including
> > that sucker.
> 
> I don't see anything in fh_verify() or after which modifies the
> filesystem.  The purpose of the r/o checks is to prevent modification,
> and nothing more. 

Bullshit.  It's not just "prevent modification".  It's "make sure that
no remount r/o happens while we do that".  fh_verify() doesn't modify.
It does check, though, and later we have that check duplicated by
will_write/wont_write pair bracketing a part of sequence.

Please, realize that spot checks like that are inherently racy and that's
the problem we had all along with r/o remounts et.al.

And that's why they got split in will/wont pairs and stretched to cover
relevant areas.  Areas that depend on specific callers.

And yes, we need the counterpart for superblock-level stuff, to deal with
remaining races (look at fs_may_remount_ro() and puke - it's still racy
as hell).  E.g. unlink should do sb-level "will write" when it drops
i_nlink to 0 and final removal of inode should do "won't write".

> > For ecryptfs it's also bogus - at the very least we need to decide what
> > should happen when underlying vfsmount is remounted.  Again, I'm less
> > than convinced that we want the same way to express r/o vs. r/w policy.
> 
> We can add whatever policy we want.  The path_ interface doesn't stop
> you from having sane r/o semantics on ecryptfs.  What it does is to
> make sure that the r/o rules are _always_ followed, regardless of any
> policy or lack thereof in the callers.

ecryptfs should not use the bloody vfsmount, for fuck sake!  You are confusing
access to fs with access to fs via specific vfsmount.  And pretending that
the latter is fundamental operation.  It isn't.  Flags on vfsmounts *do*
control it.  But the same operations make sense without any vfsmounts involved.
At all.  And "so let's invent some and express our access control rules by
tweaking its flag" is a kludge, not a sane answer.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 13:48     ` Al Viro
@ 2008-04-24 14:00       ` Al Viro
  2008-04-24 14:16         ` Miklos Szeredi
  2008-04-24 14:09       ` Miklos Szeredi
  2008-04-24 17:25       ` Erez Zadok
  2 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 14:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 02:48:26PM +0100, Al Viro wrote:
> On Thu, Apr 24, 2008 at 03:05:21PM +0200, Miklos Szeredi wrote:
> > Several calls to nfsd_setattr() for starters.  But I didn't do a full
> > audit of all vfs_* callers, there might well be others.

BTW, nfsd_setattr() on r/o ->ex_mnt *will* fail.  Again, check fh_verify()
and see what it does with MAY_SATTR.

And I certainly agree that it ought to be replaced by will/wont pair to
close the remount race.  One that had been there all along.  All fh_verify()
callers of that kind need it - we want to pull mnt_{will,wont}_write()
pair into callers *and* stretch to protect the entire relevant area.

Which contains vfs_...() in case of nfsd_create, etc.  See what I mean?
That's exactly the thing I'd been talking about - the area we want to
cover is _bigger_ than vfs_...() and contains nfsd-specific logic.  IOW,
doesn't get folded into any VFS-provided helper.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 13:48     ` Al Viro
  2008-04-24 14:00       ` Al Viro
@ 2008-04-24 14:09       ` Miklos Szeredi
  2008-04-24 14:28         ` Al Viro
  2008-04-24 17:25       ` Erez Zadok
  2 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 14:09 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > Several calls to nfsd_setattr() for starters.  But I didn't do a full
> > audit of all vfs_* callers, there might well be others.
> > 
> > > > I think it's totally pointless to continue trying to fix the symptoms
> > > > instead of getting at the root of the problem.
> > > > 
> > > > I know that VFS interfaces are a sensitive question, but it would be
> > > > nice it we could have some sanity back in this discussion.
> > > 
> > > Yes, it would.  How about that, for starters:
> > > 
> > > path_create() et.al. are *wrong* for nfsd;
> > 
> > Why are they wrong?  The performance impact is negligible, the code is
> > not any more complicated.
> 
> Because you are mixing the "this sucker will be used for write access for
> this interval" and "do what is needed to create a file".  The latter is
> not guaranteed to coincide with the former and that in itself is enough.

I lost you there, sorry.  Can you please rephrase a bit less
abstractedly?

> 
> > > if nothing else, I'm not at
> > > all convinced that even apparmour wants export path + relative there
> > > _and_ r/o vs. r/w is decision that doesn't clearly map to ex_mnt flags.
> > 
> > I don't care what apparmor wants.  What I care about is consistency of
> > the thing.  If _anything_ calls into the filesystem, the same security
> > hooks should be called and the same mount flags should be checked.
> 
> _IF_ they make sense for call in question.  At the level where they
> are applied.

They do.  Specific counterexamples please.

> > > soon as vfs_...() is over in case of nfsd.  Some of the stuff done
> > > immeidately afterwards might very well qualify for inclusion into
> > > protected area; some of the stuff done immediately _prior_ very likely
> > > needs that as well - look at fh_verify() and tell me why we don't want
> > > that "I'll hold write access to vfsmount" to span the area including
> > > that sucker.
> > 
> > I don't see anything in fh_verify() or after which modifies the
> > filesystem.  The purpose of the r/o checks is to prevent modification,
> > and nothing more. 
> 
> Bullshit.  It's not just "prevent modification".  It's "make sure that
> no remount r/o happens while we do that".

Sure.

>  fh_verify() doesn't modify.
> It does check, though, and later we have that check duplicated by
> will_write/wont_write pair bracketing a part of sequence.

So what?  All the other checks are also duplicated within
vfs_create()->may_create()->permission().

> Please, realize that spot checks like that are inherently racy and that's
> the problem we had all along with r/o remounts et.al.

I do very much realize that.

> And that's why they got split in will/wont pairs and stretched to cover
> relevant areas.  Areas that depend on specific callers.

Specifics please.  I don't see any reason why the brackets would need
to be stretched (except to make make multiple vfs calls atomic wrt r/o
remount).

> And yes, we need the counterpart for superblock-level stuff, to deal with
> remaining races (look at fs_may_remount_ro() and puke - it's still racy
> as hell).  E.g. unlink should do sb-level "will write" when it drops
> i_nlink to 0 and final removal of inode should do "won't write".

Sure.

> > > For ecryptfs it's also bogus - at the very least we need to decide what
> > > should happen when underlying vfsmount is remounted.  Again, I'm less
> > > than convinced that we want the same way to express r/o vs. r/w policy.
> > 
> > We can add whatever policy we want.  The path_ interface doesn't stop
> > you from having sane r/o semantics on ecryptfs.  What it does is to
> > make sure that the r/o rules are _always_ followed, regardless of any
> > policy or lack thereof in the callers.
> 
> ecryptfs should not use the bloody vfsmount, for fuck sake!  You are
> confusing access to fs with access to fs via specific vfsmount.  And
> pretending that the latter is fundamental operation.

Umm, isn't it?  Want to redo open() without a vfsmount?

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:00       ` Al Viro
@ 2008-04-24 14:16         ` Miklos Szeredi
  2008-04-24 14:35           ` Al Viro
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 14:16 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > On Thu, Apr 24, 2008 at 03:05:21PM +0200, Miklos Szeredi wrote:
> > > Several calls to nfsd_setattr() for starters.  But I didn't do a full
> > > audit of all vfs_* callers, there might well be others.
> 
> BTW, nfsd_setattr() on r/o ->ex_mnt *will* fail.  Again, check fh_verify()
> and see what it does with MAY_SATTR.

Ahh, nice and racy.  __mnt_is_readonly() is't supposed to be used for
this sort of thing.

> And I certainly agree that it ought to be replaced by will/wont pair to
> close the remount race.  One that had been there all along.  All fh_verify()
> callers of that kind need it - we want to pull mnt_{will,wont}_write()
> pair into callers *and* stretch to protect the entire relevant area.
> 
> Which contains vfs_...() in case of nfsd_create, etc.  See what I mean?
> That's exactly the thing I'd been talking about - the area we want to
> cover is _bigger_ than vfs_...() and contains nfsd-specific logic.  IOW,
> doesn't get folded into any VFS-provided helper.

I still don't get it why it needs to cover nfsd-specifi logic.  What
does nfsd have to do with r/o mounts?

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:09       ` Miklos Szeredi
@ 2008-04-24 14:28         ` Al Viro
  2008-04-24 14:36           ` Miklos Szeredi
  2008-04-24 17:29           ` Erez Zadok
  0 siblings, 2 replies; 63+ messages in thread
From: Al Viro @ 2008-04-24 14:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
> > Because you are mixing the "this sucker will be used for write access for
> > this interval" and "do what is needed to create a file".  The latter is
> > not guaranteed to coincide with the former and that in itself is enough.
> 
> I lost you there, sorry.  Can you please rephrase a bit less
> abstractedly?

The area to be protected against remount is not guaranteed to coincide with
vfs_...() or to contain nothing specific to this caller.

> > Bullshit.  It's not just "prevent modification".  It's "make sure that
> > no remount r/o happens while we do that".
> 
> Sure.
> 
> >  fh_verify() doesn't modify.
> > It does check, though, and later we have that check duplicated by
> > will_write/wont_write pair bracketing a part of sequence.
> 
> So what?  All the other checks are also duplicated within
> vfs_create()->may_create()->permission().

RTFS.  permission() doesn't do "is that vfsmount read-only" checks, exactly
because it's 100% bogus - either you cover it with entire area where we
are guaranteed to stay r/w, or it's by definition racy.

> > ecryptfs should not use the bloody vfsmount, for fuck sake!  You are
> > confusing access to fs with access to fs via specific vfsmount.  And
> > pretending that the latter is fundamental operation.
> 
> Umm, isn't it?  Want to redo open() without a vfsmount?

FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
while we are at it.  We get the sucker opened by whoever steps on given
place in the tree first, with subsequent operations done using the resulting
struct file.  With fallback to r/o open.  What happens to somebody who
tries to open it with enough permissions to do r/w?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:16         ` Miklos Szeredi
@ 2008-04-24 14:35           ` Al Viro
  2008-04-24 14:42             ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 14:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 04:16:52PM +0200, Miklos Szeredi wrote:
> > And I certainly agree that it ought to be replaced by will/wont pair to
> > close the remount race.  One that had been there all along.  All fh_verify()
> > callers of that kind need it - we want to pull mnt_{will,wont}_write()
> > pair into callers *and* stretch to protect the entire relevant area.
> > 
> > Which contains vfs_...() in case of nfsd_create, etc.  See what I mean?
> > That's exactly the thing I'd been talking about - the area we want to
> > cover is _bigger_ than vfs_...() and contains nfsd-specific logic.  IOW,
> > doesn't get folded into any VFS-provided helper.
> 
> I still don't get it why it needs to cover nfsd-specifi logic.  What
> does nfsd have to do with r/o mounts?

Explain to me again, how fh_verify() manages to contain no nfsd-specific
logics.  Please.  You are right - we do have races there.  Always had.
And nfsd_permission() is the next target for continuation of ro-bind
series.  Assuming that we don't simply make r/w export to hold will_write
all along, in which case all these checks around calls of vfs_...() in
there simply go away - that's also an arguable option.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:28         ` Al Viro
@ 2008-04-24 14:36           ` Miklos Szeredi
  2008-04-24 14:44             ` Al Viro
  2008-04-24 17:29           ` Erez Zadok
  1 sibling, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 14:36 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > >  fh_verify() doesn't modify.
> > > It does check, though, and later we have that check duplicated by
> > > will_write/wont_write pair bracketing a part of sequence.
> > 
> > So what?  All the other checks are also duplicated within
> > vfs_create()->may_create()->permission().
> 
> RTFS.  permission() doesn't do "is that vfsmount read-only" checks, exactly
> because it's 100% bogus - either you cover it with entire area where we
> are guaranteed to stay r/w, or it's by definition racy.

I know that.

That does not mean, that fh_verify() needs to do vfsmount r/o checks.
AFAICS it's perfectly OK to do that later, around the vfs_ call.

> > > ecryptfs should not use the bloody vfsmount, for fuck sake!  You are
> > > confusing access to fs with access to fs via specific vfsmount.  And
> > > pretending that the latter is fundamental operation.
> > 
> > Umm, isn't it?  Want to redo open() without a vfsmount?
> 
> FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> while we are at it.  We get the sucker opened by whoever steps on given
> place in the tree first, with subsequent operations done using the resulting
> struct file.  With fallback to r/o open.  What happens to somebody who
> tries to open it with enough permissions to do r/w?

You are digressing from the subject.  Yes it would be nice to fix
ecryptfs to be less broken.  But that's not what this patchset is set
out to do.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:35           ` Al Viro
@ 2008-04-24 14:42             ` Miklos Szeredi
  2008-04-24 14:48               ` Al Viro
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 14:42 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > > And I certainly agree that it ought to be replaced by will/wont pair to
> > > close the remount race.  One that had been there all along.  All fh_verify()
> > > callers of that kind need it - we want to pull mnt_{will,wont}_write()
> > > pair into callers *and* stretch to protect the entire relevant area.
> > > 
> > > Which contains vfs_...() in case of nfsd_create, etc.  See what I mean?
> > > That's exactly the thing I'd been talking about - the area we want to
> > > cover is _bigger_ than vfs_...() and contains nfsd-specific logic.  IOW,
> > > doesn't get folded into any VFS-provided helper.
> > 
> > I still don't get it why it needs to cover nfsd-specifi logic.  What
> > does nfsd have to do with r/o mounts?
> 
> Explain to me again, how fh_verify() manages to contain no nfsd-specific
> logics.  Please.

I didn't say it doesn't contain nfsd specifics.  What I said, that it
doesn't modify the filesystem.  So there's no reason to cover it with
mnt_want_write()/drop_write().

>  You are right - we do have races there.  Always had.
> And nfsd_permission() is the next target for continuation of ro-bind
> series.  Assuming that we don't simply make r/w export to hold will_write
> all along, in which case all these checks around calls of vfs_...() in
> there simply go away - that's also an arguable option.

Yes.  And that _still_ doesn't make the path_*() interface wrong.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:36           ` Miklos Szeredi
@ 2008-04-24 14:44             ` Al Viro
  2008-04-24 14:53               ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 14:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 04:36:38PM +0200, Miklos Szeredi wrote:
> > RTFS.  permission() doesn't do "is that vfsmount read-only" checks, exactly
> > because it's 100% bogus - either you cover it with entire area where we
> > are guaranteed to stay r/w, or it's by definition racy.
> 
> I know that.
> 
> That does not mean, that fh_verify() needs to do vfsmount r/o checks.
> AFAICS it's perfectly OK to do that later, around the vfs_ call.

... and around everything else that happens to be done after fh_verify
for write access, surely?  Note that e.g. nfsd_setattr() does _not_ call
vfs_<anything>()...

BTW, that assumes we want to do these checks for nfsd at access time.
And not go for _exports_ being r/w or r/o, with protection applying to
the entire lifetime of export.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:42             ` Miklos Szeredi
@ 2008-04-24 14:48               ` Al Viro
  2008-04-24 14:58                 ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 14:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 04:42:52PM +0200, Miklos Szeredi wrote:

> >  You are right - we do have races there.  Always had.
> > And nfsd_permission() is the next target for continuation of ro-bind
> > series.  Assuming that we don't simply make r/w export to hold will_write
> > all along, in which case all these checks around calls of vfs_...() in
> > there simply go away - that's also an arguable option.
> 
> Yes.  And that _still_ doesn't make the path_*() interface wrong.

It would make it bloody useless for nfsd.  With ecryptfs being a piss-poor
argument in favour of anything other than git rm, what's left?  Another
stack frame in fs/namei.c syscalls?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:44             ` Al Viro
@ 2008-04-24 14:53               ` Miklos Szeredi
  2008-04-24 15:12                 ` Al Viro
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 14:53 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > > RTFS.  permission() doesn't do "is that vfsmount read-only" checks, exactly
> > > because it's 100% bogus - either you cover it with entire area where we
> > > are guaranteed to stay r/w, or it's by definition racy.
> > 
> > I know that.
> > 
> > That does not mean, that fh_verify() needs to do vfsmount r/o checks.
> > AFAICS it's perfectly OK to do that later, around the vfs_ call.
> 
> ... and around everything else that happens to be done after fh_verify
> for write access, surely?

What in particular?  You have managed to avoid answering this question
for the last...I don't know how many emails.

>  Note that e.g. nfsd_setattr() does _not_ call
> vfs_<anything>()...

Yes it does: notify_change().  It's vfs_setattr() under a pseudonym.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:48               ` Al Viro
@ 2008-04-24 14:58                 ` Miklos Szeredi
  2008-04-24 15:21                   ` Al Viro
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 14:58 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > >  You are right - we do have races there.  Always had.
> > > And nfsd_permission() is the next target for continuation of ro-bind
> > > series.  Assuming that we don't simply make r/w export to hold will_write
> > > all along, in which case all these checks around calls of vfs_...() in
> > > there simply go away - that's also an arguable option.
> > 
> > Yes.  And that _still_ doesn't make the path_*() interface wrong.
> 
> It would make it bloody useless for nfsd.  With ecryptfs being a piss-poor
> argument in favour of anything other than git rm, what's left?  Another
> stack frame in fs/namei.c syscalls?

Since all the vfs_* functions will become static with path_* being the
only caller, the compiler will be happy to get rid of that stack frame
too.

What is left is the guarantee, that the race-free r/o remounts will
always work and some obscure caller didn't forget to surround it with
the r/o checks.

I think it's definitely worth it.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:53               ` Miklos Szeredi
@ 2008-04-24 15:12                 ` Al Viro
  2008-04-24 15:18                   ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 15:12 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 04:53:27PM +0200, Miklos Szeredi wrote:
> > > > RTFS.  permission() doesn't do "is that vfsmount read-only" checks, exactly
> > > > because it's 100% bogus - either you cover it with entire area where we
> > > > are guaranteed to stay r/w, or it's by definition racy.
> > > 
> > > I know that.
> > > 
> > > That does not mean, that fh_verify() needs to do vfsmount r/o checks.
> > > AFAICS it's perfectly OK to do that later, around the vfs_ call.
> > 
> > ... and around everything else that happens to be done after fh_verify
> > for write access, surely?
>
> What in particular?  You have managed to avoid answering this question
> for the last...I don't know how many emails.

Oh, for fuck sake...  grep and ye shall see.  Right next to setattr we
have nfsd4_set_nfs4_acl(), with pair of set_nfsv4_acl_one().  I'd rather
have those two covered by a single will/wont range, TYVM.

nfsd_create() will happily do vfs_mkdir() and nfsd_create_setattr().  Ditto.

And while we are at it, losing the check for r/o in fh_verify() will sure
as hell require at least handling it separately on the normal write path.

> >  Note that e.g. nfsd_setattr() does _not_ call
> > vfs_<anything>()...
> 
> Yes it does: notify_change().  It's vfs_setattr() under a pseudonym.

Are you going to move the will/wont in there?  Because there's a bunch of
stuff in fs/open.c that will disagree...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:12                 ` Al Viro
@ 2008-04-24 15:18                   ` Miklos Szeredi
  2008-04-24 15:38                     ` Al Viro
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 15:18 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > > > > RTFS.  permission() doesn't do "is that vfsmount read-only" checks, exactly
> > > > > because it's 100% bogus - either you cover it with entire area where we
> > > > > are guaranteed to stay r/w, or it's by definition racy.
> > > > 
> > > > I know that.
> > > > 
> > > > That does not mean, that fh_verify() needs to do vfsmount r/o checks.
> > > > AFAICS it's perfectly OK to do that later, around the vfs_ call.
> > > 
> > > ... and around everything else that happens to be done after fh_verify
> > > for write access, surely?
> >
> > What in particular?  You have managed to avoid answering this question
> > for the last...I don't know how many emails.
> 
> Oh, for fuck sake...  grep and ye shall see.  Right next to setattr we
> have nfsd4_set_nfs4_acl(), with pair of set_nfsv4_acl_one().  I'd rather
> have those two covered by a single will/wont range, TYVM.
> 
> nfsd_create() will happily do vfs_mkdir() and nfsd_create_setattr().  Ditto.

Please read the patches?  I've left exactly these
mnt_want_write/drop_write() calls in place, and removed all the
others.

> And while we are at it, losing the check for r/o in fh_verify() will sure
> as hell require at least handling it separately on the normal write path.

Umm, no, dentry_open() will do that.

> 
> > >  Note that e.g. nfsd_setattr() does _not_ call
> > > vfs_<anything>()...
> > 
> > Yes it does: notify_change().  It's vfs_setattr() under a pseudonym.
> 
> Are you going to move the will/wont in there?  Because there's a bunch of
> stuff in fs/open.c that will disagree...

Please read the patches, won't you?

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:58                 ` Miklos Szeredi
@ 2008-04-24 15:21                   ` Al Viro
  2008-04-24 15:37                     ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 15:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 04:58:00PM +0200, Miklos Szeredi wrote:

> Since all the vfs_* functions will become static with path_* being the
> only caller, the compiler will be happy to get rid of that stack frame
> too.
> 
> What is left is the guarantee, that the race-free r/o remounts will
> always work and some obscure caller didn't forget to surround it with
> the r/o checks.
> 
> I think it's definitely worth it.

It would be, if we had only single vfs_...() as critical sections.  We
do not - see previous mail or read the fucking tree, already.

We don't even have many callers of each, and with a few we do it's not
obvious that we want to go through vfsmounts (and vfsmount-based checks)
in all of them.  So no, I don't buy your argument.  Sorry.

I'm not even convinced that they are useful as helpers, at least until
we'd decided what to do with checks in nfsd.  Until then we do, as
far as I'm concerned, one place where they would definitely DTRT - fs/namei.c.
And I want more than one caller before merging those, let alone removing
the interface that doesn't require checks to be vfsmount-based for all users.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:21                   ` Al Viro
@ 2008-04-24 15:37                     ` Miklos Szeredi
  2008-04-24 15:59                       ` Al Viro
  2008-04-24 17:55                       ` Dave Hansen
  0 siblings, 2 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 15:37 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > Since all the vfs_* functions will become static with path_* being the
> > only caller, the compiler will be happy to get rid of that stack frame
> > too.
> > 
> > What is left is the guarantee, that the race-free r/o remounts will
> > always work and some obscure caller didn't forget to surround it with
> > the r/o checks.
> > 
> > I think it's definitely worth it.
> 
> It would be, if we had only single vfs_...() as critical sections.  We
> do not - see previous mail or read the fucking tree, already.

Why are those so important?  Yes, if we have multiple vfs_() calls,
surround them with an extra want/drop pair.  We do already do multiple
overlapping want/drop pairs with O_TRUNC and O_CREAT (AFAIR).

> We don't even have many callers of each, and with a few we do it's not
> obvious that we want to go through vfsmounts (and vfsmount-based checks)
> in all of them.  So no, I don't buy your argument.  Sorry.
> 
> I'm not even convinced that they are useful as helpers, at least until
> we'd decided what to do with checks in nfsd.  Until then we do, as
> far as I'm concerned, one place where they would definitely DTRT - fs/namei.c.
> And I want more than one caller before merging those,

unix_bind() -> vfs_mknod()
sys_mq_unlink() -> vfs_unlink()
open.c (several) -> notify_change()
*setxattr() -> vfs_setxattr()
*removexattr() -> vfs_removexattr()

> let alone removing the interface that doesn't require checks to be
> vfsmount-based for all users.

What users?  There are paractically _no_ other users.  The ones that
there are (like reiserfs) should not be using them, and there are
already some patches cleaning that mess up.

Miklos


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:18                   ` Miklos Szeredi
@ 2008-04-24 15:38                     ` Al Viro
  2008-04-24 15:43                       ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 15:38 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 05:18:12PM +0200, Miklos Szeredi wrote:

> > Oh, for fuck sake...  grep and ye shall see.  Right next to setattr we
> > have nfsd4_set_nfs4_acl(), with pair of set_nfsv4_acl_one().  I'd rather
> > have those two covered by a single will/wont range, TYVM.
> > 
> > nfsd_create() will happily do vfs_mkdir() and nfsd_create_setattr().  Ditto.
> 
> Please read the patches?  I've left exactly these
> mnt_want_write/drop_write() calls in place, and removed all the
> others.
 
You've left _what_ in place?  nfsd4_set_nfs4_acl() currently doesn't have
that single range - fh_verify() + set_nvfs4_acl_one() + set_nfsv4_acl_one().
And AFAICS you do nothing of that kind there.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:38                     ` Al Viro
@ 2008-04-24 15:43                       ` Miklos Szeredi
  0 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 15:43 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > > Oh, for fuck sake...  grep and ye shall see.  Right next to setattr we
> > > have nfsd4_set_nfs4_acl(), with pair of set_nfsv4_acl_one().  I'd rather
> > > have those two covered by a single will/wont range, TYVM.
> > > 
> > > nfsd_create() will happily do vfs_mkdir() and nfsd_create_setattr().  Ditto.
> > 
> > Please read the patches?  I've left exactly these
> > mnt_want_write/drop_write() calls in place, and removed all the
> > others.
>  
> You've left _what_ in place?  nfsd4_set_nfs4_acl() currently doesn't have
> that single range - fh_verify() + set_nvfs4_acl_one() + set_nfsv4_acl_one().
> And AFAICS you do nothing of that kind there.
> 

Yes I did.  See only caller of nfsd4_set_nfs4_acl() in nfs4proc.c.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:37                     ` Miklos Szeredi
@ 2008-04-24 15:59                       ` Al Viro
  2008-04-24 16:16                         ` Miklos Szeredi
  2008-04-25  7:22                         ` Miklos Szeredi
  2008-04-24 17:55                       ` Dave Hansen
  1 sibling, 2 replies; 63+ messages in thread
From: Al Viro @ 2008-04-24 15:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 05:37:39PM +0200, Miklos Szeredi wrote:
> > > What is left is the guarantee, that the race-free r/o remounts will
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > always work and some obscure caller didn't forget to surround it with
^^^^^^^^^^^^^^^^^

> Why are those so important?  Yes, if we have multiple vfs_() calls,
> surround them with an extra want/drop pair.

Which leaves you with the same need to audit all these suckers anyway.

I'm in principle fine with having such helper functions, *IF* they are
not sold as providing all protection one needs, *IF* you are not expecting
to be able to fold all areas down into them and *IF* original ones are
left intact.

Modulo the like path_rename(), BTW - that one is just plain ugly API.

> > We don't even have many callers of each, and with a few we do it's not
> > obvious that we want to go through vfsmounts (and vfsmount-based checks)
> > in all of them.  So no, I don't buy your argument.  Sorry.
> > 
> > I'm not even convinced that they are useful as helpers, at least until
> > we'd decided what to do with checks in nfsd.  Until then we do, as
> > far as I'm concerned, one place where they would definitely DTRT - fs/namei.c.
> > And I want more than one caller before merging those,
> 
> unix_bind() -> vfs_mknod()
> sys_mq_unlink() -> vfs_unlink()
> open.c (several) -> notify_change()
> *setxattr() -> vfs_setxattr()
> *removexattr() -> vfs_removexattr()

OK.

> > let alone removing the interface that doesn't require checks to be
> > vfsmount-based for all users.
> 
> What users?  There are paractically _no_ other users.  The ones that
> there are (like reiserfs) should not be using them, and there are
> already some patches cleaning that mess up.

OK, explain me, in small words, WTF should something that wants to do
operations on filesystem tree have a vfsmount.  Slowly.  And "r/o
bind loses value if it can be bypassed" is a hogwash - fs methods are
still there, so it *can* be bypassed just fine, thank you very much.
It's really up to caller.  "But they won't be able to do open()" also
doesn't fly - again, it's up to whoever writes particular piece of code.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:59                       ` Al Viro
@ 2008-04-24 16:16                         ` Miklos Szeredi
  2008-04-28 10:15                           ` Miklos Szeredi
  2008-04-25  7:22                         ` Miklos Szeredi
  1 sibling, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 16:16 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > > > What is left is the guarantee, that the race-free r/o remounts will
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > always work and some obscure caller didn't forget to surround it with
> ^^^^^^^^^^^^^^^^^
> 
> > Why are those so important?  Yes, if we have multiple vfs_() calls,
> > surround them with an extra want/drop pair.
> 
> Which leaves you with the same need to audit all these suckers anyway.

Not really.  Missing such calls would just make the *caller* buggy
(i.e. racy with remount r/o), but it would not make the *filesystem*
buggy.  Big difference.

> I'm in principle fine with having such helper functions, *IF* they are
> not sold as providing all protection one needs,

I'm not selling them as that.

> *IF* you are not expecting
> to be able to fold all areas down into them and *IF* original ones are
> left intact.

Left intact for whom, specifically?  Another question you've managed
to avoid answering.

> Modulo the like path_rename(), BTW - that one is just plain ugly API.

I'm all open to improvements.

> > > let alone removing the interface that doesn't require checks to be
> > > vfsmount-based for all users.
> > 
> > What users?  There are paractically _no_ other users.  The ones that
> > there are (like reiserfs) should not be using them, and there are
> > already some patches cleaning that mess up.
> 
> OK, explain me, in small words, WTF should something that wants to do
> operations on filesystem tree have a vfsmount.  Slowly.  And "r/o
> bind loses value if it can be bypassed" is a hogwash - fs methods are
> still there, so it *can* be bypassed just fine, thank you very much.

And we know what to do with such users.

> It's really up to caller.  "But they won't be able to do open()" also
> doesn't fly - again, it's up to whoever writes particular piece of code.

I understand your theory.  But it has zero practical significance.

IOW it doesn't matter that someone _may_ want to access the filesystem
without a vfsmount, if that someone doesn't exist.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (13 preceding siblings ...)
  2008-04-24 12:42 ` [patch 00/13] vfs: add helpers to check r/o bind mounts Al Viro
@ 2008-04-24 16:52 ` Erez Zadok
  2008-04-24 16:58   ` Miklos Szeredi
  2008-05-01  5:40 ` Dave Hansen
  15 siblings, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 16:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

In message <20080424113950.818688067@szeredi.hu>, Miklos Szeredi writes:
> Patches 1-3 are cleanups/bugfixes preceding the main series.  The
> description is in 4/13.  This is against latest git.
> 
> <rant>
> 
> R/O bind mounts patches have been reviewed numerous times.  Still a
> relatively large number of places remained where checking for R/O
> mounts was missed.
> 
> Then I did this series, which basically guarantees, that that cannot
> happen.  Al rejected it, and rather fixed some of the remaining
> places.  He still missed several, which sort of proves my point.
> 
> I think it's totally pointless to continue trying to fix the symptoms
> instead of getting at the root of the problem.
> 
> I know that VFS interfaces are a sensitive question, but it would be
> nice it we could have some sanity back in this discussion.

And I'll reiterate my desire (see http://lkml.org/lkml/2008/4/8/394) for
stackable file systems to know as little as possible about vfsmounts (or
their derivatives ala struct path).  If I have to call helpers which require
vfsmounts, but similar objects don't get passed to me somehow, then I have
to maintain vfsmounts on my own.

Sure, unionfs already does some of it now (and some of it can be recoded
more cleanly), but I'd like to get rid of much of it if I could -- these new
path_* helpers make me have to maintain vfsmounts even moreso than before.

> </rant>
> 
> Thanks,
> Miklos

Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 01/13] ecryptfs: add missing lock around notify_change
  2008-04-24 11:39 ` [patch 01/13] ecryptfs: add missing lock around notify_change Miklos Szeredi
@ 2008-04-24 16:56   ` Erez Zadok
  2008-04-24 17:09     ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 16:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

In message <20080424114008.353519557@szeredi.hu>, Miklos Szeredi writes:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Caller of notify_change() need to hold i_mutex.
> 
> Compile tested only.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  fs/ecryptfs/inode.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: vfs-2.6/fs/ecryptfs/inode.c
> ===================================================================
> --- vfs-2.6.orig/fs/ecryptfs/inode.c	2008-04-02 18:47:17.000000000 +0200
> +++ vfs-2.6/fs/ecryptfs/inode.c	2008-04-02 18:49:18.000000000 +0200
> @@ -908,7 +908,9 @@ static int ecryptfs_setattr(struct dentr
>  	if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
>  		ia->ia_valid &= ~ATTR_MODE;
>  
> +	mutex_lock(&lower_dentry->d_inode->i_mutex);
>  	rc = notify_change(lower_dentry, ia);
> +	mutex_unlock(&lower_dentry->d_inode->i_mutex);
>  out:
>  	fsstack_copy_attr_all(inode, lower_inode, NULL);
>  	return rc;

Mike/Miklos,

Why is this patch part of the r/o bind mount 13-part patchset?  It seems to
me that this could go to akpm/Linus right now (w/o waiting for r/o bind
mount debate to conclude).

Ditto for the next one: [patch 02/13] ecryptfs: clean up (un)lock_parent

Thanks,
Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 16:52 ` Erez Zadok
@ 2008-04-24 16:58   ` Miklos Szeredi
  2008-04-24 17:14     ` Erez Zadok
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 16:58 UTC (permalink / raw)
  To: ezk
  Cc: miklos, viro, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> Sure, unionfs already does some of it now (and some of it can be recoded
> more cleanly), but I'd like to get rid of much of it if I could -- these new
> path_* helpers make me have to maintain vfsmounts even moreso than before.

With the words of Al: "that is hogwash".  Stacks either maintain
vfsmounts or they don't.  There's no middle ground.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 12:42 ` [patch 00/13] vfs: add helpers to check r/o bind mounts Al Viro
  2008-04-24 13:05   ` Miklos Szeredi
@ 2008-04-24 17:04   ` Erez Zadok
  1 sibling, 0 replies; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 17:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, akpm, torvalds, dave, ezk, mhalcrow,
	linux-fsdevel, linux-kernel

In message <20080424124245.GC15214@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 01:39:50PM +0200, Miklos Szeredi wrote:
> > Then I did this series, which basically guarantees, that that cannot
> > happen.  Al rejected it, and rather fixed some of the remaining
> > places.  He still missed several, which sort of proves my point.
> 
> Which ones have I missed?
>  
> > I think it's totally pointless to continue trying to fix the symptoms
> > instead of getting at the root of the problem.
> > 
> > I know that VFS interfaces are a sensitive question, but it would be
> > nice it we could have some sanity back in this discussion.
> 
> Yes, it would.  How about that, for starters:
> 
> path_create() et.al. are *wrong* for nfsd; if nothing else, I'm not at
> all convinced that even apparmour wants export path + relative there
> _and_ r/o vs. r/w is decision that doesn't clearly map to ex_mnt flags.
> 
> Moreover, it's not at all obvious that we want to drop write access as
> soon as vfs_...() is over in case of nfsd.  Some of the stuff done
> immeidately afterwards might very well qualify for inclusion into
> protected area; some of the stuff done immediately _prior_ very likely
> needs that as well - look at fh_verify() and tell me why we don't want
> that "I'll hold write access to vfsmount" to span the area including
> that sucker.  If we want the r/o vs r/w policy directly vfsmount-based
> for nfsd, that is.
> 
> For ecryptfs it's also bogus - at the very least we need to decide what
> should happen when underlying vfsmount is remounted.  Again, I'm less
> than convinced that we want the same way to express r/o vs. r/w policy.

I tend to agree that the mnt_want* bracketing may need to encompass more
than one vfs_* method.  When copyup is involved, the problem is only
exacerbated.  A copyup may want to perform several actions "atomically", for
example:

1. vfs_mkdir /a
2. vfs_mkdir /a/b
3. vfs_mkdir /a/b/c
4. vfs_create /a/b/c/f
5. vfs_read(source file) -> vfs_write(/a/b/c/f)

These actions are often done in close proximity in unionfs, and in such a
case, it'd be nice if I could want_write only once around this sequence.

However, I'm not sure that for copyup (or ecryptfs), that an vfsmount-level
want/drop write is the right interface.  As you pointed out before, Al, some
other mechanism might be needed, perhaps at the superblock level, to declare
one's desire to write the f/s "atomically" (to avoid concurrent topology
changes).

Thanks,
Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 01/13] ecryptfs: add missing lock around notify_change
  2008-04-24 16:56   ` Erez Zadok
@ 2008-04-24 17:09     ` Miklos Szeredi
  0 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 17:09 UTC (permalink / raw)
  To: ezk
  Cc: miklos, viro, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> Why is this patch part of the r/o bind mount 13-part patchset?  It seems to
> me that this could go to akpm/Linus right now (w/o waiting for r/o bind
> mount debate to conclude).

This one's already in -mm.

> Ditto for the next one: [patch 02/13] ecryptfs: clean up (un)lock_parent

Yup.  If someone doesn't pick it up, I'll send it to Andrew sometime.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 16:58   ` Miklos Szeredi
@ 2008-04-24 17:14     ` Erez Zadok
  2008-04-24 17:23       ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 17:14 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: ezk, viro, akpm, torvalds, dave, mhalcrow, linux-fsdevel,
	linux-kernel

In message <E1Jp4m0-0004UU-3G@pomaz-ex.szeredi.hu>, Miklos Szeredi writes:
> > Sure, unionfs already does some of it now (and some of it can be recoded
> > more cleanly), but I'd like to get rid of much of it if I could -- these new
> > path_* helpers make me have to maintain vfsmounts even moreso than before.
> 
> With the words of Al: "that is hogwash".  Stacks either maintain
> vfsmounts or they don't.  There's no middle ground.

So, if I wanted to not maintain vfsmounts at all in unionfs, how can I use
the proposed new path_* helpers which require vfsmounts?  Will there be some
other helpers available to perform lower-filesystem operations (e.g., mkdir,
create, unlink, etc.) which won't require passing vfsmounts?

The "or they don't" is not much of an option when I'm forced to use an API
that requires a vfsmount...

> Miklos

Thanks,
Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 17:14     ` Erez Zadok
@ 2008-04-24 17:23       ` Miklos Szeredi
  0 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 17:23 UTC (permalink / raw)
  To: ezk
  Cc: miklos, ezk, viro, akpm, torvalds, dave, mhalcrow, linux-fsdevel,
	linux-kernel

> > > Sure, unionfs already does some of it now (and some of it can be recoded
> > > more cleanly), but I'd like to get rid of much of it if I could -- these new
> > > path_* helpers make me have to maintain vfsmounts even moreso than before.
> > 
> > With the words of Al: "that is hogwash".  Stacks either maintain
> > vfsmounts or they don't.  There's no middle ground.
> 
> So, if I wanted to not maintain vfsmounts at all in unionfs, how can I use
> the proposed new path_* helpers which require vfsmounts?  Will there be some
> other helpers available to perform lower-filesystem operations (e.g., mkdir,
> create, unlink, etc.) which won't require passing vfsmounts?
> 
> The "or they don't" is not much of an option when I'm forced to use an API
> that requires a vfsmount...

Yes.  The vfsmount is already needed for open() and that pretty much
determines the fate of all stacking filesystems, except the ones which
don't want to do file I/O, but that is rather hard to imagine.

And anyway, I don't see the issue here.  If the stack wants to work on
a single filesystem directly, just do a bind mount of the original to
a kernel private mount and use that.  I'm not sure this can be done
with the current API, but it doesn't sound difficult to implement at
all.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 13:48     ` Al Viro
  2008-04-24 14:00       ` Al Viro
  2008-04-24 14:09       ` Miklos Szeredi
@ 2008-04-24 17:25       ` Erez Zadok
  2008-04-24 17:30         ` Al Viro
  2 siblings, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 17:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, akpm, torvalds, dave, ezk, mhalcrow,
	linux-fsdevel, linux-kernel

In message <20080424134826.GD15214@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 03:05:21PM +0200, Miklos Szeredi wrote:
[...]
> And yes, we need the counterpart for superblock-level stuff, to deal with
> remaining races (look at fs_may_remount_ro() and puke - it's still racy
> as hell).  E.g. unlink should do sb-level "will write" when it drops
> i_nlink to 0 and final removal of inode should do "won't write".

Al, any near-term plans for sb-level "want write" locking as we discussed
briefly at LSF?  Being able to do so for copyup in unionfs will hopefully
allow me to prevent concurrent topology changes.

And while we're digressing.  It appears to me that when someone wants to
change a single meta-data item in a f/s, then locking rules are simple:
e.g., lock the directory inode.  But when you want to modify more than one,
you need a different kind of lock -- hence the s_vfs_rename_mutex.  The
latter requires calling lock_rename to find the lowest common ancestor of
the two directories that need locking.  This rename-locking protocol appears
to be a special case of the sb-level "want write" idea, right?  Moreover, I
don't believe that cross-directory renames are that common, so replacing the
s_vfs_rename_mutex and its use with a more generic sb-level multi-operation
write-lock should have a negligible performance impact.

Thanks,
Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 14:28         ` Al Viro
  2008-04-24 14:36           ` Miklos Szeredi
@ 2008-04-24 17:29           ` Erez Zadok
  2008-04-24 18:13             ` Al Viro
  1 sibling, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 17:29 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, akpm, torvalds, dave, ezk, mhalcrow,
	linux-fsdevel, linux-kernel

In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
[...]
> FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> while we are at it.  We get the sucker opened by whoever steps on given
> place in the tree first, with subsequent operations done using the resulting
> struct file.  With fallback to r/o open.  What happens to somebody who
> tries to open it with enough permissions to do r/w?

Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
dentry_open(O_RDWR).  What's the proposed solution for this in the face of
r/o vfsmounts?  How could ecryptfs avoid calling this dentry_open in the
first place?

For unionfs, all I need is return -EROFS or the like to trigger a possible
copyup.

Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 17:25       ` Erez Zadok
@ 2008-04-24 17:30         ` Al Viro
  2008-04-24 19:56           ` Erez Zadok
  0 siblings, 1 reply; 63+ messages in thread
From: Al Viro @ 2008-04-24 17:30 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Miklos Szeredi, akpm, torvalds, dave, mhalcrow, linux-fsdevel,
	linux-kernel

On Thu, Apr 24, 2008 at 01:25:58PM -0400, Erez Zadok wrote:
> Al, any near-term plans for sb-level "want write" locking as we discussed
> briefly at LSF?  Being able to do so for copyup in unionfs will hopefully
> allow me to prevent concurrent topology changes.

That's pretty close to the top.

> the two directories that need locking.  This rename-locking protocol appears
> to be a special case of the sb-level "want write" idea, right?

Not really - that one doesn't provide any exclusion between writers and
that's what we want from rename/rename.

Take a look at Documentation/filesystems/directory-locking for details
of locking scheme...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:37                     ` Miklos Szeredi
  2008-04-24 15:59                       ` Al Viro
@ 2008-04-24 17:55                       ` Dave Hansen
  2008-04-24 18:47                         ` Miklos Szeredi
  1 sibling, 1 reply; 63+ messages in thread
From: Dave Hansen @ 2008-04-24 17:55 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, akpm, torvalds, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, 2008-04-24 at 17:37 +0200, Miklos Szeredi wrote:
> Why are those so important?  Yes, if we have multiple vfs_() calls,
> surround them with an extra want/drop pair.  We do already do multiple
> overlapping want/drop pairs with O_TRUNC and O_CREAT (AFAIR).

The one that I remember is the pair that we take for O_CREAT and the
nameidata_to_filp() in do_filp_open() that we do on it after the
creation and hold while the file is open.

The only reason for this one is that we want to shut down a potential
race that would allow the file to be created, then still return a -EROFS
because someone did a r/w->r/o transition between the create an the
nameidata_to_filp().

We could pass some information into nameidata_to_filp()->__dentry_open()
telling it that we already have a write and it doesn't need to take one,
but I think having the two nested writes is cleaner.

Is that the case you're thinking of?

-- Dave

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 17:29           ` Erez Zadok
@ 2008-04-24 18:13             ` Al Viro
  2008-04-24 19:40               ` Erez Zadok
  2008-04-28 21:53               ` J. Bruce Fields
  0 siblings, 2 replies; 63+ messages in thread
From: Al Viro @ 2008-04-24 18:13 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Miklos Szeredi, akpm, torvalds, dave, mhalcrow, linux-fsdevel,
	linux-kernel

On Thu, Apr 24, 2008 at 01:29:49PM -0400, Erez Zadok wrote:
> In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> > On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
> [...]
> > FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> > while we are at it.  We get the sucker opened by whoever steps on given
> > place in the tree first, with subsequent operations done using the resulting
> > struct file.  With fallback to r/o open.  What happens to somebody who
> > tries to open it with enough permissions to do r/w?
> 
> Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
> dentry_open(O_RDWR).  What's the proposed solution for this in the face of
> r/o vfsmounts?  How could ecryptfs avoid calling this dentry_open in the
> first place?

Doesn't have anything to do with vfsmounts (you have one to deal with and
if it's r/o, it's equivalent to just doing the entire thing on top of r/o
fs; not interesting).

No, what I'm worried about is much simpler.  Look: we have a file on
underlying fs, owned by root.root with 644 for permissions.  Comes a
luser and tries to open the counterpart of that file in ecryptfs; that
triggers ecryptfs_interpose() and attempts to open file.  Of course,
that's going to fail - it's not world-writable.  So then it (actually
ecryptfs_init_persistent_file()) falls back to opening with O_RDONLY.
Which succeeds just fine and file (opened r/o) is set as ->lower_file.

Now comes root and tries to open the damn thing r/w.  It should be able
to and if it came first it'd get it; as it is, what it gets is ->lower_file
and that puppy is opened read-only and you have no guarantee that underlying
fs will not go bonkers seeing write attempts on it (e.g. open for write
doing a bit more setup of ->private_data, etc.).

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 17:55                       ` Dave Hansen
@ 2008-04-24 18:47                         ` Miklos Szeredi
  0 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-24 18:47 UTC (permalink / raw)
  To: dave
  Cc: miklos, viro, akpm, torvalds, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > Why are those so important?  Yes, if we have multiple vfs_() calls,
> > surround them with an extra want/drop pair.  We do already do multiple
> > overlapping want/drop pairs with O_TRUNC and O_CREAT (AFAIR).
> 
> The one that I remember is the pair that we take for O_CREAT and the
> nameidata_to_filp() in do_filp_open() that we do on it after the
> creation and hold while the file is open.
> 
> The only reason for this one is that we want to shut down a potential
> race that would allow the file to be created, then still return a -EROFS
> because someone did a r/w->r/o transition between the create an the
> nameidata_to_filp().
> 
> We could pass some information into nameidata_to_filp()->__dentry_open()
> telling it that we already have a write and it doesn't need to take one,
> but I think having the two nested writes is cleaner.

Agreed.

> Is that the case you're thinking of?

Yes.  And the same thing is done a little below for similar reasons
around may_open() which does truncation.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 18:13             ` Al Viro
@ 2008-04-24 19:40               ` Erez Zadok
  2008-04-24 20:16                 ` Michael Halcrow
  2008-04-28 21:53               ` J. Bruce Fields
  1 sibling, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 19:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Erez Zadok, Miklos Szeredi, akpm, torvalds, dave, mhalcrow,
	linux-fsdevel, linux-kernel

In message <20080424181332.GB5882@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 01:29:49PM -0400, Erez Zadok wrote:
> > In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> > > On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
> > [...]
> > > FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> > > while we are at it.  We get the sucker opened by whoever steps on given
> > > place in the tree first, with subsequent operations done using the resulting
> > > struct file.  With fallback to r/o open.  What happens to somebody who
> > > tries to open it with enough permissions to do r/w?
> > 
> > Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
> > dentry_open(O_RDWR).  What's the proposed solution for this in the face of
> > r/o vfsmounts?  How could ecryptfs avoid calling this dentry_open in the
> > first place?
> 
> Doesn't have anything to do with vfsmounts (you have one to deal with and
> if it's r/o, it's equivalent to just doing the entire thing on top of r/o
> fs; not interesting).
> 
> No, what I'm worried about is much simpler.  Look: we have a file on
> underlying fs, owned by root.root with 644 for permissions.  Comes a
> luser and tries to open the counterpart of that file in ecryptfs; that
> triggers ecryptfs_interpose() and attempts to open file.  Of course,
> that's going to fail - it's not world-writable.  So then it (actually
> ecryptfs_init_persistent_file()) falls back to opening with O_RDONLY.
> Which succeeds just fine and file (opened r/o) is set as ->lower_file.
> 
> Now comes root and tries to open the damn thing r/w.  It should be able
> to and if it came first it'd get it; as it is, what it gets is ->lower_file
> and that puppy is opened read-only and you have no guarantee that underlying
> fs will not go bonkers seeing write attempts on it (e.g. open for write
> doing a bit more setup of ->private_data, etc.).

Ah, I see.  Yes: ecryptfs_init_persistent_file does have this odd "try to
open readwrite and if that failed, try readonly" code there.  I can't really
say why it's done that way: Mike?  Maybe it was a workaround to not having
the right permissions to open that persistent file?

This could be a similar issue to how I had to handle the .wh.* files in
unionfs.  Because they are separate files, and they get created/destroyed
directly by unionfs, I cannot rely on the caller's capabilities to be able
to create/remove these .wh.* files.  So in several places I have to call
cap_raise temporarily.  (Of course, if/when there'd be native whiteout
support in the most most common file systems, I could happily rip out this
whiteout code. :-)

Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 17:30         ` Al Viro
@ 2008-04-24 19:56           ` Erez Zadok
  0 siblings, 0 replies; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 19:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Erez Zadok, Miklos Szeredi, akpm, torvalds, dave, mhalcrow,
	linux-fsdevel, linux-kernel

In message <20080424173013.GA5882@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 01:25:58PM -0400, Erez Zadok wrote:
> > Al, any near-term plans for sb-level "want write" locking as we discussed
> > briefly at LSF?  Being able to do so for copyup in unionfs will hopefully
> > allow me to prevent concurrent topology changes.
> 
> That's pretty close to the top.
> 
> > the two directories that need locking.  This rename-locking protocol appears
> > to be a special case of the sb-level "want write" idea, right?
> 
> Not really - that one doesn't provide any exclusion between writers and
> that's what we want from rename/rename.

When you say "that one", do you mean the sb-level idea?  If the sb-level
"want write" not provide exclusion among writers, then how can I prevent
lower topology changes while copyup takes place?

If your sb-level "want write" idea won't provide that exclusion at the sb
level, then how about we elevate the s_vfs_rename_mutex into a generic
s_vfs_multi_dir_change_mutex of sorts, which can be used by rename as well
as copyup?

> Take a look at Documentation/filesystems/directory-locking for details
> of locking scheme...

OK.  lock_rename() at most locks one "parent" and one "child" inodes.  So
what happens to all the ones in b/t?  Suppose I have a path /a/b/c/d/e/f and
someone wants to mv /a/b/c/d/e to /a/b/; in that case lock_rename at least
grabs the mutex on /a/b, right?  Can someone go in the middle and try to
muck with the "c/d/" parts in between?

>From what I gathered, lock_rename won't be enough for me to prevent topology
changes during copyup; I'd really need to lock an entire arbitrarily deep
path of inodes.  That just seems so complex and prone to bugs, that it might
just be easier to have a single sb-level mutex for these complex
multi-directory operations.  I'm not sure how much performance hit this'd
be, thou, and if there's a more efficient way to do so.

If you think lock_rename will be sufficient for me to deal with copyups
vs. topology changes, then I'll start using it.

Thanks,
Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 19:40               ` Erez Zadok
@ 2008-04-24 20:16                 ` Michael Halcrow
  2008-04-24 22:39                   ` Erez Zadok
  0 siblings, 1 reply; 63+ messages in thread
From: Michael Halcrow @ 2008-04-24 20:16 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Al Viro, Miklos Szeredi, akpm, torvalds, dave, linux-fsdevel,
	linux-kernel

On Thu, Apr 24, 2008 at 03:40:03PM -0400, Erez Zadok wrote:
> Ah, I see.  Yes: ecryptfs_init_persistent_file does have this odd
> "try to open readwrite and if that failed, try readonly" code there.
> I can't really say why it's done that way: Mike?  Maybe it was a
> workaround to not having the right permissions to open that
> persistent file?

The notion was that of "best effort," and it is sloppy.

I think the right way to do it will be to allow up to two persistent
files. If the user with read-only perms opens, then open the
persistent file RO. Then a user with write-only (but not read) perms
opens; open another persistent file WO. Allow subsequent reads or
writes by the various users to go through whichever persistent file
has the right access. Then a user with RW access opens the file; close
both the RO file and the WO file and open a new file RW. All the
while, make sure that ecryptfs_open() performs all the requisite perm
checks.

If this sounds reasonable, I will get working on a patch to do this.

Mike

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 20:16                 ` Michael Halcrow
@ 2008-04-24 22:39                   ` Erez Zadok
  2008-04-24 23:33                     ` Michael Halcrow
  0 siblings, 1 reply; 63+ messages in thread
From: Erez Zadok @ 2008-04-24 22:39 UTC (permalink / raw)
  To: Michael Halcrow
  Cc: Erez Zadok, Al Viro, Miklos Szeredi, akpm, torvalds, dave,
	linux-fsdevel, linux-kernel

In message <20080424201630.GC18686@localhost.austin.ibm.com>, Michael Halcrow writes:
> On Thu, Apr 24, 2008 at 03:40:03PM -0400, Erez Zadok wrote:
> > Ah, I see.  Yes: ecryptfs_init_persistent_file does have this odd
> > "try to open readwrite and if that failed, try readonly" code there.
> > I can't really say why it's done that way: Mike?  Maybe it was a
> > workaround to not having the right permissions to open that
> > persistent file?
> 
> The notion was that of "best effort," and it is sloppy.
> 
> I think the right way to do it will be to allow up to two persistent
> files. If the user with read-only perms opens, then open the
> persistent file RO. Then a user with write-only (but not read) perms
> opens; open another persistent file WO. Allow subsequent reads or
> writes by the various users to go through whichever persistent file
> has the right access. Then a user with RW access opens the file; close
> both the RO file and the WO file and open a new file RW. All the
> while, make sure that ecryptfs_open() performs all the requisite perm
> checks.

Correct me if I'm wrong, but what you call "persistent" file is simply (in
my stacking-speek :-) the lower file, right?  If so, then calling it
persistent may be confusing, b/c you could be stacked on top of a volatile
f/s (e.g., tmpfs or even *gasp* another stackable f/s) -- yes, even if it
may not make much sense from a security perspective.

The comment above ecryptfs_init_persistent_file() states that you only ever
keep a single open file for every lower inode.  Was this a security decision
(say, to prevent different access modes to the same ciphertext?); or was
that the attempt to workaround the limitation of the
address_space_operations API (that ->writepage doesn't get a struct file,
which you need to pass to vfs_write)?

I'm unclear what you mean above wrt two open files: do you mean

1. two struct file's, each pointing to a SEPARATE copy of the physical lower
  file?
	or
2. two struct file's, BOTH pointing to the same lower physical file?

Option #1 may be racy and I'm not clear what happens if a data file gets
'forked' that way: do you then have to merge it later on?

Or, are you propsing to keep alternating b/t an open struct file for RO, and
another open struct file for RW (or WO)?  If you alternate b/t the two,
keeping only one open at a time, what happens if two threads want to access
the same file: one reading and another writing the file.  Does that mean
that for every read(2) or write(2) you'll have to essentially release one
open file and open another (could slow things down a lot)?

> If this sounds reasonable, I will get working on a patch to do this.
> 
> Mike

Cheers,
Erez.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 22:39                   ` Erez Zadok
@ 2008-04-24 23:33                     ` Michael Halcrow
  0 siblings, 0 replies; 63+ messages in thread
From: Michael Halcrow @ 2008-04-24 23:33 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Al Viro, Miklos Szeredi, akpm, torvalds, dave, linux-fsdevel,
	linux-kernel, Rusty Russell

On Thu, Apr 24, 2008 at 06:39:07PM -0400, Erez Zadok wrote:
> In message <20080424201630.GC18686@localhost.austin.ibm.com>,
> Michael Halcrow writes:
> > I think the right way to do it will be to allow up to two persistent
> > files. If the user with read-only perms opens, then open the
> > persistent file RO. Then a user with write-only (but not read) perms
> > opens; open another persistent file WO. Allow subsequent reads or
> > writes by the various users to go through whichever persistent file
> > has the right access. Then a user with RW access opens the file; close
> > both the RO file and the WO file and open a new file RW. All the
> > while, make sure that ecryptfs_open() performs all the requisite perm
> > checks.
> 
> Correct me if I'm wrong, but what you call "persistent" file is
> simply (in my stacking-speek :-) the lower file, right?  If so, then
> calling it persistent may be confusing, b/c you could be stacked on
> top of a volatile f/s (e.g., tmpfs or even *gasp* another stackable
> f/s) -- yes, even if it may not make much sense from a security
> perspective.

So long as the eCryptfs inode is alive, it will have at least one open
file in the corresponding inode for the lower filesystem. This file is
"persistent" for the life of the eCryptfs inode.

> The comment above ecryptfs_init_persistent_file() states that you
> only ever keep a single open file for every lower inode.  Was this a
> security decision (say, to prevent different access modes to the
> same ciphertext?); or was that the attempt to workaround the
> limitation of the address_space_operations API (that ->writepage
> doesn't get a struct file, which you need to pass to vfs_write)?

I intend to use the persistent file as a means to use vfs_read() and
vfs_write() in readpage() and writepage().

> I'm unclear what you mean above wrt two open files: do you mean
> 
> 1. two struct file's, each pointing to a SEPARATE copy of the physical lower
>   file?
> 	or
> 2. two struct file's, BOTH pointing to the same lower physical file?

I intned two lower files, both opened against the same lower inode,
via any dentry that happened to be used to get to that inode.

> Option #1 may be racy and I'm not clear what happens if a data file
> gets 'forked' that way: do you then have to merge it later on?
> 
> Or, are you propsing to keep alternating b/t an open struct file for
> RO, and another open struct file for RW (or WO)?  If you alternate
> b/t the two, keeping only one open at a time, what happens if two
> threads want to access the same file: one reading and another
> writing the file.

I proposed two open files, one with RO access, and one with WO access,
in the circumstance that one process only had read access on open, and
the other process only had write access on open. A write-only
ecryptfs_open() would result in an eCryptfs file struct that only used
the WO persistent file, after the appropriate permission checks are
done.

> Does that mean that for every read(2) or write(2) you'll have to
> essentially release one open file and open another (could slow
> things down a lot)?

Not really; I was proposing to keep two lower files open to use for
write-only or read-only operations, and if I ever got RW permission on
a file linked to the lower inode, I would just get rid of both and
replace them with a single RW persistent file. All while maintaining
proper permission checks in ecryptfs_open(), so a RO ecryptfs_open()
would only ever be allowed to read from the RW persistent file.

Really, this business with possibly two lower persistent files is
probably too much of a game to play. Rusty suggested that perhaps I
should just create a root-owned kernel thread in eCryptfs that would
have the responsibility of acquiring a RW lower persistent file,
regardless of the euid and permissions of whatever process made the
initial syscall that brought the eCryptfs inode into existence. I am
tempted to go that route instead.

Mike

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 15:59                       ` Al Viro
  2008-04-24 16:16                         ` Miklos Szeredi
@ 2008-04-25  7:22                         ` Miklos Szeredi
  1 sibling, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-25  7:22 UTC (permalink / raw)
  To: viro
  Cc: miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> OK, explain me, in small words, WTF should something that wants to do
> operations on filesystem tree have a vfsmount.  Slowly.

And BTW in the highly unlikely case, that something didn't have a
vfsmount and wanted to do some operation, it _should_ definitely just
allocate a kernel-private mount for that.  Why?  Because that means
that another interface, namely the grab-sb-for-write and
release-sb-for-write interfaces won't have to be exported for such
users, resulting in an overall smaller and simpler API.  And that's
good.

Now tell me in small words, why do we need the vfs_ interface to stay
around?

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 16:16                         ` Miklos Szeredi
@ 2008-04-28 10:15                           ` Miklos Szeredi
  2008-04-28 14:20                             ` Michael Halcrow
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-28 10:15 UTC (permalink / raw)
  To: miklos
  Cc: viro, miklos, akpm, torvalds, dave, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > > > let alone removing the interface that doesn't require checks to be
> > > > vfsmount-based for all users.
> > > 
> > > What users?  There are paractically _no_ other users.  The ones that
> > > there are (like reiserfs) should not be using them, and there are
> > > already some patches cleaning that mess up.
> > 
> > OK, explain me, in small words, WTF should something that wants to do
> > operations on filesystem tree have a vfsmount.  Slowly.  And "r/o
> > bind loses value if it can be bypassed" is a hogwash - fs methods are
> > still there, so it *can* be bypassed just fine, thank you very much.
> 
> And we know what to do with such users.

Which begs the question: why is ecryptfs doing that with the xattr
methods?  Does it need to bypass the permission checks?  Seems very
fishy to me.

Mike?

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-28 10:15                           ` Miklos Szeredi
@ 2008-04-28 14:20                             ` Michael Halcrow
  2008-04-28 14:52                               ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Michael Halcrow @ 2008-04-28 14:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, akpm, torvalds, dave, ezk, linux-fsdevel, linux-kernel

On Mon, Apr 28, 2008 at 12:15:33PM +0200, Miklos Szeredi wrote:
> Which begs the question: why is ecryptfs doing that with the xattr
> methods?  Does it need to bypass the permission checks?  Seems very
> fishy to me.

Yes, it was mainly to avoid the permission checks, since eCryptfs
needs to be able to freely manipulate the cryptographic metadata
stored in the xattr region of the lower file when the user mounts with
the option to use the xattr region. I just used the same function to
access the lower xattr (ecryptfs_setxattr(), for instance) for both
xattr passthrough and metadata manipulation. This clearly can be
changed at this point so that at least the xattr passthrough of xattr
ops explicitly done by the user uses the vfs_* xattr calls instead.

However, in terms of permissions that eCryptfs needs, there are some
semantics that I need to work out. For instance, if eCryptfs
absolutely respects a rule that says that the lower file may only be
opened append-only, even by root, then eCryptfs cannot do its job,
which may include writing out the crypto metadata to the xattr of the
lower file. In that case, an operation on the lower fs will succeed,
but that exact same operation on the file under eCryptfs will fail,
since xattr.c::xattr_permission() will return -EPERM if
IS_APPEND(inode), and an open in eCryptfs will automatically entail an
xattr write if the mount is done with instructions to write the
metadata to the xattr regions of the lower files.

Mike

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-28 14:20                             ` Michael Halcrow
@ 2008-04-28 14:52                               ` Miklos Szeredi
  0 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-04-28 14:52 UTC (permalink / raw)
  To: mhalcrow
  Cc: miklos, viro, akpm, torvalds, dave, ezk, linux-fsdevel,
	linux-kernel

> > Which begs the question: why is ecryptfs doing that with the xattr
> > methods?  Does it need to bypass the permission checks?  Seems very
> > fishy to me.
> 
> Yes, it was mainly to avoid the permission checks, since eCryptfs
> needs to be able to freely manipulate the cryptographic metadata
> stored in the xattr region of the lower file when the user mounts with
> the option to use the xattr region. I just used the same function to
> access the lower xattr (ecryptfs_setxattr(), for instance) for both
> xattr passthrough and metadata manipulation. This clearly can be
> changed at this point so that at least the xattr passthrough of xattr
> ops explicitly done by the user uses the vfs_* xattr calls instead.
> 
> However, in terms of permissions that eCryptfs needs, there are some
> semantics that I need to work out. For instance, if eCryptfs
> absolutely respects a rule that says that the lower file may only be
> opened append-only, even by root, then eCryptfs cannot do its job,
> which may include writing out the crypto metadata to the xattr of the
> lower file. In that case, an operation on the lower fs will succeed,
> but that exact same operation on the file under eCryptfs will fail,
> since xattr.c::xattr_permission() will return -EPERM if
> IS_APPEND(inode), and an open in eCryptfs will automatically entail an
> xattr write if the mount is done with instructions to write the
> metadata to the xattr regions of the lower files.

A more serious problem, is that permissions are not always checked at
the VFS level, but often at some place in the filesystem (as well)
like the NFS server for example.  Which means, that the current design
will fail miserably in those cases.

You don't have to care, of course, but I would rather have chosen a
design, where the stack doesn't have to care about implementation
details like that in the underlying filesystem.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 18:13             ` Al Viro
  2008-04-24 19:40               ` Erez Zadok
@ 2008-04-28 21:53               ` J. Bruce Fields
  1 sibling, 0 replies; 63+ messages in thread
From: J. Bruce Fields @ 2008-04-28 21:53 UTC (permalink / raw)
  To: Al Viro
  Cc: Erez Zadok, Miklos Szeredi, akpm, torvalds, dave, mhalcrow,
	linux-fsdevel, linux-kernel

On Thu, Apr 24, 2008 at 07:13:32PM +0100, Al Viro wrote:
> On Thu, Apr 24, 2008 at 01:29:49PM -0400, Erez Zadok wrote:
> > In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> > > On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
> > [...]
> > > FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> > > while we are at it.  We get the sucker opened by whoever steps on given
> > > place in the tree first, with subsequent operations done using the resulting
> > > struct file.  With fallback to r/o open.  What happens to somebody who
> > > tries to open it with enough permissions to do r/w?
> > 
> > Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
> > dentry_open(O_RDWR).  What's the proposed solution for this in the face of
> > r/o vfsmounts?  How could ecryptfs avoid calling this dentry_open in the
> > first place?
> 
> Doesn't have anything to do with vfsmounts (you have one to deal with and
> if it's r/o, it's equivalent to just doing the entire thing on top of r/o
> fs; not interesting).
> 
> No, what I'm worried about is much simpler.  Look: we have a file on
> underlying fs, owned by root.root with 644 for permissions.  Comes a
> luser and tries to open the counterpart of that file in ecryptfs; that
> triggers ecryptfs_interpose() and attempts to open file.  Of course,
> that's going to fail - it's not world-writable.  So then it (actually
> ecryptfs_init_persistent_file()) falls back to opening with O_RDONLY.
> Which succeeds just fine and file (opened r/o) is set as ->lower_file.
> 
> Now comes root and tries to open the damn thing r/w.  It should be able
> to and if it came first it'd get it; as it is, what it gets is ->lower_file
> and that puppy is opened read-only and you have no guarantee that underlying
> fs will not go bonkers seeing write attempts on it (e.g. open for write
> doing a bit more setup of ->private_data, etc.).

This looks like the same ugly stuff that the nfsv4 server is currently
doing; see, e.g., fs/nfsd/nfs4state.c:nfs4_upgrade_open().

A concrete example of a bug (e.g., export FOOfs over nfsd, do an open
upgrade, watch the oops) would be great motivation to finally get this
fixed, if you know of such an example off the top of your head....

--b.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
                   ` (14 preceding siblings ...)
  2008-04-24 16:52 ` Erez Zadok
@ 2008-05-01  5:40 ` Dave Hansen
  2008-05-01  8:08   ` Miklos Szeredi
  15 siblings, 1 reply; 63+ messages in thread
From: Dave Hansen @ 2008-05-01  5:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, akpm, torvalds, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, 2008-04-24 at 13:39 +0200, Miklos Szeredi wrote:
> R/O bind mounts patches have been reviewed numerous times.  Still a
> relatively large number of places remained where checking for R/O
> mounts was missed.

In all the other... erm... interesting discussion, I missed this point.

Were there some bits outside of the nfsd code where I missed the r/o
mount checks?

-- Dave


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-05-01  5:40 ` Dave Hansen
@ 2008-05-01  8:08   ` Miklos Szeredi
  2008-05-01 16:40     ` Dave Hansen
  0 siblings, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2008-05-01  8:08 UTC (permalink / raw)
  To: dave
  Cc: miklos, viro, akpm, torvalds, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> > R/O bind mounts patches have been reviewed numerous times.  Still a
> > relatively large number of places remained where checking for R/O
> > mounts was missed.
> 
> In all the other... erm... interesting discussion, I missed this point.
> 
> Were there some bits outside of the nfsd code where I missed the r/o
> mount checks?

Yes, the removexattr syscalls.  Al fixed those in the final
submission.

And the whole of ecryptfs was missed as a source of filesystem
modification.  How that's supposed to get fixed, along with nfsd is
arguable.

But regardless of all that, I think the path_* interface is a good
one, even if it has just a couple of users that actually _require_ the
r/o bracketing.  It's good because:

  - it's consistent

  - it provides some (not all) guarantees, i.e. it's easier to prove
    that all callers play by the rules

  - for the syscall case it has zero cost

  - for all the other cases it has either zero, or minimal cost

Yes, it does require the caller to have a vfsmount available, but it's
hard to imagine that the caller does not have it:

  - most userspace calls do have it, as they are either operating on a
    path or a file descriptor.  There are some exceptions like sync(2)
    and ustat(2), the latter not being a very exemplary interface, and
    neither of them being relevant to this discussion.

  - all kernel callers (nfs export, stacking) should have it, as they
    need it for open() anyway

And even if some theoretical caller didn't have the vfsmount, it still
should be easy to allocate one, providing a clean way to do the r/o
bracketing, and not requiring another mechanism to be exported that
provides the same functionality on the superblock.

Sorry for the claptrap.  I'm going to resubmit this one more time
with some slight modifications and additions, and it'd be really nice
if you or other interested parties would provide comments and opinions
(either way), because I don't think me and Al have anything new to say
to each other.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-05-01  8:08   ` Miklos Szeredi
@ 2008-05-01 16:40     ` Dave Hansen
  2008-05-01 17:04       ` Miklos Szeredi
  0 siblings, 1 reply; 63+ messages in thread
From: Dave Hansen @ 2008-05-01 16:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, akpm, torvalds, ezk, mhalcrow, linux-fsdevel, linux-kernel

On Thu, 2008-05-01 at 10:08 +0200, Miklos Szeredi wrote:
> Sorry for the claptrap.  I'm going to resubmit this one more time
> with some slight modifications and additions, and it'd be really nice
> if you or other interested parties would provide comments and opinions
> (either way), because I don't think me and Al have anything new to say
> to each other.

I was trying to stay out of it a bit, since you both have very good
points. :)

But, from a purely aesthetic point of view, I like the patches.  They do
make r/o mount detection less error-prone in coding.  They also remove
more code than they add.

I also certainly understand Al's point about why it puts vfsmounts in a
place where they probably shouldn't be exposed.

Is there a way we could pass around a vfs-internal object to these
things?  Maybe something that is like an opaque token only known to the
vfs?

linux/mount.h:

	struct hidden_mount;

fs/namespace.c:

struct hidden_mount {
	struct vfsmount *mnt;
};

int mnt_want_write(struct hidden_mount *hidmnt)
{
	return __mnt_want_write(hidmnt->mnt);
}

We could have *that* passed down the way that you're currently passing
the vfsmount.

-- Dave

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
  2008-05-01 16:40     ` Dave Hansen
@ 2008-05-01 17:04       ` Miklos Szeredi
  0 siblings, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2008-05-01 17:04 UTC (permalink / raw)
  To: dave
  Cc: miklos, viro, akpm, torvalds, ezk, mhalcrow, linux-fsdevel,
	linux-kernel

> Is there a way we could pass around a vfs-internal object to these
> things?  Maybe something that is like an opaque token only known to the
> vfs?

vfsmount itself should be an opaque token to anything but the
namespace code.  So the definition could just be moved into say
fs/vfsmount.h, and be done with that.

Probably not a trivial cleanup to do (s_op->show_options() is a major
offender), but I don't see any major roadblocks.

Miklos

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2008-05-01 17:04 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-24 11:39 [patch 00/13] vfs: add helpers to check r/o bind mounts Miklos Szeredi
2008-04-24 11:39 ` [patch 01/13] ecryptfs: add missing lock around notify_change Miklos Szeredi
2008-04-24 16:56   ` Erez Zadok
2008-04-24 17:09     ` Miklos Szeredi
2008-04-24 11:39 ` [patch 02/13] ecryptfs: clean up (un)lock_parent Miklos Szeredi
2008-04-24 11:39 ` [patch 03/13] nfsd: clean up mnt_want_write calls Miklos Szeredi
2008-04-24 11:39 ` [patch 04/13] vfs: add path_create() and path_mknod() Miklos Szeredi
2008-04-24 11:39 ` [patch 05/13] vfs: add path_mkdir() Miklos Szeredi
2008-04-24 11:39 ` [patch 06/13] vfs: add path_rmdir() Miklos Szeredi
2008-04-24 11:39 ` [patch 07/13] vfs: add path_unlink() Miklos Szeredi
2008-04-24 11:39 ` [patch 08/13] vfs: add path_symlink() Miklos Szeredi
2008-04-24 11:39 ` [patch 09/13] vfs: add path_link() Miklos Szeredi
2008-04-24 11:40 ` [patch 10/13] vfs: add path_rename() Miklos Szeredi
2008-04-24 11:40 ` [patch 11/13] vfs: add path_setattr() Miklos Szeredi
2008-04-24 11:40 ` [patch 12/13] vfs: add path_setxattr() Miklos Szeredi
2008-04-24 11:40 ` [patch 13/13] vfs: add path_removexattr() Miklos Szeredi
2008-04-24 12:42 ` [patch 00/13] vfs: add helpers to check r/o bind mounts Al Viro
2008-04-24 13:05   ` Miklos Szeredi
2008-04-24 13:48     ` Al Viro
2008-04-24 14:00       ` Al Viro
2008-04-24 14:16         ` Miklos Szeredi
2008-04-24 14:35           ` Al Viro
2008-04-24 14:42             ` Miklos Szeredi
2008-04-24 14:48               ` Al Viro
2008-04-24 14:58                 ` Miklos Szeredi
2008-04-24 15:21                   ` Al Viro
2008-04-24 15:37                     ` Miklos Szeredi
2008-04-24 15:59                       ` Al Viro
2008-04-24 16:16                         ` Miklos Szeredi
2008-04-28 10:15                           ` Miklos Szeredi
2008-04-28 14:20                             ` Michael Halcrow
2008-04-28 14:52                               ` Miklos Szeredi
2008-04-25  7:22                         ` Miklos Szeredi
2008-04-24 17:55                       ` Dave Hansen
2008-04-24 18:47                         ` Miklos Szeredi
2008-04-24 14:09       ` Miklos Szeredi
2008-04-24 14:28         ` Al Viro
2008-04-24 14:36           ` Miklos Szeredi
2008-04-24 14:44             ` Al Viro
2008-04-24 14:53               ` Miklos Szeredi
2008-04-24 15:12                 ` Al Viro
2008-04-24 15:18                   ` Miklos Szeredi
2008-04-24 15:38                     ` Al Viro
2008-04-24 15:43                       ` Miklos Szeredi
2008-04-24 17:29           ` Erez Zadok
2008-04-24 18:13             ` Al Viro
2008-04-24 19:40               ` Erez Zadok
2008-04-24 20:16                 ` Michael Halcrow
2008-04-24 22:39                   ` Erez Zadok
2008-04-24 23:33                     ` Michael Halcrow
2008-04-28 21:53               ` J. Bruce Fields
2008-04-24 17:25       ` Erez Zadok
2008-04-24 17:30         ` Al Viro
2008-04-24 19:56           ` Erez Zadok
2008-04-24 17:04   ` Erez Zadok
2008-04-24 16:52 ` Erez Zadok
2008-04-24 16:58   ` Miklos Szeredi
2008-04-24 17:14     ` Erez Zadok
2008-04-24 17:23       ` Miklos Szeredi
2008-05-01  5:40 ` Dave Hansen
2008-05-01  8:08   ` Miklos Szeredi
2008-05-01 16:40     ` Dave Hansen
2008-05-01 17:04       ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).