linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] new truncate sequence, part3
@ 2010-06-01 11:39 Christoph Hellwig
  2010-06-01 11:39 ` [PATCH 1/2] always call inode_change_ok early in ->setattr Christoph Hellwig
  2010-06-01 11:39 ` [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:39 UTC (permalink / raw)
  To: viro, npiggin; +Cc: jack, linux-fsdevel

Make sure we do the truncate checks earlier on, as pointed out by Dmitry
Monakhov.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] always call inode_change_ok early in ->setattr
  2010-06-01 11:39 [PATCH 0/2] new truncate sequence, part3 Christoph Hellwig
@ 2010-06-01 11:39 ` Christoph Hellwig
  2010-06-01 11:39 ` [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:39 UTC (permalink / raw)
  To: viro, npiggin; +Cc: jack, linux-fsdevel

Make sure we call inode_change_ok before doing any changes in ->setattr,
and make sure to call it even if our fs wants to ignore normal UNIX
permissions, but use the ATTR_FORCE to skip those.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/cifs/inode.c
===================================================================
--- linux-2.6.orig/fs/cifs/inode.c	2010-06-01 12:19:48.004255568 +0200
+++ linux-2.6/fs/cifs/inode.c	2010-06-01 12:20:04.402023064 +0200
@@ -1776,14 +1776,12 @@ cifs_setattr_unix(struct dentry *direntr
 
 	xid = GetXid();
 
-	if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM) == 0) {
-		/* check if we have permission to change attrs */
-		rc = inode_change_ok(inode, attrs);
-		if (rc < 0)
-			goto out;
-		else
-			rc = 0;
-	}
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM)
+		attrs->ia_valid |= ATTR_FORCE;
+
+	rc = inode_change_ok(inode, attrs);
+	if (rc < 0)
+		goto out;
 
 	full_path = build_path_from_dentry(direntry);
 	if (full_path == NULL) {
@@ -1914,14 +1912,13 @@ cifs_setattr_nounix(struct dentry *diren
 	cFYI(1, "setattr on file %s attrs->iavalid 0x%x",
 		 direntry->d_name.name, attrs->ia_valid);
 
-	if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM) == 0) {
-		/* check if we have permission to change attrs */
-		rc = inode_change_ok(inode, attrs);
-		if (rc < 0) {
-			FreeXid(xid);
-			return rc;
-		} else
-			rc = 0;
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM)
+		attrs->ia_valid |= ATTR_FORCE;
+
+	rc = inode_change_ok(inode, attrs);
+	if (rc < 0) {
+		FreeXid(xid);
+		return rc;
 	}
 
 	full_path = build_path_from_dentry(direntry);
Index: linux-2.6/fs/fat/file.c
===================================================================
--- linux-2.6.orig/fs/fat/file.c	2010-06-01 12:19:23.681011749 +0200
+++ linux-2.6/fs/fat/file.c	2010-06-01 12:21:01.134027230 +0200
@@ -387,21 +387,6 @@ int fat_setattr(struct dentry *dentry, s
 	unsigned int ia_valid;
 	int error;
 
-	/*
-	 * Expand the file. Since inode_setattr() updates ->i_size
-	 * before calling the ->truncate(), but FAT needs to fill the
-	 * hole before it. XXX: this is no longer true with new truncate
-	 * sequence.
-	 */
-	if (attr->ia_valid & ATTR_SIZE) {
-		if (attr->ia_size > inode->i_size) {
-			error = fat_cont_expand(inode, attr->ia_size);
-			if (error || attr->ia_valid == ATTR_SIZE)
-				goto out;
-			attr->ia_valid &= ~ATTR_SIZE;
-		}
-	}
-
 	/* Check for setting the inode time. */
 	ia_valid = attr->ia_valid;
 	if (ia_valid & TIMES_SET_FLAGS) {
@@ -417,6 +402,21 @@ int fat_setattr(struct dentry *dentry, s
 		goto out;
 	}
 
+	/*
+	 * Expand the file. Since inode_setattr() updates ->i_size
+	 * before calling the ->truncate(), but FAT needs to fill the
+	 * hole before it. XXX: this is no longer true with new truncate
+	 * sequence.
+	 */
+	if (attr->ia_valid & ATTR_SIZE) {
+		if (attr->ia_size > inode->i_size) {
+			error = fat_cont_expand(inode, attr->ia_size);
+			if (error || attr->ia_valid == ATTR_SIZE)
+				goto out;
+			attr->ia_valid &= ~ATTR_SIZE;
+		}
+	}
+
 	if (((attr->ia_valid & ATTR_UID) &&
 	     (attr->ia_uid != sbi->options.fs_uid)) ||
 	    ((attr->ia_valid & ATTR_GID) &&
Index: linux-2.6/fs/fuse/dir.c
===================================================================
--- linux-2.6.orig/fs/fuse/dir.c	2010-06-01 12:19:23.688003647 +0200
+++ linux-2.6/fs/fuse/dir.c	2010-06-01 12:20:04.409254311 +0200
@@ -1270,11 +1270,12 @@ static int fuse_do_setattr(struct dentry
 	if (!fuse_allow_task(fc, current))
 		return -EACCES;
 
-	if (fc->flags & FUSE_DEFAULT_PERMISSIONS) {
-		err = inode_change_ok(inode, attr);
-		if (err)
-			return err;
-	}
+	if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS))
+		attr->ia_valid |= ATTR_FORCE;
+
+	err = inode_change_ok(inode, attr);
+	if (err)
+		return err;
 
 	if ((attr->ia_valid & ATTR_OPEN) && fc->atomic_o_trunc)
 		return 0;
Index: linux-2.6/fs/logfs/file.c
===================================================================
--- linux-2.6.orig/fs/logfs/file.c	2010-06-01 12:19:48.056005672 +0200
+++ linux-2.6/fs/logfs/file.c	2010-06-01 12:20:04.412282458 +0200
@@ -232,16 +232,16 @@ static int logfs_setattr(struct dentry *
 	struct inode *inode = dentry->d_inode;
 	int err = 0;
 
+	err = inode_change_ok(inode, attr);
+	if (err)
+		return err;
+
 	if (attr->ia_valid & ATTR_SIZE) {
 		err = logfs_truncate(inode, attr->ia_size);
 		if (err)
 			return err;
 	}
 
-	err = inode_change_ok(inode, attr);
-	if (err)
-		return err;
-
 	setattr_copy(inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
Index: linux-2.6/fs/reiserfs/inode.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/inode.c	2010-06-01 12:19:48.162005532 +0200
+++ linux-2.6/fs/reiserfs/inode.c	2010-06-01 12:20:28.959005660 +0200
@@ -3086,6 +3086,10 @@ int reiserfs_setattr(struct dentry *dent
 	int depth;
 	int error;
 
+	error = inode_change_ok(inode, attr);
+	if (error)
+		return error;
+
 	/* must be turned off for recursive notify_change calls */
 	ia_valid = attr->ia_valid &= ~(ATTR_KILL_SUID|ATTR_KILL_SGID);
 
@@ -3135,10 +3139,6 @@ int reiserfs_setattr(struct dentry *dent
 		goto out;
 	}
 
-	error = inode_change_ok(inode, attr);
-	if (error)
-		goto out;
-
 	if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
 	    (ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid)) {
 		struct reiserfs_transaction_handle th;
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c	2010-06-01 12:19:23.712004415 +0200
+++ linux-2.6/mm/shmem.c	2010-06-01 12:20:04.424255568 +0200
@@ -766,6 +766,10 @@ static int shmem_notify_change(struct de
 	struct inode *inode = dentry->d_inode;
 	int error;
 
+	error = inode_change_ok(inode, attr);
+	if (error)
+		return error;
+
 	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
 		loff_t newsize = attr->ia_size;
 		struct page *page = NULL;
@@ -808,11 +812,9 @@ static int shmem_notify_change(struct de
 		shmem_truncate_range(inode, newsize, (loff_t)-1);
 	}
 
-	error = inode_change_ok(inode, attr);
-	if (!error)
-		setattr_copy(inode, attr);
+	setattr_copy(inode, attr);
 #ifdef CONFIG_TMPFS_POSIX_ACL
-	if (!error && (attr->ia_valid & ATTR_MODE))
+	if (attr->ia_valid & ATTR_MODE)
 		error = generic_acl_chmod(inode);
 #endif
 	return error;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-01 11:39 [PATCH 0/2] new truncate sequence, part3 Christoph Hellwig
  2010-06-01 11:39 ` [PATCH 1/2] always call inode_change_ok early in ->setattr Christoph Hellwig
@ 2010-06-01 11:39 ` Christoph Hellwig
  2010-06-01 11:49   ` Steven Whitehouse
  2010-06-09  7:33   ` Nick Piggin
  1 sibling, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:39 UTC (permalink / raw)
  To: viro, npiggin; +Cc: jack, linux-fsdevel

Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok.  Also clean up and document inode_change_ok
to make this obvious.

As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error.  This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.

Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for it's removal anyway.

Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/attr.c
===================================================================
--- linux-2.6.orig/fs/attr.c	2010-06-01 12:19:47.987277079 +0200
+++ linux-2.6/fs/attr.c	2010-06-01 12:48:26.063005672 +0200
@@ -14,35 +14,53 @@
 #include <linux/fcntl.h>
 #include <linux/security.h>
 
-/* Taken over from the old code... */
-
-/* POSIX UID/GID verification for setting inode attributes. */
+/**
+ * inode_change_ok - check if attribute changes to an inode are allowed
+ * @inode:	inode to check
+ * @attr:	attributes to change
+ *
+ * Check if we are allowed to change the attributes contained in @attr
+ * in the given inode.  This includes the normal unix access permission
+ * checks, as well as checks for rlimits and others.
+ *
+ * Should be called as the first thing in ->setattr implementations,
+ * possibly after taking additional locks.
+ */
 int inode_change_ok(const struct inode *inode, struct iattr *attr)
 {
-	int retval = -EPERM;
 	unsigned int ia_valid = attr->ia_valid;
 
+	/*
+	 * First check size constraints.  These can't be overriden using
+	 * ATTR_FORCE.
+	 */
+	if (attr->ia_mode & ATTR_SIZE) {
+		int error = inode_newsize_ok(inode, attr->ia_size);
+		if (error)
+			return error;
+	}
+
 	/* If force is set do it anyway. */
 	if (ia_valid & ATTR_FORCE)
-		goto fine;
+		return 0;
 
 	/* Make sure a caller can chown. */
 	if ((ia_valid & ATTR_UID) &&
 	    (current_fsuid() != inode->i_uid ||
 	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
-		goto error;
+		return -EPERM;
 
 	/* Make sure caller can chgrp. */
 	if ((ia_valid & ATTR_GID) &&
 	    (current_fsuid() != inode->i_uid ||
 	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
 	    !capable(CAP_CHOWN))
-		goto error;
+		return -EPERM;
 
 	/* Make sure a caller can chmod. */
 	if (ia_valid & ATTR_MODE) {
 		if (!is_owner_or_cap(inode))
-			goto error;
+			return -EPERM;
 		/* Also check the setgid bit! */
 		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
 				inode->i_gid) && !capable(CAP_FSETID))
@@ -52,12 +70,10 @@ int inode_change_ok(const struct inode *
 	/* Check for setting the inode time. */
 	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
 		if (!is_owner_or_cap(inode))
-			goto error;
+			return -EPERM;
 	}
-fine:
-	retval = 0;
-error:
-	return retval;
+
+	return 0;
 }
 EXPORT_SYMBOL(inode_change_ok);
 
@@ -113,7 +129,7 @@ EXPORT_SYMBOL(inode_newsize_ok);
  *
  * setattr_copy updates the inode's metadata with that specified
  * in attr. Noticably missing is inode size update, which is more complex
- * as it requires pagecache updates. See simple_setsize.
+ * as it requires pagecache updates.
  *
  * The inode is not marked as dirty after this operation. The rationale is
  * that for "simple" filesystems, the struct inode is the inode storage.
Index: linux-2.6/fs/fuse/dir.c
===================================================================
--- linux-2.6.orig/fs/fuse/dir.c	2010-06-01 12:20:04.409254311 +0200
+++ linux-2.6/fs/fuse/dir.c	2010-06-01 12:21:25.285273727 +0200
@@ -1280,12 +1280,8 @@ static int fuse_do_setattr(struct dentry
 	if ((attr->ia_valid & ATTR_OPEN) && fc->atomic_o_trunc)
 		return 0;
 
-	if (attr->ia_valid & ATTR_SIZE) {
-		err = inode_newsize_ok(inode, attr->ia_size);
-		if (err)
-			return err;
+	if (attr->ia_valid & ATTR_SIZE)
 		is_truncate = true;
-	}
 
 	req = fuse_get_req(fc);
 	if (IS_ERR(req))
Index: linux-2.6/fs/adfs/inode.c
===================================================================
--- linux-2.6.orig/fs/adfs/inode.c	2010-06-01 12:19:22.754023343 +0200
+++ linux-2.6/fs/adfs/inode.c	2010-06-01 12:21:25.290022016 +0200
@@ -333,10 +333,7 @@ adfs_notify_change(struct dentry *dentry
 
 	/* XXX: this is missing some actual on-disk truncation.. */
 	if (ia_valid & ATTR_SIZE)
-		error = simple_setsize(inode, attr->ia_size);
-
-	if (error)
-		goto out;
+		truncate_setsize(inode, attr->ia_size);
 
 	if (ia_valid & ATTR_MTIME) {
 		inode->i_mtime = attr->ia_mtime;
Index: linux-2.6/fs/gfs2/ops_inode.c
===================================================================
--- linux-2.6.orig/fs/gfs2/ops_inode.c	2010-06-01 12:19:48.121034167 +0200
+++ linux-2.6/fs/gfs2/ops_inode.c	2010-06-01 12:21:25.294022715 +0200
@@ -1072,7 +1072,7 @@ int gfs2_permission(struct inode *inode,
 }
 
 /*
- * XXX: should be changed to have proper ordering by opencoding simple_setsize
+ * XXX(truncate): the truncate_setsize calls should be moved to the end.
  */
 static int setattr_size(struct inode *inode, struct iattr *attr)
 {
@@ -1084,10 +1084,8 @@ static int setattr_size(struct inode *in
 		error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
 		if (error)
 			return error;
-		error = simple_setsize(inode, attr->ia_size);
+		truncate_setsize(inode, attr->ia_size);
 		gfs2_trans_end(sdp);
-		if (error) 
-			return error;
 	}
 
 	error = gfs2_truncatei(ip, attr->ia_size);
Index: linux-2.6/fs/ufs/truncate.c
===================================================================
--- linux-2.6.orig/fs/ufs/truncate.c	2010-06-01 12:19:48.097005602 +0200
+++ linux-2.6/fs/ufs/truncate.c	2010-06-01 12:21:25.298021667 +0200
@@ -500,11 +500,6 @@ out:
 	return err;
 }
 
-/*
- * TODO:
- *	- truncate case should use proper ordering instead of using
- *	  simple_setsize
- */
 int ufs_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
@@ -518,9 +513,9 @@ int ufs_setattr(struct dentry *dentry, s
 	if (ia_valid & ATTR_SIZE && attr->ia_size != inode->i_size) {
 		loff_t old_i_size = inode->i_size;
 
-		error = simple_setsize(inode, attr->ia_size);
-		if (error)
-			return error;
+		/* XXX(truncate): truncate_setsize should be called last */
+		truncate_setsize(inode, attr->ia_size);
+
 		error = ufs_truncate(inode, old_i_size);
 		if (error)
 			return error;
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c	2010-06-01 12:20:04.424255568 +0200
+++ linux-2.6/mm/shmem.c	2010-06-01 12:21:25.300003438 +0200
@@ -804,11 +804,10 @@ static int shmem_notify_change(struct de
 			}
 		}
 
-		error = simple_setsize(inode, newsize);
+		/* XXX(truncate): truncate_setsize should be called last */
+		truncate_setsize(inode, newsize);
 		if (page)
 			page_cache_release(page);
-		if (error)
-			return error;
 		shmem_truncate_range(inode, newsize, (loff_t)-1);
 	}
 
Index: linux-2.6/mm/truncate.c
===================================================================
--- linux-2.6.orig/mm/truncate.c	2010-06-01 12:19:22.909012727 +0200
+++ linux-2.6/mm/truncate.c	2010-06-01 12:21:25.307005883 +0200
@@ -541,28 +541,48 @@ void truncate_pagecache(struct inode *in
 EXPORT_SYMBOL(truncate_pagecache);
 
 /**
+ * truncate_setsize - update inode and pagecache for a new file size
+ * @inode: inode
+ * @newsize: new file size
+ *
+ * truncate_setsize updastes i_size update and performs pagecache
+ * truncation (if necessary) for a file size updates. It will be
+ * typically be called from the filesystem's setattr function when
+ * ATTR_SIZE is passed in.
+ *
+ * Must be called with inode_mutex held and after all filesystem
+ * specific block truncation has been performed.
+ */
+void truncate_setsize(struct inode *inode, loff_t newsize)
+{
+	loff_t oldsize;
+
+	oldsize = inode->i_size;
+	i_size_write(inode, newsize);
+
+	truncate_pagecache(inode, oldsize, newsize);
+}
+EXPORT_SYMBOL(truncate_setsize);
+
+/**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
  * @offset: file offset to start truncating
  *
- * NOTE! We have to be ready to update the memory sharing
- * between the file and the memory map for a potential last
- * incomplete page.  Ugly, but necessary.
- *
- * This function is deprecated and simple_setsize or truncate_pagecache
- * should be used instead.
+ * This function is deprecated and truncate_setsize or truncate_pagecache
+ * should be used instead, together with filesystem specific block truncation.
  */
 int vmtruncate(struct inode *inode, loff_t offset)
 {
 	int error;
 
-	error = simple_setsize(inode, offset);
+	error = inode_newsize_ok(inode, offset);
 	if (error)
 		return error;
 
+	truncate_setsize(inode, offset);
 	if (inode->i_op->truncate)
 		inode->i_op->truncate(inode);
-
-	return error;
+	return 0;
 }
 EXPORT_SYMBOL(vmtruncate);
Index: linux-2.6/fs/ext2/inode.c
===================================================================
--- linux-2.6.orig/fs/ext2/inode.c	2010-06-01 12:19:22.792026136 +0200
+++ linux-2.6/fs/ext2/inode.c	2010-06-01 13:21:47.471253822 +0200
@@ -1156,15 +1156,10 @@ static void ext2_truncate_blocks(struct
 	__ext2_truncate_blocks(inode, offset);
 }
 
-int ext2_setsize(struct inode *inode, loff_t newsize)
+static int ext2_setsize(struct inode *inode, loff_t newsize)
 {
-	loff_t oldsize;
 	int error;
 
-	error = inode_newsize_ok(inode, newsize);
-	if (error)
-		return error;
-
 	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
 	    S_ISLNK(inode->i_mode)))
 		return -EINVAL;
@@ -1184,10 +1179,7 @@ int ext2_setsize(struct inode *inode, lo
 	if (error)
 		return error;
 
-	oldsize = inode->i_size;
-	i_size_write(inode, newsize);
-	truncate_pagecache(inode, oldsize, newsize);
-
+	truncate_setsize(inode, newsize);
 	__ext2_truncate_blocks(inode, newsize);
 
 	inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
Index: linux-2.6/fs/fat/fat.h
===================================================================
--- linux-2.6.orig/fs/fat/fat.h	2010-06-01 12:19:22.800004206 +0200
+++ linux-2.6/fs/fat/fat.h	2010-06-01 12:21:25.324034448 +0200
@@ -306,7 +306,6 @@ extern long fat_generic_ioctl(struct fil
 extern const struct file_operations fat_file_operations;
 extern const struct inode_operations fat_file_inode_operations;
 extern int fat_setattr(struct dentry * dentry, struct iattr * attr);
-extern int fat_setsize(struct inode *inode, loff_t offset);
 extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
 extern int fat_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		       struct kstat *stat);
Index: linux-2.6/fs/fat/file.c
===================================================================
--- linux-2.6.orig/fs/fat/file.c	2010-06-01 12:21:01.134027230 +0200
+++ linux-2.6/fs/fat/file.c	2010-06-01 12:21:25.329066785 +0200
@@ -364,18 +364,6 @@ static int fat_allow_set_time(struct msd
 	return 0;
 }
 
-int fat_setsize(struct inode *inode, loff_t offset)
-{
-	int error;
-
-	error = simple_setsize(inode, offset);
-	if (error)
-		return error;
-	fat_truncate_blocks(inode, offset);
-
-	return error;
-}
-
 #define TIMES_SET_FLAGS	(ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)
 /* valid file mode bits */
 #define FAT_VALID_MODE	(S_IFREG | S_IFDIR | S_IRWXUGO)
@@ -441,9 +429,8 @@ int fat_setattr(struct dentry *dentry, s
 	}
 
 	if (attr->ia_valid & ATTR_SIZE) {
-		error = fat_setsize(inode, attr->ia_size);
-		if (error)
-			goto out;
+		truncate_setsize(inode, attr->ia_size);
+		fat_truncate_blocks(inode, attr->ia_size);
 	}
 
 	setattr_copy(inode, attr);
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2010-06-01 12:19:22.918003717 +0200
+++ linux-2.6/include/linux/mm.h	2010-06-01 12:21:25.333005743 +0200
@@ -815,6 +815,7 @@ static inline void unmap_shared_mapping_
 }
 
 extern void truncate_pagecache(struct inode *inode, loff_t old, loff_t new);
+extern void truncate_setsize(struct inode *inode, loff_t newsize);
 extern int vmtruncate(struct inode *inode, loff_t offset);
 extern int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end);
 
Index: linux-2.6/fs/ecryptfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/inode.c	2010-06-01 12:19:22.825003996 +0200
+++ linux-2.6/fs/ecryptfs/inode.c	2010-06-01 12:21:25.338031445 +0200
@@ -804,10 +804,20 @@ static int truncate_upper(struct dentry
 		size_t num_zeros = (PAGE_CACHE_SIZE
 				    - (ia->ia_size & ~PAGE_CACHE_MASK));
 
+
+		/*
+		 * XXX(truncate) this should really happen at the begginning
+		 * of ->setattr.  But the code is too messy to that as part
+		 * of a larger patch.  ecryptfs is also totally missing out
+		 * on the inode_change_ok check at the beginning of
+		 * ->setattr while would include this.
+		 */
+		rc = inode_newsize_ok(inode, ia->ia_size);
+		if (rc)
+			goto out;
+
 		if (!(crypt_stat->flags & ECRYPTFS_ENCRYPTED)) {
-			rc = simple_setsize(inode, ia->ia_size);
-			if (rc)
-				goto out;
+			truncate_setsize(inode, ia->ia_size);
 			lower_ia->ia_size = ia->ia_size;
 			lower_ia->ia_valid |= ATTR_SIZE;
 			goto out;
@@ -830,7 +840,7 @@ static int truncate_upper(struct dentry
 				goto out;
 			}
 		}
-		simple_setsize(inode, ia->ia_size);
+		truncate_setsize(inode, ia->ia_size);
 		rc = ecryptfs_write_inode_size_to_metadata(inode);
 		if (rc) {
 			printk(KERN_ERR	"Problem with "
Index: linux-2.6/fs/gfs2/aops.c
===================================================================
--- linux-2.6.orig/fs/gfs2/aops.c	2010-06-01 12:19:22.774004276 +0200
+++ linux-2.6/fs/gfs2/aops.c	2010-06-01 12:21:25.345018524 +0200
@@ -702,12 +702,12 @@ out:
 	page_cache_release(page);
 
 	/*
-	 * XXX(hch): the call below should probably be replaced with
+	 * XXX(truncate): the call below should probably be replaced with
 	 * a call to the gfs2-specific truncate blocks helper to actually
 	 * release disk blocks..
 	 */
 	if (pos + len > ip->i_inode.i_size)
-		simple_setsize(&ip->i_inode, ip->i_inode.i_size);
+		truncate_setsize(&ip->i_inode, ip->i_inode.i_size);
 out_endtrans:
 	gfs2_trans_end(sdp);
 out_trans_fail:
Index: linux-2.6/fs/jffs2/fs.c
===================================================================
--- linux-2.6.orig/fs/jffs2/fs.c	2010-06-01 12:19:22.834023482 +0200
+++ linux-2.6/fs/jffs2/fs.c	2010-06-01 12:21:25.350255429 +0200
@@ -169,13 +169,13 @@ int jffs2_do_setattr (struct inode *inod
 	mutex_unlock(&f->sem);
 	jffs2_complete_reservation(c);
 
-	/* We have to do the simple_setsize() without f->sem held, since
+	/* We have to do the truncate_setsize() without f->sem held, since
 	   some pages may be locked and waiting for it in readpage().
 	   We are protected from a simultaneous write() extending i_size
 	   back past iattr->ia_size, because do_truncate() holds the
 	   generic inode semaphore. */
 	if (ivalid & ATTR_SIZE && inode->i_size > iattr->ia_size) {
-		simple_setsize(inode, iattr->ia_size);
+		truncate_setsize(inode, iattr->ia_size);
 		inode->i_blocks = (inode->i_size + 511) >> 9;
 	}	
 
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c	2010-06-01 12:19:48.155067901 +0200
+++ linux-2.6/fs/ocfs2/file.c	2010-06-01 12:21:25.360261505 +0200
@@ -1052,7 +1052,7 @@ int ocfs2_setattr(struct dentry *dentry,
 	}
 
 	/*
-	 * This will intentionally not wind up calling simple_setsize(),
+	 * This will intentionally not wind up calling truncate_setsize(),
 	 * since all the work for a size change has been done above.
 	 * Otherwise, we could get into problems with truncate as
 	 * ip_alloc_sem is used there to protect against i_size
@@ -2127,12 +2127,12 @@ relock:
 			 * blocks outside i_size. Trim these off again.
 			 * Don't need i_size_read because we hold i_mutex.
 			 *
-			 * XXX(hch): this looks buggy because ocfs2 did not
+			 * XXX(truncate): this looks buggy because ocfs2 did not
 			 * actually implement ->truncate.  Take a look at
 			 * the new truncate sequence and update this accordingly
 			 */
 			if (*ppos + count > inode->i_size)
-				simple_setsize(inode, inode->i_size);
+				truncate_setsize(inode, inode->i_size);
 			ret = written;
 			goto out_dio;
 		}
Index: linux-2.6/fs/ramfs/file-nommu.c
===================================================================
--- linux-2.6.orig/fs/ramfs/file-nommu.c	2010-06-01 12:19:22.853004346 +0200
+++ linux-2.6/fs/ramfs/file-nommu.c	2010-06-01 12:21:25.365255638 +0200
@@ -146,9 +146,8 @@ static int ramfs_nommu_resize(struct ino
 			return ret;
 	}
 
-	ret = simple_setsize(inode, newsize);
-
-	return ret;
+	truncate_setsize(inode, newsize);
+	return 0;
 }
 
 /*****************************************************************************/
Index: linux-2.6/fs/smbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/smbfs/inode.c	2010-06-01 12:19:22.865004555 +0200
+++ linux-2.6/fs/smbfs/inode.c	2010-06-01 12:21:25.370005394 +0200
@@ -714,9 +714,7 @@ smb_notify_change(struct dentry *dentry,
 		error = server->ops->truncate(inode, attr->ia_size);
 		if (error)
 			goto out;
-		error = simple_setsize(inode, attr->ia_size);
-		if (error)
-			goto out;
+		truncate_setsize(inode, attr->ia_size);
 		refresh = 1;
 	}
 
Index: linux-2.6/fs/ubifs/file.c
===================================================================
--- linux-2.6.orig/fs/ubifs/file.c	2010-06-01 12:19:22.874004974 +0200
+++ linux-2.6/fs/ubifs/file.c	2010-06-01 12:21:25.377005534 +0200
@@ -967,14 +967,15 @@ static int do_writepage(struct page *pag
  * the page locked, and it locks @ui_mutex. However, write-back does take inode
  * @i_mutex, which means other VFS operations may be run on this inode at the
  * same time. And the problematic one is truncation to smaller size, from where
- * we have to call 'simple_setsize()', which first changes @inode->i_size, then
+ * we have to call 'truncate_setsize()', which first changes @inode->i_size, then
  * drops the truncated pages. And while dropping the pages, it takes the page
- * lock. This means that 'do_truncation()' cannot call 'simple_setsize()' with
+ * lock. This means that 'do_truncation()' cannot call 'truncate_setsize()' with
  * @ui_mutex locked, because it would deadlock with 'ubifs_writepage()'. This
  * means that @inode->i_size is changed while @ui_mutex is unlocked.
  *
- * XXX: with the new truncate the above is not true anymore, the simple_setsize
- * calls can be replaced with the individual components.
+ * XXX(truncate): with the new truncate sequence this is not true anymore,
+ * and the calls to truncate_setsize can be move around freely.  They should
+ * be moved to the very end of the truncate sequence.
  *
  * But in 'ubifs_writepage()' we have to guarantee that we do not write beyond
  * inode size. How do we do this if @inode->i_size may became smaller while we
@@ -1128,9 +1129,7 @@ static int do_truncation(struct ubifs_in
 		budgeted = 0;
 	}
 
-	err = simple_setsize(inode, new_size);
-	if (err)
-		goto out_budg;
+	truncate_setsize(inode, new_size);
 
 	if (offset) {
 		pgoff_t index = new_size >> PAGE_CACHE_SHIFT;
@@ -1217,16 +1216,14 @@ static int do_setattr(struct ubifs_info
 
 	if (attr->ia_valid & ATTR_SIZE) {
 		dbg_gen("size %lld -> %lld", inode->i_size, new_size);
-		err = simple_setsize(inode, new_size);
-		if (err)
-			goto out;
+		truncate_setsize(inode, new_size);
 	}
 
 	mutex_lock(&ui->ui_mutex);
 	if (attr->ia_valid & ATTR_SIZE) {
 		/* Truncation changes inode [mc]time */
 		inode->i_mtime = inode->i_ctime = ubifs_current_time(inode);
-		/* 'simple_setsize()' changed @i_size, update @ui_size */
+		/* 'truncate_setsize()' changed @i_size, update @ui_size */
 		ui->ui_size = inode->i_size;
 	}
 
@@ -1248,10 +1245,6 @@ static int do_setattr(struct ubifs_info
 	if (IS_SYNC(inode))
 		err = inode->i_sb->s_op->write_inode(inode, NULL);
 	return err;
-
-out:
-	ubifs_release_budget(c, &req);
-	return err;
 }
 
 int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
Index: linux-2.6/fs/ubifs/ubifs.h
===================================================================
--- linux-2.6.orig/fs/ubifs/ubifs.h	2010-06-01 12:19:22.881004276 +0200
+++ linux-2.6/fs/ubifs/ubifs.h	2010-06-01 12:21:25.385033052 +0200
@@ -379,7 +379,7 @@ struct ubifs_gced_idx_leb {
  * The @ui_size is a "shadow" variable for @inode->i_size and UBIFS uses
  * @ui_size instead of @inode->i_size. The reason for this is that UBIFS cannot
  * make sure @inode->i_size is always changed under @ui_mutex, because it
- * cannot call 'simple_setsize()' with @ui_mutex locked, because it would deadlock
+ * cannot call 'truncate_setsize()' with @ui_mutex locked, because it would deadlock
  * with 'ubifs_writepage()' (see file.c). All the other inode fields are
  * changed under @ui_mutex, so they do not need "shadow" fields. Note, one
  * could consider to rework locking and base it on "shadow" fields.
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-06-01 12:19:48.167005602 +0200
+++ linux-2.6/include/linux/fs.h	2010-06-01 12:21:25.392005743 +0200
@@ -2341,7 +2341,6 @@ extern int simple_link(struct dentry *,
 extern int simple_unlink(struct inode *, struct dentry *);
 extern int simple_rmdir(struct inode *, struct dentry *);
 extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
-extern int simple_setsize(struct inode *, loff_t);
 extern int noop_fsync(struct file *, int);
 extern int simple_empty(struct dentry *);
 extern int simple_readpage(struct file *file, struct page *page);
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c	2010-06-01 12:19:22.890024949 +0200
+++ linux-2.6/fs/libfs.c	2010-06-01 12:21:25.398033191 +0200
@@ -327,49 +327,6 @@ int simple_rename(struct inode *old_dir,
 }
 
 /**
- * simple_setsize - handle core mm and vfs requirements for file size change
- * @inode: inode
- * @newsize: new file size
- *
- * Returns 0 on success, -error on failure.
- *
- * simple_setsize must be called with inode_mutex held.
- *
- * simple_setsize will check that the requested new size is OK (see
- * inode_newsize_ok), and then will perform the necessary i_size update
- * and pagecache truncation (if necessary). It will be typically be called
- * from the filesystem's setattr function when ATTR_SIZE is passed in.
- *
- * The inode itself must have correct permissions and attributes to allow
- * i_size to be changed, this function then just checks that the new size
- * requested is valid.
- *
- * In the case of simple in-memory filesystems with inodes stored solely
- * in the inode cache, and file data in the pagecache, nothing more needs
- * to be done to satisfy a truncate request. Filesystems with on-disk
- * blocks for example will need to free them in the case of truncate, in
- * that case it may be easier not to use simple_setsize (but each of its
- * components will likely be required at some point to update pagecache
- * and inode etc).
- */
-int simple_setsize(struct inode *inode, loff_t newsize)
-{
-	loff_t oldsize;
-	int error;
-
-	error = inode_newsize_ok(inode, newsize);
-	if (error)
-		return error;
-
-	oldsize = inode->i_size;
-	i_size_write(inode, newsize);
-	truncate_pagecache(inode, oldsize, newsize);
-
-	return error;
-}
-EXPORT_SYMBOL(simple_setsize);
-
-/**
  * simple_setattr - setattr for simple filesystem
  * @dentry: dentry
  * @iattr: iattr structure
@@ -394,12 +351,8 @@ int simple_setattr(struct dentry *dentry
 	if (error)
 		return error;
 
-	if (iattr->ia_valid & ATTR_SIZE) {
-		error = simple_setsize(inode, iattr->ia_size);
-		if (error)
-			return error;
-	}
-
+	if (iattr->ia_valid & ATTR_SIZE)
+		truncate_setsize(inode, iattr->ia_size);
 	setattr_copy(inode, iattr);
 	mark_inode_dirty(inode);
 	return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-01 11:49   ` Steven Whitehouse
@ 2010-06-01 11:47     ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:47 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Christoph Hellwig, viro, npiggin, jack, linux-fsdevel

On Tue, Jun 01, 2010 at 12:49:35PM +0100, Steven Whitehouse wrote:
> Hi,
> 
> Can you drop the GFS2 bit from this? I've got patches for the new
> truncate sequence for GFS2 and that bit you have altered in the patch
> below doesn't exist after that patch is applied,

I'd really prefer to get all these preparation in before converting
filesystems if possible.  They change a lot of the helpers to be used
in ->setattr, so getting the infrastructure right should make it
a lot more clear what's happening in the filesystem code.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-01 11:39 ` [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok Christoph Hellwig
@ 2010-06-01 11:49   ` Steven Whitehouse
  2010-06-01 11:47     ` Christoph Hellwig
  2010-06-09  7:33   ` Nick Piggin
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Whitehouse @ 2010-06-01 11:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: viro, npiggin, jack, linux-fsdevel

Hi,

Can you drop the GFS2 bit from this? I've got patches for the new
truncate sequence for GFS2 and that bit you have altered in the patch
below doesn't exist after that patch is applied,

Steve.

On Tue, 2010-06-01 at 13:39 +0200, Christoph Hellwig wrote:
> Make sure we check the truncate constraints early on in ->setattr by adding
> those checks to inode_change_ok.  Also clean up and document inode_change_ok
> to make this obvious.
> 
> As a fallout we don't have to call inode_newsize_ok from simple_setsize and
> simplify it down to a truncate_setsize which doesn't return an error.  This
> simplifies a lot of setattr implementations and means we use truncate_setsize
> almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
> ext2_setsize static to make the calling convention obvious.
> 
> Keep the inode_newsize_ok in vmtruncate for now as all callers need an
> audit for it's removal anyway.
> 
> Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
> needs a deeper audit, but that is left for later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/attr.c
> ===================================================================
> --- linux-2.6.orig/fs/attr.c	2010-06-01 12:19:47.987277079 +0200
> +++ linux-2.6/fs/attr.c	2010-06-01 12:48:26.063005672 +0200
> @@ -14,35 +14,53 @@
>  #include <linux/fcntl.h>
>  #include <linux/security.h>
>  
> -/* Taken over from the old code... */
> -
> -/* POSIX UID/GID verification for setting inode attributes. */
> +/**
> + * inode_change_ok - check if attribute changes to an inode are allowed
> + * @inode:	inode to check
> + * @attr:	attributes to change
> + *
> + * Check if we are allowed to change the attributes contained in @attr
> + * in the given inode.  This includes the normal unix access permission
> + * checks, as well as checks for rlimits and others.
> + *
> + * Should be called as the first thing in ->setattr implementations,
> + * possibly after taking additional locks.
> + */
>  int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  {
> -	int retval = -EPERM;
>  	unsigned int ia_valid = attr->ia_valid;
>  
> +	/*
> +	 * First check size constraints.  These can't be overriden using
> +	 * ATTR_FORCE.
> +	 */
> +	if (attr->ia_mode & ATTR_SIZE) {
> +		int error = inode_newsize_ok(inode, attr->ia_size);
> +		if (error)
> +			return error;
> +	}
> +
>  	/* If force is set do it anyway. */
>  	if (ia_valid & ATTR_FORCE)
> -		goto fine;
> +		return 0;
>  
>  	/* Make sure a caller can chown. */
>  	if ((ia_valid & ATTR_UID) &&
>  	    (current_fsuid() != inode->i_uid ||
>  	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
> -		goto error;
> +		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
>  	if ((ia_valid & ATTR_GID) &&
>  	    (current_fsuid() != inode->i_uid ||
>  	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
>  	    !capable(CAP_CHOWN))
> -		goto error;
> +		return -EPERM;
>  
>  	/* Make sure a caller can chmod. */
>  	if (ia_valid & ATTR_MODE) {
>  		if (!is_owner_or_cap(inode))
> -			goto error;
> +			return -EPERM;
>  		/* Also check the setgid bit! */
>  		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
>  				inode->i_gid) && !capable(CAP_FSETID))
> @@ -52,12 +70,10 @@ int inode_change_ok(const struct inode *
>  	/* Check for setting the inode time. */
>  	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
>  		if (!is_owner_or_cap(inode))
> -			goto error;
> +			return -EPERM;
>  	}
> -fine:
> -	retval = 0;
> -error:
> -	return retval;
> +
> +	return 0;
>  }
>  EXPORT_SYMBOL(inode_change_ok);
>  
> @@ -113,7 +129,7 @@ EXPORT_SYMBOL(inode_newsize_ok);
>   *
>   * setattr_copy updates the inode's metadata with that specified
>   * in attr. Noticably missing is inode size update, which is more complex
> - * as it requires pagecache updates. See simple_setsize.
> + * as it requires pagecache updates.
>   *
>   * The inode is not marked as dirty after this operation. The rationale is
>   * that for "simple" filesystems, the struct inode is the inode storage.
> Index: linux-2.6/fs/fuse/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/fuse/dir.c	2010-06-01 12:20:04.409254311 +0200
> +++ linux-2.6/fs/fuse/dir.c	2010-06-01 12:21:25.285273727 +0200
> @@ -1280,12 +1280,8 @@ static int fuse_do_setattr(struct dentry
>  	if ((attr->ia_valid & ATTR_OPEN) && fc->atomic_o_trunc)
>  		return 0;
>  
> -	if (attr->ia_valid & ATTR_SIZE) {
> -		err = inode_newsize_ok(inode, attr->ia_size);
> -		if (err)
> -			return err;
> +	if (attr->ia_valid & ATTR_SIZE)
>  		is_truncate = true;
> -	}
>  
>  	req = fuse_get_req(fc);
>  	if (IS_ERR(req))
> Index: linux-2.6/fs/adfs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/adfs/inode.c	2010-06-01 12:19:22.754023343 +0200
> +++ linux-2.6/fs/adfs/inode.c	2010-06-01 12:21:25.290022016 +0200
> @@ -333,10 +333,7 @@ adfs_notify_change(struct dentry *dentry
>  
>  	/* XXX: this is missing some actual on-disk truncation.. */
>  	if (ia_valid & ATTR_SIZE)
> -		error = simple_setsize(inode, attr->ia_size);
> -
> -	if (error)
> -		goto out;
> +		truncate_setsize(inode, attr->ia_size);
>  
>  	if (ia_valid & ATTR_MTIME) {
>  		inode->i_mtime = attr->ia_mtime;
> Index: linux-2.6/fs/gfs2/ops_inode.c
> ===================================================================
> --- linux-2.6.orig/fs/gfs2/ops_inode.c	2010-06-01 12:19:48.121034167 +0200
> +++ linux-2.6/fs/gfs2/ops_inode.c	2010-06-01 12:21:25.294022715 +0200
> @@ -1072,7 +1072,7 @@ int gfs2_permission(struct inode *inode,
>  }
>  
>  /*
> - * XXX: should be changed to have proper ordering by opencoding simple_setsize
> + * XXX(truncate): the truncate_setsize calls should be moved to the end.
>   */
>  static int setattr_size(struct inode *inode, struct iattr *attr)
>  {
> @@ -1084,10 +1084,8 @@ static int setattr_size(struct inode *in
>  		error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
>  		if (error)
>  			return error;
> -		error = simple_setsize(inode, attr->ia_size);
> +		truncate_setsize(inode, attr->ia_size);
>  		gfs2_trans_end(sdp);
> -		if (error) 
> -			return error;
>  	}
>  
>  	error = gfs2_truncatei(ip, attr->ia_size);
> Index: linux-2.6/fs/ufs/truncate.c
> ===================================================================
> --- linux-2.6.orig/fs/ufs/truncate.c	2010-06-01 12:19:48.097005602 +0200
> +++ linux-2.6/fs/ufs/truncate.c	2010-06-01 12:21:25.298021667 +0200
> @@ -500,11 +500,6 @@ out:
>  	return err;
>  }
>  
> -/*
> - * TODO:
> - *	- truncate case should use proper ordering instead of using
> - *	  simple_setsize
> - */
>  int ufs_setattr(struct dentry *dentry, struct iattr *attr)
>  {
>  	struct inode *inode = dentry->d_inode;
> @@ -518,9 +513,9 @@ int ufs_setattr(struct dentry *dentry, s
>  	if (ia_valid & ATTR_SIZE && attr->ia_size != inode->i_size) {
>  		loff_t old_i_size = inode->i_size;
>  
> -		error = simple_setsize(inode, attr->ia_size);
> -		if (error)
> -			return error;
> +		/* XXX(truncate): truncate_setsize should be called last */
> +		truncate_setsize(inode, attr->ia_size);
> +
>  		error = ufs_truncate(inode, old_i_size);
>  		if (error)
>  			return error;
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c	2010-06-01 12:20:04.424255568 +0200
> +++ linux-2.6/mm/shmem.c	2010-06-01 12:21:25.300003438 +0200
> @@ -804,11 +804,10 @@ static int shmem_notify_change(struct de
>  			}
>  		}
>  
> -		error = simple_setsize(inode, newsize);
> +		/* XXX(truncate): truncate_setsize should be called last */
> +		truncate_setsize(inode, newsize);
>  		if (page)
>  			page_cache_release(page);
> -		if (error)
> -			return error;
>  		shmem_truncate_range(inode, newsize, (loff_t)-1);
>  	}
>  
> Index: linux-2.6/mm/truncate.c
> ===================================================================
> --- linux-2.6.orig/mm/truncate.c	2010-06-01 12:19:22.909012727 +0200
> +++ linux-2.6/mm/truncate.c	2010-06-01 12:21:25.307005883 +0200
> @@ -541,28 +541,48 @@ void truncate_pagecache(struct inode *in
>  EXPORT_SYMBOL(truncate_pagecache);
>  
>  /**
> + * truncate_setsize - update inode and pagecache for a new file size
> + * @inode: inode
> + * @newsize: new file size
> + *
> + * truncate_setsize updastes i_size update and performs pagecache
> + * truncation (if necessary) for a file size updates. It will be
> + * typically be called from the filesystem's setattr function when
> + * ATTR_SIZE is passed in.
> + *
> + * Must be called with inode_mutex held and after all filesystem
> + * specific block truncation has been performed.
> + */
> +void truncate_setsize(struct inode *inode, loff_t newsize)
> +{
> +	loff_t oldsize;
> +
> +	oldsize = inode->i_size;
> +	i_size_write(inode, newsize);
> +
> +	truncate_pagecache(inode, oldsize, newsize);
> +}
> +EXPORT_SYMBOL(truncate_setsize);
> +
> +/**
>   * vmtruncate - unmap mappings "freed" by truncate() syscall
>   * @inode: inode of the file used
>   * @offset: file offset to start truncating
>   *
> - * NOTE! We have to be ready to update the memory sharing
> - * between the file and the memory map for a potential last
> - * incomplete page.  Ugly, but necessary.
> - *
> - * This function is deprecated and simple_setsize or truncate_pagecache
> - * should be used instead.
> + * This function is deprecated and truncate_setsize or truncate_pagecache
> + * should be used instead, together with filesystem specific block truncation.
>   */
>  int vmtruncate(struct inode *inode, loff_t offset)
>  {
>  	int error;
>  
> -	error = simple_setsize(inode, offset);
> +	error = inode_newsize_ok(inode, offset);
>  	if (error)
>  		return error;
>  
> +	truncate_setsize(inode, offset);
>  	if (inode->i_op->truncate)
>  		inode->i_op->truncate(inode);
> -
> -	return error;
> +	return 0;
>  }
>  EXPORT_SYMBOL(vmtruncate);
> Index: linux-2.6/fs/ext2/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ext2/inode.c	2010-06-01 12:19:22.792026136 +0200
> +++ linux-2.6/fs/ext2/inode.c	2010-06-01 13:21:47.471253822 +0200
> @@ -1156,15 +1156,10 @@ static void ext2_truncate_blocks(struct
>  	__ext2_truncate_blocks(inode, offset);
>  }
>  
> -int ext2_setsize(struct inode *inode, loff_t newsize)
> +static int ext2_setsize(struct inode *inode, loff_t newsize)
>  {
> -	loff_t oldsize;
>  	int error;
>  
> -	error = inode_newsize_ok(inode, newsize);
> -	if (error)
> -		return error;
> -
>  	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
>  	    S_ISLNK(inode->i_mode)))
>  		return -EINVAL;
> @@ -1184,10 +1179,7 @@ int ext2_setsize(struct inode *inode, lo
>  	if (error)
>  		return error;
>  
> -	oldsize = inode->i_size;
> -	i_size_write(inode, newsize);
> -	truncate_pagecache(inode, oldsize, newsize);
> -
> +	truncate_setsize(inode, newsize);
>  	__ext2_truncate_blocks(inode, newsize);
>  
>  	inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
> Index: linux-2.6/fs/fat/fat.h
> ===================================================================
> --- linux-2.6.orig/fs/fat/fat.h	2010-06-01 12:19:22.800004206 +0200
> +++ linux-2.6/fs/fat/fat.h	2010-06-01 12:21:25.324034448 +0200
> @@ -306,7 +306,6 @@ extern long fat_generic_ioctl(struct fil
>  extern const struct file_operations fat_file_operations;
>  extern const struct inode_operations fat_file_inode_operations;
>  extern int fat_setattr(struct dentry * dentry, struct iattr * attr);
> -extern int fat_setsize(struct inode *inode, loff_t offset);
>  extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
>  extern int fat_getattr(struct vfsmount *mnt, struct dentry *dentry,
>  		       struct kstat *stat);
> Index: linux-2.6/fs/fat/file.c
> ===================================================================
> --- linux-2.6.orig/fs/fat/file.c	2010-06-01 12:21:01.134027230 +0200
> +++ linux-2.6/fs/fat/file.c	2010-06-01 12:21:25.329066785 +0200
> @@ -364,18 +364,6 @@ static int fat_allow_set_time(struct msd
>  	return 0;
>  }
>  
> -int fat_setsize(struct inode *inode, loff_t offset)
> -{
> -	int error;
> -
> -	error = simple_setsize(inode, offset);
> -	if (error)
> -		return error;
> -	fat_truncate_blocks(inode, offset);
> -
> -	return error;
> -}
> -
>  #define TIMES_SET_FLAGS	(ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)
>  /* valid file mode bits */
>  #define FAT_VALID_MODE	(S_IFREG | S_IFDIR | S_IRWXUGO)
> @@ -441,9 +429,8 @@ int fat_setattr(struct dentry *dentry, s
>  	}
>  
>  	if (attr->ia_valid & ATTR_SIZE) {
> -		error = fat_setsize(inode, attr->ia_size);
> -		if (error)
> -			goto out;
> +		truncate_setsize(inode, attr->ia_size);
> +		fat_truncate_blocks(inode, attr->ia_size);
>  	}
>  
>  	setattr_copy(inode, attr);
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h	2010-06-01 12:19:22.918003717 +0200
> +++ linux-2.6/include/linux/mm.h	2010-06-01 12:21:25.333005743 +0200
> @@ -815,6 +815,7 @@ static inline void unmap_shared_mapping_
>  }
>  
>  extern void truncate_pagecache(struct inode *inode, loff_t old, loff_t new);
> +extern void truncate_setsize(struct inode *inode, loff_t newsize);
>  extern int vmtruncate(struct inode *inode, loff_t offset);
>  extern int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end);
>  
> Index: linux-2.6/fs/ecryptfs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ecryptfs/inode.c	2010-06-01 12:19:22.825003996 +0200
> +++ linux-2.6/fs/ecryptfs/inode.c	2010-06-01 12:21:25.338031445 +0200
> @@ -804,10 +804,20 @@ static int truncate_upper(struct dentry
>  		size_t num_zeros = (PAGE_CACHE_SIZE
>  				    - (ia->ia_size & ~PAGE_CACHE_MASK));
>  
> +
> +		/*
> +		 * XXX(truncate) this should really happen at the begginning
> +		 * of ->setattr.  But the code is too messy to that as part
> +		 * of a larger patch.  ecryptfs is also totally missing out
> +		 * on the inode_change_ok check at the beginning of
> +		 * ->setattr while would include this.
> +		 */
> +		rc = inode_newsize_ok(inode, ia->ia_size);
> +		if (rc)
> +			goto out;
> +
>  		if (!(crypt_stat->flags & ECRYPTFS_ENCRYPTED)) {
> -			rc = simple_setsize(inode, ia->ia_size);
> -			if (rc)
> -				goto out;
> +			truncate_setsize(inode, ia->ia_size);
>  			lower_ia->ia_size = ia->ia_size;
>  			lower_ia->ia_valid |= ATTR_SIZE;
>  			goto out;
> @@ -830,7 +840,7 @@ static int truncate_upper(struct dentry
>  				goto out;
>  			}
>  		}
> -		simple_setsize(inode, ia->ia_size);
> +		truncate_setsize(inode, ia->ia_size);
>  		rc = ecryptfs_write_inode_size_to_metadata(inode);
>  		if (rc) {
>  			printk(KERN_ERR	"Problem with "
> Index: linux-2.6/fs/gfs2/aops.c
> ===================================================================
> --- linux-2.6.orig/fs/gfs2/aops.c	2010-06-01 12:19:22.774004276 +0200
> +++ linux-2.6/fs/gfs2/aops.c	2010-06-01 12:21:25.345018524 +0200
> @@ -702,12 +702,12 @@ out:
>  	page_cache_release(page);
>  
>  	/*
> -	 * XXX(hch): the call below should probably be replaced with
> +	 * XXX(truncate): the call below should probably be replaced with
>  	 * a call to the gfs2-specific truncate blocks helper to actually
>  	 * release disk blocks..
>  	 */
>  	if (pos + len > ip->i_inode.i_size)
> -		simple_setsize(&ip->i_inode, ip->i_inode.i_size);
> +		truncate_setsize(&ip->i_inode, ip->i_inode.i_size);
>  out_endtrans:
>  	gfs2_trans_end(sdp);
>  out_trans_fail:
> Index: linux-2.6/fs/jffs2/fs.c
> ===================================================================
> --- linux-2.6.orig/fs/jffs2/fs.c	2010-06-01 12:19:22.834023482 +0200
> +++ linux-2.6/fs/jffs2/fs.c	2010-06-01 12:21:25.350255429 +0200
> @@ -169,13 +169,13 @@ int jffs2_do_setattr (struct inode *inod
>  	mutex_unlock(&f->sem);
>  	jffs2_complete_reservation(c);
>  
> -	/* We have to do the simple_setsize() without f->sem held, since
> +	/* We have to do the truncate_setsize() without f->sem held, since
>  	   some pages may be locked and waiting for it in readpage().
>  	   We are protected from a simultaneous write() extending i_size
>  	   back past iattr->ia_size, because do_truncate() holds the
>  	   generic inode semaphore. */
>  	if (ivalid & ATTR_SIZE && inode->i_size > iattr->ia_size) {
> -		simple_setsize(inode, iattr->ia_size);
> +		truncate_setsize(inode, iattr->ia_size);
>  		inode->i_blocks = (inode->i_size + 511) >> 9;
>  	}	
>  
> Index: linux-2.6/fs/ocfs2/file.c
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/file.c	2010-06-01 12:19:48.155067901 +0200
> +++ linux-2.6/fs/ocfs2/file.c	2010-06-01 12:21:25.360261505 +0200
> @@ -1052,7 +1052,7 @@ int ocfs2_setattr(struct dentry *dentry,
>  	}
>  
>  	/*
> -	 * This will intentionally not wind up calling simple_setsize(),
> +	 * This will intentionally not wind up calling truncate_setsize(),
>  	 * since all the work for a size change has been done above.
>  	 * Otherwise, we could get into problems with truncate as
>  	 * ip_alloc_sem is used there to protect against i_size
> @@ -2127,12 +2127,12 @@ relock:
>  			 * blocks outside i_size. Trim these off again.
>  			 * Don't need i_size_read because we hold i_mutex.
>  			 *
> -			 * XXX(hch): this looks buggy because ocfs2 did not
> +			 * XXX(truncate): this looks buggy because ocfs2 did not
>  			 * actually implement ->truncate.  Take a look at
>  			 * the new truncate sequence and update this accordingly
>  			 */
>  			if (*ppos + count > inode->i_size)
> -				simple_setsize(inode, inode->i_size);
> +				truncate_setsize(inode, inode->i_size);
>  			ret = written;
>  			goto out_dio;
>  		}
> Index: linux-2.6/fs/ramfs/file-nommu.c
> ===================================================================
> --- linux-2.6.orig/fs/ramfs/file-nommu.c	2010-06-01 12:19:22.853004346 +0200
> +++ linux-2.6/fs/ramfs/file-nommu.c	2010-06-01 12:21:25.365255638 +0200
> @@ -146,9 +146,8 @@ static int ramfs_nommu_resize(struct ino
>  			return ret;
>  	}
>  
> -	ret = simple_setsize(inode, newsize);
> -
> -	return ret;
> +	truncate_setsize(inode, newsize);
> +	return 0;
>  }
>  
>  /*****************************************************************************/
> Index: linux-2.6/fs/smbfs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/smbfs/inode.c	2010-06-01 12:19:22.865004555 +0200
> +++ linux-2.6/fs/smbfs/inode.c	2010-06-01 12:21:25.370005394 +0200
> @@ -714,9 +714,7 @@ smb_notify_change(struct dentry *dentry,
>  		error = server->ops->truncate(inode, attr->ia_size);
>  		if (error)
>  			goto out;
> -		error = simple_setsize(inode, attr->ia_size);
> -		if (error)
> -			goto out;
> +		truncate_setsize(inode, attr->ia_size);
>  		refresh = 1;
>  	}
>  
> Index: linux-2.6/fs/ubifs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/ubifs/file.c	2010-06-01 12:19:22.874004974 +0200
> +++ linux-2.6/fs/ubifs/file.c	2010-06-01 12:21:25.377005534 +0200
> @@ -967,14 +967,15 @@ static int do_writepage(struct page *pag
>   * the page locked, and it locks @ui_mutex. However, write-back does take inode
>   * @i_mutex, which means other VFS operations may be run on this inode at the
>   * same time. And the problematic one is truncation to smaller size, from where
> - * we have to call 'simple_setsize()', which first changes @inode->i_size, then
> + * we have to call 'truncate_setsize()', which first changes @inode->i_size, then
>   * drops the truncated pages. And while dropping the pages, it takes the page
> - * lock. This means that 'do_truncation()' cannot call 'simple_setsize()' with
> + * lock. This means that 'do_truncation()' cannot call 'truncate_setsize()' with
>   * @ui_mutex locked, because it would deadlock with 'ubifs_writepage()'. This
>   * means that @inode->i_size is changed while @ui_mutex is unlocked.
>   *
> - * XXX: with the new truncate the above is not true anymore, the simple_setsize
> - * calls can be replaced with the individual components.
> + * XXX(truncate): with the new truncate sequence this is not true anymore,
> + * and the calls to truncate_setsize can be move around freely.  They should
> + * be moved to the very end of the truncate sequence.
>   *
>   * But in 'ubifs_writepage()' we have to guarantee that we do not write beyond
>   * inode size. How do we do this if @inode->i_size may became smaller while we
> @@ -1128,9 +1129,7 @@ static int do_truncation(struct ubifs_in
>  		budgeted = 0;
>  	}
>  
> -	err = simple_setsize(inode, new_size);
> -	if (err)
> -		goto out_budg;
> +	truncate_setsize(inode, new_size);
>  
>  	if (offset) {
>  		pgoff_t index = new_size >> PAGE_CACHE_SHIFT;
> @@ -1217,16 +1216,14 @@ static int do_setattr(struct ubifs_info
>  
>  	if (attr->ia_valid & ATTR_SIZE) {
>  		dbg_gen("size %lld -> %lld", inode->i_size, new_size);
> -		err = simple_setsize(inode, new_size);
> -		if (err)
> -			goto out;
> +		truncate_setsize(inode, new_size);
>  	}
>  
>  	mutex_lock(&ui->ui_mutex);
>  	if (attr->ia_valid & ATTR_SIZE) {
>  		/* Truncation changes inode [mc]time */
>  		inode->i_mtime = inode->i_ctime = ubifs_current_time(inode);
> -		/* 'simple_setsize()' changed @i_size, update @ui_size */
> +		/* 'truncate_setsize()' changed @i_size, update @ui_size */
>  		ui->ui_size = inode->i_size;
>  	}
>  
> @@ -1248,10 +1245,6 @@ static int do_setattr(struct ubifs_info
>  	if (IS_SYNC(inode))
>  		err = inode->i_sb->s_op->write_inode(inode, NULL);
>  	return err;
> -
> -out:
> -	ubifs_release_budget(c, &req);
> -	return err;
>  }
>  
>  int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
> Index: linux-2.6/fs/ubifs/ubifs.h
> ===================================================================
> --- linux-2.6.orig/fs/ubifs/ubifs.h	2010-06-01 12:19:22.881004276 +0200
> +++ linux-2.6/fs/ubifs/ubifs.h	2010-06-01 12:21:25.385033052 +0200
> @@ -379,7 +379,7 @@ struct ubifs_gced_idx_leb {
>   * The @ui_size is a "shadow" variable for @inode->i_size and UBIFS uses
>   * @ui_size instead of @inode->i_size. The reason for this is that UBIFS cannot
>   * make sure @inode->i_size is always changed under @ui_mutex, because it
> - * cannot call 'simple_setsize()' with @ui_mutex locked, because it would deadlock
> + * cannot call 'truncate_setsize()' with @ui_mutex locked, because it would deadlock
>   * with 'ubifs_writepage()' (see file.c). All the other inode fields are
>   * changed under @ui_mutex, so they do not need "shadow" fields. Note, one
>   * could consider to rework locking and base it on "shadow" fields.
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2010-06-01 12:19:48.167005602 +0200
> +++ linux-2.6/include/linux/fs.h	2010-06-01 12:21:25.392005743 +0200
> @@ -2341,7 +2341,6 @@ extern int simple_link(struct dentry *,
>  extern int simple_unlink(struct inode *, struct dentry *);
>  extern int simple_rmdir(struct inode *, struct dentry *);
>  extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
> -extern int simple_setsize(struct inode *, loff_t);
>  extern int noop_fsync(struct file *, int);
>  extern int simple_empty(struct dentry *);
>  extern int simple_readpage(struct file *file, struct page *page);
> Index: linux-2.6/fs/libfs.c
> ===================================================================
> --- linux-2.6.orig/fs/libfs.c	2010-06-01 12:19:22.890024949 +0200
> +++ linux-2.6/fs/libfs.c	2010-06-01 12:21:25.398033191 +0200
> @@ -327,49 +327,6 @@ int simple_rename(struct inode *old_dir,
>  }
>  
>  /**
> - * simple_setsize - handle core mm and vfs requirements for file size change
> - * @inode: inode
> - * @newsize: new file size
> - *
> - * Returns 0 on success, -error on failure.
> - *
> - * simple_setsize must be called with inode_mutex held.
> - *
> - * simple_setsize will check that the requested new size is OK (see
> - * inode_newsize_ok), and then will perform the necessary i_size update
> - * and pagecache truncation (if necessary). It will be typically be called
> - * from the filesystem's setattr function when ATTR_SIZE is passed in.
> - *
> - * The inode itself must have correct permissions and attributes to allow
> - * i_size to be changed, this function then just checks that the new size
> - * requested is valid.
> - *
> - * In the case of simple in-memory filesystems with inodes stored solely
> - * in the inode cache, and file data in the pagecache, nothing more needs
> - * to be done to satisfy a truncate request. Filesystems with on-disk
> - * blocks for example will need to free them in the case of truncate, in
> - * that case it may be easier not to use simple_setsize (but each of its
> - * components will likely be required at some point to update pagecache
> - * and inode etc).
> - */
> -int simple_setsize(struct inode *inode, loff_t newsize)
> -{
> -	loff_t oldsize;
> -	int error;
> -
> -	error = inode_newsize_ok(inode, newsize);
> -	if (error)
> -		return error;
> -
> -	oldsize = inode->i_size;
> -	i_size_write(inode, newsize);
> -	truncate_pagecache(inode, oldsize, newsize);
> -
> -	return error;
> -}
> -EXPORT_SYMBOL(simple_setsize);
> -
> -/**
>   * simple_setattr - setattr for simple filesystem
>   * @dentry: dentry
>   * @iattr: iattr structure
> @@ -394,12 +351,8 @@ int simple_setattr(struct dentry *dentry
>  	if (error)
>  		return error;
>  
> -	if (iattr->ia_valid & ATTR_SIZE) {
> -		error = simple_setsize(inode, iattr->ia_size);
> -		if (error)
> -			return error;
> -	}
> -
> +	if (iattr->ia_valid & ATTR_SIZE)
> +		truncate_setsize(inode, iattr->ia_size);
>  	setattr_copy(inode, iattr);
>  	mark_inode_dirty(inode);
>  	return 0;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-01 11:39 ` [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok Christoph Hellwig
  2010-06-01 11:49   ` Steven Whitehouse
@ 2010-06-09  7:33   ` Nick Piggin
  2010-06-09  9:41     ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2010-06-09  7:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: viro, jack, linux-fsdevel

On Tue, Jun 01, 2010 at 01:39:37PM +0200, Christoph Hellwig wrote:
> Make sure we check the truncate constraints early on in ->setattr by adding
> those checks to inode_change_ok.  Also clean up and document inode_change_ok
> to make this obvious.
> 
> As a fallout we don't have to call inode_newsize_ok from simple_setsize and
> simplify it down to a truncate_setsize which doesn't return an error.  This
> simplifies a lot of setattr implementations and means we use truncate_setsize
> almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
> ext2_setsize static to make the calling convention obvious.
> 
> Keep the inode_newsize_ok in vmtruncate for now as all callers need an
> audit for it's removal anyway.
> 
> Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
> needs a deeper audit, but that is left for later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/attr.c
> ===================================================================
> --- linux-2.6.orig/fs/attr.c	2010-06-01 12:19:47.987277079 +0200
> +++ linux-2.6/fs/attr.c	2010-06-01 12:48:26.063005672 +0200
> @@ -14,35 +14,53 @@
>  #include <linux/fcntl.h>
>  #include <linux/security.h>
>  
> -/* Taken over from the old code... */
> -
> -/* POSIX UID/GID verification for setting inode attributes. */
> +/**
> + * inode_change_ok - check if attribute changes to an inode are allowed
> + * @inode:	inode to check
> + * @attr:	attributes to change
> + *
> + * Check if we are allowed to change the attributes contained in @attr
> + * in the given inode.  This includes the normal unix access permission
> + * checks, as well as checks for rlimits and others.
> + *
> + * Should be called as the first thing in ->setattr implementations,
> + * possibly after taking additional locks.
> + */
>  int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  {
> -	int retval = -EPERM;
>  	unsigned int ia_valid = attr->ia_valid;
>  
> +	/*
> +	 * First check size constraints.  These can't be overriden using
> +	 * ATTR_FORCE.
> +	 */
> +	if (attr->ia_mode & ATTR_SIZE) {
> +		int error = inode_newsize_ok(inode, attr->ia_size);
> +		if (error)
> +			return error;
> +	}

Hmm, I don't know if we can do this unless you have audited the
filesystems (in which case they should be on the cc list of this
pach).

The problem is whether the i_size is valid and stable at this
point. And it doesn't help even if you do leave the inode_newsize_ok
check inside the vmtruncate part of the fs if the check incorrectly
fails here.

ocfs2 performs inode_change_ok outside ocfs2_rw_lock and
ocfs2_inode_lock, and inode_newsize_ok inside; cifs holds i_lock
while checking inode_newsize_ok and updating size; gfs2 inside
gfs2_trans_begin.

Haven't looked closely at all fses or the consequences of the above.

Thoughts?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-09  7:33   ` Nick Piggin
@ 2010-06-09  9:41     ` Jan Kara
  2010-06-09 10:06       ` Nick Piggin
  2010-06-09 10:07       ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Jan Kara @ 2010-06-09  9:41 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Christoph Hellwig, viro, jack, linux-fsdevel, mfasheh

On Wed 09-06-10 17:33:36, Nick Piggin wrote:
> On Tue, Jun 01, 2010 at 01:39:37PM +0200, Christoph Hellwig wrote:
> > Make sure we check the truncate constraints early on in ->setattr by adding
> > those checks to inode_change_ok.  Also clean up and document inode_change_ok
> > to make this obvious.
> > 
> > As a fallout we don't have to call inode_newsize_ok from simple_setsize and
> > simplify it down to a truncate_setsize which doesn't return an error.  This
> > simplifies a lot of setattr implementations and means we use truncate_setsize
> > almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
> > ext2_setsize static to make the calling convention obvious.
> > 
> > Keep the inode_newsize_ok in vmtruncate for now as all callers need an
> > audit for it's removal anyway.
> > 
> > Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
> > needs a deeper audit, but that is left for later.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > 
> > Index: linux-2.6/fs/attr.c
> > ===================================================================
> > --- linux-2.6.orig/fs/attr.c	2010-06-01 12:19:47.987277079 +0200
> > +++ linux-2.6/fs/attr.c	2010-06-01 12:48:26.063005672 +0200
> > @@ -14,35 +14,53 @@
> >  #include <linux/fcntl.h>
> >  #include <linux/security.h>
> >  
> > -/* Taken over from the old code... */
> > -
> > -/* POSIX UID/GID verification for setting inode attributes. */
> > +/**
> > + * inode_change_ok - check if attribute changes to an inode are allowed
> > + * @inode:	inode to check
> > + * @attr:	attributes to change
> > + *
> > + * Check if we are allowed to change the attributes contained in @attr
> > + * in the given inode.  This includes the normal unix access permission
> > + * checks, as well as checks for rlimits and others.
> > + *
> > + * Should be called as the first thing in ->setattr implementations,
> > + * possibly after taking additional locks.
> > + */
> >  int inode_change_ok(const struct inode *inode, struct iattr *attr)
> >  {
> > -	int retval = -EPERM;
> >  	unsigned int ia_valid = attr->ia_valid;
> >  
> > +	/*
> > +	 * First check size constraints.  These can't be overriden using
> > +	 * ATTR_FORCE.
> > +	 */
> > +	if (attr->ia_mode & ATTR_SIZE) {
> > +		int error = inode_newsize_ok(inode, attr->ia_size);
> > +		if (error)
> > +			return error;
> > +	}
> 
> Hmm, I don't know if we can do this unless you have audited the
> filesystems (in which case they should be on the cc list of this
> pach).
> 
> The problem is whether the i_size is valid and stable at this
> point. And it doesn't help even if you do leave the inode_newsize_ok
> check inside the vmtruncate part of the fs if the check incorrectly
> fails here.
> 
> ocfs2 performs inode_change_ok outside ocfs2_rw_lock and
> ocfs2_inode_lock, and inode_newsize_ok inside; cifs holds i_lock
> while checking inode_newsize_ok and updating size; gfs2 inside
> gfs2_trans_begin.
  That's a good point. For all local filesystems I know, holding i_mutex is
enough for having stable i_size. But for clustered filesystems it
definitely isn't. They have to hold cluster locks to be able to reliably
check current i_size (at least OCFS2 does). Looking at what
inode_newsize_ok currently does, i_size is only used to decide whether
we need to check for rlimit or not. So we could falsely miss this
check (other node is truncating the file below new offset)... Hmm, OK, so
we really need the cluster lock...
  BTW: Mark, don't we need the cluster lock also for the permission
checks in inode_change_ok? Otherwise we could see:
	Node1				Node2
	chmod("file", 000);
					truncate("file", 0)
					  inode_change_ok still see old perms
					    -> success

  And Node1 and Node2 can be fully serialized via some userspace
synchronization and still hit this so it's not just a race...

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-09  9:41     ` Jan Kara
@ 2010-06-09 10:06       ` Nick Piggin
  2010-06-09 10:07       ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2010-06-09 10:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, viro, linux-fsdevel, mfasheh

On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
> On Wed 09-06-10 17:33:36, Nick Piggin wrote:
> > On Tue, Jun 01, 2010 at 01:39:37PM +0200, Christoph Hellwig wrote:
> > >  int inode_change_ok(const struct inode *inode, struct iattr *attr)
> > >  {
> > > -	int retval = -EPERM;
> > >  	unsigned int ia_valid = attr->ia_valid;
> > >  
> > > +	/*
> > > +	 * First check size constraints.  These can't be overriden using
> > > +	 * ATTR_FORCE.
> > > +	 */
> > > +	if (attr->ia_mode & ATTR_SIZE) {
> > > +		int error = inode_newsize_ok(inode, attr->ia_size);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > 
> > Hmm, I don't know if we can do this unless you have audited the
> > filesystems (in which case they should be on the cc list of this
> > pach).
> > 
> > The problem is whether the i_size is valid and stable at this
> > point. And it doesn't help even if you do leave the inode_newsize_ok
> > check inside the vmtruncate part of the fs if the check incorrectly
> > fails here.
> > 
> > ocfs2 performs inode_change_ok outside ocfs2_rw_lock and
> > ocfs2_inode_lock, and inode_newsize_ok inside; cifs holds i_lock
> > while checking inode_newsize_ok and updating size; gfs2 inside
> > gfs2_trans_begin.
>   That's a good point. For all local filesystems I know, holding i_mutex is
> enough for having stable i_size. But for clustered filesystems it
> definitely isn't. They have to hold cluster locks to be able to reliably
> check current i_size (at least OCFS2 does). Looking at what
> inode_newsize_ok currently does, i_size is only used to decide whether
> we need to check for rlimit or not. So we could falsely miss this
> check (other node is truncating the file below new offset)...

Yes, or falsely disallow a shrinking truncate if it is above our
rlimit.


> Hmm, OK, so
> we really need the cluster lock...
>   BTW: Mark, don't we need the cluster lock also for the permission
> checks in inode_change_ok? Otherwise we could see:
> 	Node1				Node2
> 	chmod("file", 000);
> 					truncate("file", 0)
> 					  inode_change_ok still see old perms
> 					    -> success
> 
>   And Node1 and Node2 can be fully serialized via some userspace
> synchronization and still hit this so it's not just a race...

That's a good point too, yes. I think if the inode_change_ok check
were moved inside the cluster lock, that would solve that problem
and Christoph's i_size problem here.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-09  9:41     ` Jan Kara
  2010-06-09 10:06       ` Nick Piggin
@ 2010-06-09 10:07       ` Christoph Hellwig
  2010-06-09 10:26         ` Nick Piggin
  2010-06-09 10:59         ` Steven Whitehouse
  1 sibling, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-06-09 10:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: Nick Piggin, Christoph Hellwig, viro, linux-fsdevel, mfasheh

On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
>   That's a good point. For all local filesystems I know, holding i_mutex is
> enough for having stable i_size. But for clustered filesystems it
> definitely isn't. They have to hold cluster locks to be able to reliably
> check current i_size (at least OCFS2 does). Looking at what
> inode_newsize_ok currently does, i_size is only used to decide whether
> we need to check for rlimit or not. So we could falsely miss this
> check (other node is truncating the file below new offset)... Hmm, OK, so
> we really need the cluster lock...
>   BTW: Mark, don't we need the cluster lock also for the permission
> checks in inode_change_ok? Otherwise we could see:

Yes, we should have it for all of the checks.  It would be good if
the cluster folks came up with proper patches for vfs.git #for-next
to fix up the cluster locking for all of ->setattr.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-09 10:07       ` Christoph Hellwig
@ 2010-06-09 10:26         ` Nick Piggin
  2010-06-10  8:21           ` Christoph Hellwig
  2010-06-09 10:59         ` Steven Whitehouse
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2010-06-09 10:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, viro, linux-fsdevel, mfasheh

On Wed, Jun 09, 2010 at 12:07:15PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
> >   That's a good point. For all local filesystems I know, holding i_mutex is
> > enough for having stable i_size. But for clustered filesystems it
> > definitely isn't. They have to hold cluster locks to be able to reliably
> > check current i_size (at least OCFS2 does). Looking at what
> > inode_newsize_ok currently does, i_size is only used to decide whether
> > we need to check for rlimit or not. So we could falsely miss this
> > check (other node is truncating the file below new offset)... Hmm, OK, so
> > we really need the cluster lock...
> >   BTW: Mark, don't we need the cluster lock also for the permission
> > checks in inode_change_ok? Otherwise we could see:
> 
> Yes, we should have it for all of the checks.  It would be good if
> the cluster folks came up with proper patches for vfs.git #for-next
> to fix up the cluster locking for all of ->setattr.

What about network filesystems? Some (ceph, cifs) seem to do i_size
checks under i_lock.

Pseudo filesystems -- most seem OK. Miklos why do you time out the
attributes on inodes? Is this accounted for when doing setattr
checks (ie. might any attributes be stale here?)

And what to do about intermediate breakage in vfs tree? Should we
instead add a new helper which does all the required checks and
then push non obvious changes through filesystem trees?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-09 10:07       ` Christoph Hellwig
  2010-06-09 10:26         ` Nick Piggin
@ 2010-06-09 10:59         ` Steven Whitehouse
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Whitehouse @ 2010-06-09 10:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, Nick Piggin, viro, linux-fsdevel, mfasheh

Hi,

On Wed, 2010-06-09 at 12:07 +0200, Christoph Hellwig wrote:
> On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
> >   That's a good point. For all local filesystems I know, holding i_mutex is
> > enough for having stable i_size. But for clustered filesystems it
> > definitely isn't. They have to hold cluster locks to be able to reliably
> > check current i_size (at least OCFS2 does). Looking at what
> > inode_newsize_ok currently does, i_size is only used to decide whether
> > we need to check for rlimit or not. So we could falsely miss this
> > check (other node is truncating the file below new offset)... Hmm, OK, so
> > we really need the cluster lock...
> >   BTW: Mark, don't we need the cluster lock also for the permission
> > checks in inode_change_ok? Otherwise we could see:
> 
> Yes, we should have it for all of the checks.  It would be good if
> the cluster folks came up with proper patches for vfs.git #for-next
> to fix up the cluster locking for all of ->setattr.
> 
I already have such a patch, and as per your request, I'm only waiting
for your changes in the same area to hit mainline before I merge it into
my git tree to avoid conflicts in -next. If you'd like to merge the
changes via the vfs tree instead, thats ok too, but let me know which,

Steve.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
  2010-06-09 10:26         ` Nick Piggin
@ 2010-06-10  8:21           ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2010-06-10  8:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Christoph Hellwig, Jan Kara, viro, linux-fsdevel, mfasheh

On Wed, Jun 09, 2010 at 08:26:23PM +1000, Nick Piggin wrote:
> What about network filesystems? Some (ceph, cifs) seem to do i_size
> checks under i_lock.

Hmm, If we don't get that sorted we need to split out the current
inode_change_ok into __inode_change_ok.  Then again these filesystems
will also need locking for the other attributes.  I'll do another
audit for these.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-06-10  8:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-01 11:39 [PATCH 0/2] new truncate sequence, part3 Christoph Hellwig
2010-06-01 11:39 ` [PATCH 1/2] always call inode_change_ok early in ->setattr Christoph Hellwig
2010-06-01 11:39 ` [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok Christoph Hellwig
2010-06-01 11:49   ` Steven Whitehouse
2010-06-01 11:47     ` Christoph Hellwig
2010-06-09  7:33   ` Nick Piggin
2010-06-09  9:41     ` Jan Kara
2010-06-09 10:06       ` Nick Piggin
2010-06-09 10:07       ` Christoph Hellwig
2010-06-09 10:26         ` Nick Piggin
2010-06-10  8:21           ` Christoph Hellwig
2010-06-09 10:59         ` Steven Whitehouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).