[patch 1/5] fs: truncate introduce new sequence

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch 1/5] fs: truncate introduce new sequence
@ 2009-12-08  8:38 Nick Piggin
  2009-12-08  8:39 ` [patch 2/5] fs: convert simple fs to new truncate Nick Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Nick Piggin @ 2009-12-08  8:38 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel

Changes: avoid .truncate_kludge_to_be_killed hack by implementing additional
fs helper functions that do not call vmtruncate.

So far have converted ext2 and fat, tested with fsx-linux and the recent
writev truncate bug tester. The reset of the patches that were already
made are pretty simple to convert but I will likely prefer to send those
via their maintainers after this initial set is agreed and merged.

The new helpers, *_newtrunc, are a bit ugly, but we can do a rename back
to the old names after all support for old sequence is removed.

--
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 Documentation/filesystems/vfs.txt |    7 +-
 fs/attr.c                         |   50 ++++++++++++---
 fs/buffer.c                       |  123 ++++++++++++++++++++++++++++++--------
 fs/direct-io.c                    |   85 ++++++++++++++++----------
 fs/libfs.c                        |   83 +++++++++++++++++++++++++
 include/linux/buffer_head.h       |    9 ++
 include/linux/fs.h                |   36 ++++++++++-
 mm/truncate.c                     |   10 +--
 8 files changed, 328 insertions(+), 75 deletions(-)

Index: linux-2.6/Documentation/filesystems/vfs.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/vfs.txt
+++ linux-2.6/Documentation/filesystems/vfs.txt
@@ -401,11 +401,16 @@ otherwise noted.
   	started might not be in the page cache at the end of the
   	walk).
 
-  truncate: called by the VFS to change the size of a file.  The
+  truncate: Deprecated. This will not be called if ->setsize is defined.
+	Called by the VFS to change the size of a file.  The
  	i_size field of the inode is set to the desired size by the
  	VFS before this method is called.  This method is called by
  	the truncate(2) system call and related functionality.
 
+	Note: ->truncate and vmtruncate are deprecated. Do not add new
+	instances/calls of these. Filesystems shoud be converted to do their
+	truncate sequence via ->setattr().
+
   permission: called by the VFS to check for access rights on a POSIX-like
   	filesystem.
 
Index: linux-2.6/fs/attr.c
===================================================================
--- linux-2.6.orig/fs/attr.c
+++ linux-2.6/fs/attr.c
@@ -68,14 +68,14 @@ EXPORT_SYMBOL(inode_change_ok);
  * @offset:	the new size to assign to the inode
  * @Returns:	0 on success, -ve errno on failure
  *
+ * inode_newsize_ok must be called with i_mutex held.
+ *
  * inode_newsize_ok will check filesystem limits and ulimits to check that the
  * new inode size is within limits. inode_newsize_ok will also send SIGXFSZ
  * when necessary. Caller must not proceed with inode size change if failure is
  * returned. @inode must be a file (not directory), with appropriate
  * permissions to allow truncate (inode_newsize_ok does NOT check these
  * conditions).
- *
- * inode_newsize_ok must be called with i_mutex held.
  */
 int inode_newsize_ok(const struct inode *inode, loff_t offset)
 {
@@ -105,17 +105,25 @@ out_big:
 }
 EXPORT_SYMBOL(inode_newsize_ok);
 
-int inode_setattr(struct inode * inode, struct iattr * attr)
+/**
+ * generic_setattr - copy simple metadata updates into the generic inode
+ * @inode:	the inode to be updated
+ * @attr:	the new attributes
+ *
+ * generic_setattr must be called with i_mutex held.
+ *
+ * generic_setattr updates the inode's metadata with that specified
+ * in attr. Noticably missing is inode size update, which is more complex
+ * as it requires pagecache updates. See simple_setsize.
+ *
+ * The inode is not marked as dirty after this operation. The rationale is
+ * that for "simple" filesystems, the struct inode is the inode storage.
+ * The caller is free to mark the inode dirty afterwards if needed.
+ */
+void generic_setattr(struct inode *inode, const struct iattr *attr)
 {
 	unsigned int ia_valid = attr->ia_valid;
 
-	if (ia_valid & ATTR_SIZE &&
-	    attr->ia_size != i_size_read(inode)) {
-		int error = vmtruncate(inode, attr->ia_size);
-		if (error)
-			return error;
-	}
-
 	if (ia_valid & ATTR_UID)
 		inode->i_uid = attr->ia_uid;
 	if (ia_valid & ATTR_GID)
@@ -136,6 +144,28 @@ int inode_setattr(struct inode * inode,
 			mode &= ~S_ISGID;
 		inode->i_mode = mode;
 	}
+}
+EXPORT_SYMBOL(generic_setattr);
+
+/*
+ * note this function is deprecated, the new truncate sequence should be
+ * used instead -- see eg. simple_setsize, generic_setattr.
+ */
+int inode_setattr(struct inode *inode, const struct iattr *attr)
+{
+	unsigned int ia_valid = attr->ia_valid;
+
+	if (ia_valid & ATTR_SIZE &&
+	    attr->ia_size != i_size_read(inode)) {
+		int error;
+
+		error = vmtruncate(inode, attr->ia_size);
+		if (error)
+			return error;
+	}
+
+	generic_setattr(inode, attr);
+
 	mark_inode_dirty(inode);
 
 	return 0;
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1957,14 +1957,11 @@ static int __block_commit_write(struct i
 }
 
 /*
- * block_write_begin takes care of the basic task of block allocation and
- * bringing partial write blocks uptodate first.
- *
- * If *pagep is not NULL, then block_write_begin uses the locked page
- * at *pagep rather than allocating its own. In this case, the page will
- * not be unlocked or deallocated on failure.
+ * Filesystems implementing the new truncate sequence should use the
+ * _newtrunc postfix variant which won't incorrectly call vmtruncate.
+ * The filesystem needs to handle block truncation upon failure.
  */
-int block_write_begin(struct file *file, struct address_space *mapping,
+int block_write_begin_newtrunc(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata,
 			get_block_t *get_block)
@@ -2000,20 +1997,50 @@ int block_write_begin(struct file *file,
 			unlock_page(page);
 			page_cache_release(page);
 			*pagep = NULL;
-
-			/*
-			 * prepare_write() may have instantiated a few blocks
-			 * outside i_size.  Trim these off again. Don't need
-			 * i_size_read because we hold i_mutex.
-			 */
-			if (pos + len > inode->i_size)
-				vmtruncate(inode, inode->i_size);
 		}
 	}
 
 out:
 	return status;
 }
+EXPORT_SYMBOL(block_write_begin_newtrunc);
+
+/*
+ * block_write_begin takes care of the basic task of block allocation and
+ * bringing partial write blocks uptodate first.
+ *
+ * If *pagep is not NULL, then block_write_begin uses the locked page
+ * at *pagep rather than allocating its own. In this case, the page will
+ * not be unlocked or deallocated on failure.
+ */
+int block_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata,
+			get_block_t *get_block)
+{
+	int ret;
+
+	ret = block_write_begin_newtrunc(file, mapping, pos, len, flags,
+					pagep, fsdata, get_block);
+
+	/*
+	 * prepare_write() may have instantiated a few blocks
+	 * outside i_size.  Trim these off again. Don't need
+	 * i_size_read because we hold i_mutex.
+	 *
+	 * Filesystems which pass down their own page also cannot
+	 * call into vmtruncate here because it would lead to lock
+	 * inversion problems (*pagep is locked). This is a further
+	 * example of where the old truncate sequence is inadequate.
+	 */
+	if (unlikely(ret) && *pagep == NULL) {
+		loff_t isize = mapping->host->i_size;
+		if (pos + len > isize)
+			vmtruncate(mapping->host, isize);
+	}
+
+	return ret;
+}
 EXPORT_SYMBOL(block_write_begin);
 
 int block_write_end(struct file *file, struct address_space *mapping,
@@ -2332,7 +2359,7 @@ out:
  * For moronic filesystems that do not allow holes in file.
  * We may have to extend the file.
  */
-int cont_write_begin(struct file *file, struct address_space *mapping,
+int cont_write_begin_newtrunc(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata,
 			get_block_t *get_block, loff_t *bytes)
@@ -2353,11 +2380,30 @@ int cont_write_begin(struct file *file,
 	}
 
 	*pagep = NULL;
-	err = block_write_begin(file, mapping, pos, len,
+	err = block_write_begin_newtrunc(file, mapping, pos, len,
 				flags, pagep, fsdata, get_block);
 out:
 	return err;
 }
+EXPORT_SYMBOL(cont_write_begin_newtrunc);
+
+int cont_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata,
+			get_block_t *get_block, loff_t *bytes)
+{
+	int ret;
+
+	ret = cont_write_begin_newtrunc(file, mapping, pos, len, flags,
+					pagep, fsdata, get_block, bytes);
+	if (unlikely(ret)) {
+		loff_t isize = mapping->host->i_size;
+		if (pos + len > isize)
+			vmtruncate(mapping->host, isize);
+	}
+
+	return ret;
+}
 EXPORT_SYMBOL(cont_write_begin);
 
 int block_prepare_write(struct page *page, unsigned from, unsigned to,
@@ -2389,7 +2435,7 @@ EXPORT_SYMBOL(block_commit_write);
  *
  * We are not allowed to take the i_mutex here so we have to play games to
  * protect against truncate races as the page could now be beyond EOF.  Because
- * vmtruncate() writes the inode size before removing pages, once we have the
+ * truncate writes the inode size before removing pages, once we have the
  * page lock we can determine safely if the page is beyond EOF. If it is not
  * beyond EOF, then the page is guaranteed safe against truncation until we
  * unlock the page.
@@ -2472,10 +2518,11 @@ static void attach_nobh_buffers(struct p
 }
 
 /*
- * On entry, the page is fully not uptodate.
- * On exit the page is fully uptodate in the areas outside (from,to)
+ * Filesystems implementing the new truncate sequence should use the
+ * _newtrunc postfix variant which won't incorrectly call vmtruncate.
+ * The filesystem needs to handle block truncation upon failure.
  */
-int nobh_write_begin(struct file *file, struct address_space *mapping,
+int nobh_write_begin_newtrunc(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata,
 			get_block_t *get_block)
@@ -2508,8 +2555,8 @@ int nobh_write_begin(struct file *file,
 		unlock_page(page);
 		page_cache_release(page);
 		*pagep = NULL;
-		return block_write_begin(file, mapping, pos, len, flags, pagep,
-					fsdata, get_block);
+		return block_write_begin_newtrunc(file, mapping, pos, len,
+					flags, pagep, fsdata, get_block);
 	}
 
 	if (PageMappedToDisk(page))
@@ -2613,8 +2660,34 @@ out_release:
 	page_cache_release(page);
 	*pagep = NULL;
 
-	if (pos + len > inode->i_size)
-		vmtruncate(inode, inode->i_size);
+	return ret;
+}
+EXPORT_SYMBOL(nobh_write_begin_newtrunc);
+
+/*
+ * On entry, the page is fully not uptodate.
+ * On exit the page is fully uptodate in the areas outside (from,to)
+ */
+int nobh_write_begin(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned flags,
+			struct page **pagep, void **fsdata,
+			get_block_t *get_block)
+{
+	int ret;
+
+	ret = nobh_write_begin_newtrunc(file, mapping, pos, len, flags,
+					pagep, fsdata, get_block);
+
+	/*
+	 * prepare_write() may have instantiated a few blocks
+	 * outside i_size.  Trim these off again. Don't need
+	 * i_size_read because we hold i_mutex.
+	 */
+	if (unlikely(ret)) {
+		loff_t isize = mapping->host->i_size;
+		if (pos + len > isize)
+			vmtruncate(mapping->host, isize);
+	}
 
 	return ret;
 }
Index: linux-2.6/fs/direct-io.c
===================================================================
--- linux-2.6.orig/fs/direct-io.c
+++ linux-2.6/fs/direct-io.c
@@ -1085,28 +1085,12 @@ direct_io_worker(int rw, struct kiocb *i
 }
 
 /*
- * This is a library function for use by filesystem drivers.
- * The locking rules are governed by the dio_lock_type parameter.
- *
- * DIO_NO_LOCKING (no locking, for raw block device access)
- * For writes, i_mutex is not held on entry; it is never taken.
- *
- * DIO_LOCKING (simple locking for regular files)
- * For writes we are called under i_mutex and return with i_mutex held, even
- * though it is internally dropped.
- * For reads, i_mutex is not held on entry, but it is taken and dropped before
- * returning.
- *
- * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
- *	uninitialised data, allowing parallel direct readers and writers)
- * For writes we are called without i_mutex, return without it, never touch it.
- * For reads we are called under i_mutex and return with i_mutex held, even
- * though it may be internally dropped.
- *
- * Additional i_alloc_sem locking requirements described inline below.
+ * Filesystems implementing the new truncate sequence should use the
+ * _newtrunc postfix variant which won't incorrectly call vmtruncate.
+ * The filesystem needs to handle block truncation upon failure.
  */
 ssize_t
-__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+__blockdev_direct_IO_newtrunc(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
 	int dio_lock_type)
@@ -1207,19 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_block, end_io, dio);
 
-	/*
-	 * In case of error extending write may have instantiated a few
-	 * blocks outside i_size. Trim these off again for DIO_LOCKING.
-	 * NOTE: DIO_NO_LOCK/DIO_OWN_LOCK callers have to handle this by
-	 * it's own meaner.
-	 */
-	if (unlikely(retval < 0 && (rw & WRITE))) {
-		loff_t isize = i_size_read(inode);
-
-		if (end > isize && dio_lock_type == DIO_LOCKING)
-			vmtruncate(inode, isize);
-	}
-
 	if (rw == READ && dio_lock_type == DIO_LOCKING)
 		release_i_mutex = 0;
 
@@ -1230,4 +1201,52 @@ out:
 		mutex_lock(&inode->i_mutex);
 	return retval;
 }
+EXPORT_SYMBOL(__blockdev_direct_IO_newtrunc);
+
+/*
+ * This is a library function for use by filesystem drivers.
+ * The locking rules are governed by the dio_lock_type parameter.
+ *
+ * DIO_NO_LOCKING (no locking, for raw block device access)
+ * For writes, i_mutex is not held on entry; it is never taken.
+ *
+ * DIO_LOCKING (simple locking for regular files)
+ * For writes we are called under i_mutex and return with i_mutex held, even
+ * though it is internally dropped.
+ * For reads, i_mutex is not held on entry, but it is taken and dropped before
+ * returning.
+ *
+ * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
+ *	uninitialised data, allowing parallel direct readers and writers)
+ * For writes we are called without i_mutex, return without it, never touch it.
+ * For reads we are called under i_mutex and return with i_mutex held, even
+ * though it may be internally dropped.
+ *
+ * Additional i_alloc_sem locking requirements described inline below.
+ */
+ssize_t
+__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+	struct block_device *bdev, const struct iovec *iov, loff_t offset,
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+	int dio_lock_type)
+{
+	ssize_t ret;
+
+	ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, dio_lock_type);
+	/*
+	 * In case of error extending write may have instantiated a few
+	 * blocks outside i_size. Trim these off again for DIO_LOCKING.
+	 * NOTE: DIO_NO_LOCK/DIO_OWN_LOCK callers have to handle this in
+	 * their own manner. This is a further example of where the old
+	 * truncate sequence is inadequate.
+	 */
+	if (unlikely(ret < 0 && (rw & WRITE)) && dio_lock_type == DIO_LOCKING) {
+		loff_t isize = i_size_read(inode);
+
+		vmtruncate(inode, isize);
+	}
+
+	return ret;
+}
 EXPORT_SYMBOL(__blockdev_direct_IO);
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -7,6 +7,7 @@
 #include <linux/pagemap.h>
 #include <linux/mount.h>
 #include <linux/vfs.h>
+#include <linux/quotaops.h>
 #include <linux/mutex.h>
 #include <linux/exportfs.h>
 #include <linux/writeback.h>
@@ -329,6 +330,88 @@ int simple_rename(struct inode *old_dir,
 	return 0;
 }
 
+/**
+ * simple_setsize - handle core mm and vfs requirements for file size change
+ * @inode: inode
+ * @newsize: new file size
+ *
+ * Returns 0 on success, -error on failure.
+ *
+ * simple_setsize must be called with inode_mutex held.
+ *
+ * simple_setsize will check that the requested new size is OK (see
+ * inode_newsize_ok), and then will perform the necessary i_size update
+ * and pagecache truncation (if necessary). It will be typically be called
+ * from the filesystem's setattr function when ATTR_SIZE is passed in.
+ *
+ * The inode itself must have correct permissions and attributes to allow
+ * i_size to be changed, this function then just checks that the new size
+ * requested is valid.
+ *
+ * In the case of simple in-memory filesystems with inodes stored solely
+ * in the inode cache, and file data in the pagecache, nothing more needs
+ * to be done to satisfy a truncate request. Filesystems with on-disk
+ * blocks for example will need to free them in the case of truncate, in
+ * that case it may be easier not to use simple_setsize (but each of its
+ * components will likely be required at some point to update pagecache
+ * and inode etc).
+ */
+int simple_setsize(struct inode *inode, loff_t newsize)
+{
+	loff_t oldsize;
+	int error;
+
+	error = inode_newsize_ok(inode, newsize);
+	if (error)
+		return error;
+
+	oldsize = inode->i_size;
+	i_size_write(inode, newsize);
+	truncate_pagecache(inode, oldsize, newsize);
+
+	return error;
+}
+EXPORT_SYMBOL(simple_setsize);
+
+/**
+ * simple_setattr - setattr for simple in-memory filesystem
+ * @dentry: dentry
+ * @iattr: iattr structure
+ *
+ * Returns 0 on success, -error on failure.
+ *
+ * simple_setattr implements setattr for an in-memory filesystem which
+ * does not store its own file data or metadata (eg. uses the page cache
+ * and inode cache as its data store).
+ */
+int simple_setattr(struct dentry *dentry, struct iattr *iattr)
+{
+	struct inode *inode = dentry->d_inode;
+	int error;
+
+	error = inode_change_ok(inode, iattr);
+	if (error)
+		return error;
+
+	if ((iattr->ia_valid & ATTR_UID && iattr->ia_uid != inode->i_uid) ||
+	    (iattr->ia_valid & ATTR_GID && iattr->ia_gid != inode->i_gid)) {
+		error = vfs_dq_transfer(inode, iattr) ? -EDQUOT : 0;
+		if (error)
+			return error;
+	}
+
+	if (iattr->ia_valid & ATTR_SIZE) {
+		error = simple_setsize(inode, iattr->ia_size);
+		if (error)
+			return error;
+	}
+
+	generic_setattr(inode, iattr);
+
+	return error;
+}
+EXPORT_SYMBOL(simple_setattr);
+
 int simple_readpage(struct file *file, struct page *page)
 {
 	clear_highpage(page);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -2259,6 +2259,10 @@ static inline int xip_truncate_page(stru
 #endif
 
 #ifdef CONFIG_BLOCK
+ssize_t __blockdev_direct_IO_newtrunc(int rw, struct kiocb *iocb, struct inode *inode,
+	struct block_device *bdev, const struct iovec *iov, loff_t offset,
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+	int lock_type);
 ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset,
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
@@ -2270,6 +2274,33 @@ enum {
 	DIO_OWN_LOCKING, /* filesystem locks buffered and direct internally */
 };
 
+static inline ssize_t blockdev_direct_IO_newtrunc(int rw, struct kiocb *iocb,
+	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
+	loff_t offset, unsigned long nr_segs, get_block_t get_block,
+	dio_iodone_t end_io)
+{
+	return __blockdev_direct_IO_newtrunc(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, DIO_LOCKING);
+}
+
+static inline ssize_t blockdev_direct_IO_no_locking_newtrunc(int rw, struct kiocb *iocb,
+	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
+	loff_t offset, unsigned long nr_segs, get_block_t get_block,
+	dio_iodone_t end_io)
+{
+	return __blockdev_direct_IO_newtrunc(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, DIO_NO_LOCKING);
+}
+
+static inline ssize_t blockdev_direct_IO_own_locking_newtrunc(int rw, struct kiocb *iocb,
+	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
+	loff_t offset, unsigned long nr_segs, get_block_t get_block,
+	dio_iodone_t end_io)
+{
+	return __blockdev_direct_IO_newtrunc(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, DIO_OWN_LOCKING);
+}
+
 static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs, get_block_t get_block,
@@ -2347,12 +2378,14 @@ extern int dcache_dir_open(struct inode
 extern int dcache_dir_close(struct inode *, struct file *);
 extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
 extern int dcache_readdir(struct file *, void *, filldir_t);
+extern int simple_setattr(struct dentry *dentry, struct iattr *attr);
 extern int simple_getattr(struct vfsmount *, struct dentry *, struct kstat *);
 extern int simple_statfs(struct dentry *, struct kstatfs *);
 extern int simple_link(struct dentry *, struct inode *, struct dentry *);
 extern int simple_unlink(struct inode *, struct dentry *);
 extern int simple_rmdir(struct inode *, struct dentry *);
 extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+extern int simple_setsize(struct inode *inode, loff_t newsize);
 extern int simple_sync_file(struct file *, struct dentry *, int);
 extern int simple_empty(struct dentry *);
 extern int simple_readpage(struct file *file, struct page *page);
@@ -2389,7 +2422,8 @@ extern int buffer_migrate_page(struct ad
 
 extern int inode_change_ok(const struct inode *, struct iattr *);
 extern int inode_newsize_ok(const struct inode *, loff_t offset);
-extern int __must_check inode_setattr(struct inode *, struct iattr *);
+extern int __must_check inode_setattr(struct inode *, const struct iattr *);
+extern void generic_setattr(struct inode *inode, const struct iattr *attr);
 
 extern void file_update_time(struct file *file);
 
Index: linux-2.6/mm/truncate.c
===================================================================
--- linux-2.6.orig/mm/truncate.c
+++ linux-2.6/mm/truncate.c
@@ -543,18 +543,18 @@ EXPORT_SYMBOL(truncate_pagecache);
  * NOTE! We have to be ready to update the memory sharing
  * between the file and the memory map for a potential last
  * incomplete page.  Ugly, but necessary.
+ *
+ * This function is deprecated and simple_setsize or truncate_pagecache
+ * should be used instead.
  */
 int vmtruncate(struct inode *inode, loff_t offset)
 {
-	loff_t oldsize;
 	int error;
 
-	error = inode_newsize_ok(inode, offset);
+	error = simple_setsize(inode, offset);
 	if (error)
 		return error;
-	oldsize = inode->i_size;
-	i_size_write(inode, offset);
-	truncate_pagecache(inode, oldsize, offset);
+
 	if (inode->i_op->truncate)
 		inode->i_op->truncate(inode);
 
Index: linux-2.6/include/linux/buffer_head.h
===================================================================
--- linux-2.6.orig/include/linux/buffer_head.h
+++ linux-2.6/include/linux/buffer_head.h
@@ -203,6 +203,9 @@ int block_write_full_page_endio(struct p
 int block_read_full_page(struct page*, get_block_t*);
 int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
 				unsigned long from);
+int block_write_begin_newtrunc(struct file *, struct address_space *,
+				loff_t, unsigned, unsigned,
+				struct page **, void **, get_block_t*);
 int block_write_begin(struct file *, struct address_space *,
 				loff_t, unsigned, unsigned,
 				struct page **, void **, get_block_t*);
@@ -214,6 +217,9 @@ int generic_write_end(struct file *, str
 				struct page *, void *);
 void page_zero_new_buffers(struct page *page, unsigned from, unsigned to);
 int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
+int cont_write_begin_newtrunc(struct file *, struct address_space *, loff_t,
+			unsigned, unsigned, struct page **, void **,
+			get_block_t *, loff_t *);
 int cont_write_begin(struct file *, struct address_space *, loff_t,
 			unsigned, unsigned, struct page **, void **,
 			get_block_t *, loff_t *);
@@ -225,6 +231,9 @@ void block_sync_page(struct page *);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
 int block_truncate_page(struct address_space *, loff_t, get_block_t *);
 int file_fsync(struct file *, struct dentry *, int);
+int nobh_write_begin_newtrunc(struct file *, struct address_space *,
+				loff_t, unsigned, unsigned,
+				struct page **, void **, get_block_t*);
 int nobh_write_begin(struct file *, struct address_space *,
 				loff_t, unsigned, unsigned,
 				struct page **, void **, get_block_t*);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 2/5] fs: convert simple fs to new truncate
  2009-12-08  8:38 [patch 1/5] fs: truncate introduce new sequence Nick Piggin
@ 2009-12-08  8:39 ` Nick Piggin
  2009-12-09 14:39   ` Jan Kara
  2009-12-08  8:41 ` [patch 3/5] tmpfs: convert to use the new truncate convention Nick Piggin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2009-12-08  8:39 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel

Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate
sequence.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 fs/block_dev.c        |    9 +++++----
 fs/configfs/inode.c   |   10 +++++++---
 fs/ramfs/file-mmu.c   |    1 +
 fs/ramfs/file-nommu.c |    7 ++++---
 fs/sysfs/inode.c      |    6 +-----
 5 files changed, 18 insertions(+), 15 deletions(-)

Index: linux-2.6/fs/configfs/inode.c
===================================================================
--- linux-2.6.orig/fs/configfs/inode.c
+++ linux-2.6/fs/configfs/inode.c
@@ -77,9 +77,13 @@ int configfs_setattr(struct dentry * den
 	if (error)
 		return error;
 
-	error = inode_setattr(inode, iattr);
-	if (error)
-		return error;
+	if (iattr->ia_valid & ATTR_SIZE) {
+		error = simple_setsize(inode, iattr->ia_size);
+		if (error)
+			return error;
+	}
+
+	generic_setattr(inode, iattr);
 
 	if (!sd_iattr) {
 		/* setting attributes for the first time, allocate now */
Index: linux-2.6/fs/ramfs/file-mmu.c
===================================================================
--- linux-2.6.orig/fs/ramfs/file-mmu.c
+++ linux-2.6/fs/ramfs/file-mmu.c
@@ -50,5 +50,6 @@ const struct file_operations ramfs_file_
 };
 
 const struct inode_operations ramfs_file_inode_operations = {
+	.setattr	= simple_setattr,
 	.getattr	= simple_getattr,
 };
Index: linux-2.6/fs/ramfs/file-nommu.c
===================================================================
--- linux-2.6.orig/fs/ramfs/file-nommu.c
+++ linux-2.6/fs/ramfs/file-nommu.c
@@ -169,7 +169,7 @@ static int ramfs_nommu_resize(struct ino
 			return ret;
 	}
 
-	ret = vmtruncate(inode, newsize);
+	ret = simple_setsize(inode, newsize);
 
 	return ret;
 }
@@ -192,7 +192,8 @@ static int ramfs_nommu_setattr(struct de
 
 	/* pick out size-changing events */
 	if (ia->ia_valid & ATTR_SIZE) {
-		loff_t size = i_size_read(inode);
+		loff_t size = inode->i_size;
+
 		if (ia->ia_size != size) {
 			ret = ramfs_nommu_resize(inode, ia->ia_size, size);
 			if (ret < 0 || ia->ia_valid == ATTR_SIZE)
@@ -205,7 +206,7 @@ static int ramfs_nommu_setattr(struct de
 		}
 	}
 
-	ret = inode_setattr(inode, ia);
+	generic_setattr(inode, ia);
  out:
 	ia->ia_valid = old_ia_valid;
 	return ret;
Index: linux-2.6/fs/sysfs/inode.c
===================================================================
--- linux-2.6.orig/fs/sysfs/inode.c
+++ linux-2.6/fs/sysfs/inode.c
@@ -82,11 +82,7 @@ int sysfs_setattr(struct dentry * dentry
 	if (error)
 		return error;
 
-	iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
-
-	error = inode_setattr(inode, iattr);
-	if (error)
-		return error;
+	generic_setattr(inode, iattr);
 
 	if (!sd_attrs) {
 		/* setting attributes for the first time, allocate now */
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -172,8 +172,9 @@ blkdev_direct_IO(int rw, struct kiocb *i
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 
-	return blockdev_direct_IO_no_locking(rw, iocb, inode, I_BDEV(inode),
-				iov, offset, nr_segs, blkdev_get_blocks, NULL);
+	return blockdev_direct_IO_no_locking_newtrunc(rw, iocb, inode,
+				I_BDEV(inode), iov, offset, nr_segs,
+				blkdev_get_blocks, NULL);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
@@ -349,8 +350,8 @@ static int blkdev_write_begin(struct fil
 			struct page **pagep, void **fsdata)
 {
 	*pagep = NULL;
-	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
-				blkdev_get_block);
+	return block_write_begin_newtrunc(file, mapping, pos, len, flags,
+				pagep, fsdata, blkdev_get_block);
 }
 
 static int blkdev_write_end(struct file *file, struct address_space *mapping,

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 3/5] tmpfs: convert to use the new truncate convention
  2009-12-08  8:38 [patch 1/5] fs: truncate introduce new sequence Nick Piggin
  2009-12-08  8:39 ` [patch 2/5] fs: convert simple fs to new truncate Nick Piggin
@ 2009-12-08  8:41 ` Nick Piggin
  2009-12-09 14:45   ` Jan Kara
  2009-12-08  8:42 ` [patch 4/5] ext2: " Nick Piggin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2009-12-08  8:41 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 mm/shmem.c |   43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c
+++ linux-2.6/mm/shmem.c
@@ -729,10 +729,11 @@ done2:
 	if (inode->i_mapping->nrpages && (info->flags & SHMEM_PAGEIN)) {
 		/*
 		 * Call truncate_inode_pages again: racing shmem_unuse_inode
-		 * may have swizzled a page in from swap since vmtruncate or
-		 * generic_delete_inode did it, before we lowered next_index.
-		 * Also, though shmem_getpage checks i_size before adding to
-		 * cache, no recheck after: so fix the narrow window there too.
+		 * may have swizzled a page in from swap since
+		 * truncate_pagecache or generic_delete_inode did it, before we
+		 * lowered next_index.  Also, though shmem_getpage checks
+		 * i_size before adding to cache, no recheck after: so fix the
+		 * narrow window there too.
 		 *
 		* Recalling truncate_inode_pages_range and unmap_mapping_range
 		 * every time for punch_hole (which never got a chance to clear
@@ -762,19 +763,16 @@ done2:
 	}
 }
 
-static void shmem_truncate(struct inode *inode)
-{
-	shmem_truncate_range(inode, inode->i_size, (loff_t)-1);
-}
-
 static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
-	struct page *page = NULL;
 	int error;
 
 	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
-		if (attr->ia_size < inode->i_size) {
+		loff_t newsize = attr->ia_size;
+		struct page *page = NULL;
+
+		if (newsize < inode->i_size) {
 			/*
 			 * If truncating down to a partial page, then
 			 * if that page is already allocated, hold it
@@ -782,9 +780,9 @@ static int shmem_notify_change(struct de
 			 * truncate_partial_page cannnot miss it were
 			 * it assigned to swap.
 			 */
-			if (attr->ia_size & (PAGE_CACHE_SIZE-1)) {
+			if (newsize & (PAGE_CACHE_SIZE-1)) {
 				(void) shmem_getpage(inode,
-					attr->ia_size>>PAGE_CACHE_SHIFT,
+					newsize >> PAGE_CACHE_SHIFT,
 						&page, SGP_READ, NULL);
 				if (page)
 					unlock_page(page);
@@ -796,24 +794,29 @@ static int shmem_notify_change(struct de
 			 * if it's being fully truncated to zero-length: the
 			 * nrpages check is efficient enough in that case.
 			 */
-			if (attr->ia_size) {
+			if (newsize) {
 				struct shmem_inode_info *info = SHMEM_I(inode);
 				spin_lock(&info->lock);
 				info->flags &= ~SHMEM_PAGEIN;
 				spin_unlock(&info->lock);
 			}
 		}
+
+		error = simple_setsize(inode, newsize);
+		if (page)
+			page_cache_release(page);
+		if (error)
+			return error;
+		shmem_truncate_range(inode, newsize, (loff_t)-1);
 	}
 
 	error = inode_change_ok(inode, attr);
 	if (!error)
-		error = inode_setattr(inode, attr);
+		generic_setattr(inode, attr);
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	if (!error && (attr->ia_valid & ATTR_MODE))
 		error = generic_acl_chmod(inode, &shmem_acl_ops);
 #endif
-	if (page)
-		page_cache_release(page);
 	return error;
 }
 
@@ -821,11 +824,11 @@ static void shmem_delete_inode(struct in
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 
-	if (inode->i_op->truncate == shmem_truncate) {
+	if (inode->i_mapping->a_ops == &shmem_aops) {
 		truncate_inode_pages(inode->i_mapping, 0);
 		shmem_unacct_size(info->flags, inode->i_size);
 		inode->i_size = 0;
-		shmem_truncate(inode);
+		shmem_truncate_range(inode, 0, (loff_t)-1);
 		if (!list_empty(&info->swaplist)) {
 			mutex_lock(&shmem_swaplist_mutex);
 			list_del_init(&info->swaplist);
@@ -2022,7 +2025,6 @@ static const struct inode_operations shm
 };
 
 static const struct inode_operations shmem_symlink_inode_operations = {
-	.truncate	= shmem_truncate,
 	.readlink	= generic_readlink,
 	.follow_link	= shmem_follow_link,
 	.put_link	= shmem_put_link,
@@ -2439,7 +2441,6 @@ static const struct file_operations shme
 };
 
 static const struct inode_operations shmem_inode_operations = {
-	.truncate	= shmem_truncate,
 	.setattr	= shmem_notify_change,
 	.truncate_range	= shmem_truncate_range,
 #ifdef CONFIG_TMPFS_POSIX_ACL

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 4/5] ext2: convert to use the new truncate convention
  2009-12-08  8:38 [patch 1/5] fs: truncate introduce new sequence Nick Piggin
  2009-12-08  8:39 ` [patch 2/5] fs: convert simple fs to new truncate Nick Piggin
  2009-12-08  8:41 ` [patch 3/5] tmpfs: convert to use the new truncate convention Nick Piggin
@ 2009-12-08  8:42 ` Nick Piggin
  2009-12-09 14:57   ` Jan Kara
  2009-12-08  8:43 ` [patch 5/5] fat: " Nick Piggin
  2009-12-09 14:38 ` [patch 1/5] fs: truncate introduce new sequence Jan Kara
  4 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2009-12-08  8:42 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel; +Cc: linux-ext4

I also have commented a possible bug in existing ext2 code, marked with XXX.

Cc: linux-ext4@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 fs/ext2/ext2.h  |    1 
 fs/ext2/file.c  |    1 
 fs/ext2/inode.c |  153 +++++++++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 119 insertions(+), 36 deletions(-)

Index: linux-2.6/fs/ext2/ext2.h
===================================================================
--- linux-2.6.orig/fs/ext2/ext2.h
+++ linux-2.6/fs/ext2/ext2.h
@@ -122,7 +122,6 @@ extern int ext2_write_inode (struct inod
 extern void ext2_delete_inode (struct inode *);
 extern int ext2_sync_inode (struct inode *);
 extern int ext2_get_block(struct inode *, sector_t, struct buffer_head *, int);
-extern void ext2_truncate (struct inode *);
 extern int ext2_setattr (struct dentry *, struct iattr *);
 extern void ext2_set_inode_flags(struct inode *inode);
 extern void ext2_get_inode_flags(struct ext2_inode_info *);
Index: linux-2.6/fs/ext2/file.c
===================================================================
--- linux-2.6.orig/fs/ext2/file.c
+++ linux-2.6/fs/ext2/file.c
@@ -77,7 +77,6 @@ const struct file_operations ext2_xip_fi
 #endif
 
 const struct inode_operations ext2_file_inode_operations = {
-	.truncate	= ext2_truncate,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,
Index: linux-2.6/fs/ext2/inode.c
===================================================================
--- linux-2.6.orig/fs/ext2/inode.c
+++ linux-2.6/fs/ext2/inode.c
@@ -53,6 +53,18 @@ static inline int ext2_inode_is_fast_sym
 		inode->i_blocks - ea_blocks == 0);
 }
 
+static void ext2_truncate_blocks(struct inode *inode, loff_t offset);
+
+static void ext2_write_failed(struct address_space *mapping, loff_t to)
+{
+	struct inode *inode = mapping->host;
+
+	if (to > inode->i_size) {
+		truncate_pagecache(inode, to, inode->i_size);
+		ext2_truncate_blocks(inode, inode->i_size);
+	}
+}
+
 /*
  * Called at the last iput() if i_nlink is zero.
  */
@@ -68,7 +80,7 @@ void ext2_delete_inode (struct inode * i
 
 	inode->i_size = 0;
 	if (inode->i_blocks)
-		ext2_truncate (inode);
+		ext2_truncate_blocks(inode, 0);
 	ext2_free_inode (inode);
 
 	return;
@@ -752,8 +764,8 @@ int __ext2_write_begin(struct file *file
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata)
 {
-	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
-							ext2_get_block);
+	return block_write_begin_newtrunc(file, mapping, pos, len, flags,
+					pagep, fsdata, ext2_get_block);
 }
 
 static int
@@ -761,8 +773,25 @@ ext2_write_begin(struct file *file, stru
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata)
 {
+	int ret;
+
 	*pagep = NULL;
-	return __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata);
+	ret = __ext2_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
+	if (ret < 0)
+		ext2_write_failed(mapping, pos + len);
+	return ret;
+}
+
+static int ext2_write_end(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page, void *fsdata)
+{
+	int ret;
+
+	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
+	if (ret < len)
+		ext2_write_failed(mapping, pos + len);
+	return ret;
 }
 
 static int
@@ -770,13 +799,18 @@ ext2_nobh_write_begin(struct file *file,
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata)
 {
+	int ret;
+
 	/*
 	 * Dir-in-pagecache still uses ext2_write_begin. Would have to rework
 	 * directory handling code to pass around offsets rather than struct
 	 * pages in order to make this work easily.
 	 */
-	return nobh_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
-							ext2_get_block);
+	ret = nobh_write_begin_newtrunc(file, mapping, pos, len, flags, pagep,
+						fsdata, ext2_get_block);
+	if (ret < 0)
+		ext2_write_failed(mapping, pos + len);
+	return ret;
 }
 
 static int ext2_nobh_writepage(struct page *page,
@@ -795,10 +829,15 @@ ext2_direct_IO(int rw, struct kiocb *ioc
 			loff_t offset, unsigned long nr_segs)
 {
 	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-
-	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-				offset, nr_segs, ext2_get_block, NULL);
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
+	ssize_t ret;
+
+	ret = blockdev_direct_IO_newtrunc(rw, iocb, inode, inode->i_sb->s_bdev,
+				iov, offset, nr_segs, ext2_get_block, NULL);
+	if (ret < 0 && (rw & WRITE))
+		ext2_write_failed(mapping, offset + iov_length(iov, nr_segs));
+	return ret;
 }
 
 static int
@@ -813,7 +852,7 @@ const struct address_space_operations ex
 	.writepage		= ext2_writepage,
 	.sync_page		= block_sync_page,
 	.write_begin		= ext2_write_begin,
-	.write_end		= generic_write_end,
+	.write_end		= ext2_write_end,
 	.bmap			= ext2_bmap,
 	.direct_IO		= ext2_direct_IO,
 	.writepages		= ext2_writepages,
@@ -1022,7 +1061,7 @@ static void ext2_free_branches(struct in
 		ext2_free_data(inode, p, q);
 }
 
-void ext2_truncate(struct inode *inode)
+static void __ext2_truncate_blocks(struct inode *inode, loff_t offset)
 {
 	__le32 *i_data = EXT2_I(inode)->i_data;
 	struct ext2_inode_info *ei = EXT2_I(inode);
@@ -1034,27 +1073,8 @@ void ext2_truncate(struct inode *inode)
 	int n;
 	long iblock;
 	unsigned blocksize;
-
-	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
-	    S_ISLNK(inode->i_mode)))
-		return;
-	if (ext2_inode_is_fast_symlink(inode))
-		return;
-	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-		return;
-
 	blocksize = inode->i_sb->s_blocksize;
-	iblock = (inode->i_size + blocksize-1)
-					>> EXT2_BLOCK_SIZE_BITS(inode->i_sb);
-
-	if (mapping_is_xip(inode->i_mapping))
-		xip_truncate_page(inode->i_mapping, inode->i_size);
-	else if (test_opt(inode->i_sb, NOBH))
-		nobh_truncate_page(inode->i_mapping,
-				inode->i_size, ext2_get_block);
-	else
-		block_truncate_page(inode->i_mapping,
-				inode->i_size, ext2_get_block);
+	iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb);
 
 	n = ext2_block_to_path(inode, iblock, offsets, NULL);
 	if (n == 0)
@@ -1122,6 +1142,62 @@ do_indirects:
 	ext2_discard_reservation(inode);
 
 	mutex_unlock(&ei->truncate_mutex);
+}
+
+static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
+{
+	/*
+	 * XXX: it seems like a bug here that we don't allow
+	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
+	 * review and fix this.
+	 *
+	 * Also would be nice to be able to handle IO errors and such,
+	 * but that's probably too much to ask.
+	 */
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	    S_ISLNK(inode->i_mode)))
+		return;
+	if (ext2_inode_is_fast_symlink(inode))
+		return;
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return;
+	__ext2_truncate_blocks(inode, offset);
+}
+
+int ext2_setsize(struct inode *inode, loff_t newsize)
+{
+	loff_t oldsize;
+	int error;
+
+	error = inode_newsize_ok(inode, newsize);
+	if (error)
+		return error;
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	    S_ISLNK(inode->i_mode)))
+		return -EINVAL;
+	if (ext2_inode_is_fast_symlink(inode))
+		return -EINVAL;
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return -EPERM;
+
+	if (mapping_is_xip(inode->i_mapping))
+		error = xip_truncate_page(inode->i_mapping, newsize);
+	else if (test_opt(inode->i_sb, NOBH))
+		error = nobh_truncate_page(inode->i_mapping,
+				newsize, ext2_get_block);
+	else
+		error = block_truncate_page(inode->i_mapping,
+				newsize, ext2_get_block);
+	if (error)
+		return error;
+
+	oldsize = inode->i_size;
+	i_size_write(inode, newsize);
+	truncate_pagecache(inode, oldsize, newsize);
+
+	__ext2_truncate_blocks(inode, newsize);
+
 	inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
 	if (inode_needs_sync(inode)) {
 		sync_mapping_buffers(inode->i_mapping);
@@ -1129,6 +1205,8 @@ do_indirects:
 	} else {
 		mark_inode_dirty(inode);
 	}
+
+	return 0;
 }
 
 static struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino,
@@ -1461,8 +1539,15 @@ int ext2_setattr(struct dentry *dentry,
 		if (error)
 			return error;
 	}
-	error = inode_setattr(inode, iattr);
-	if (!error && (iattr->ia_valid & ATTR_MODE))
+	if (iattr->ia_valid & ATTR_SIZE) {
+		error = ext2_setsize(inode, iattr->ia_size);
+		if (error)
+			return error;
+	}
+	generic_setattr(inode, iattr);
+	if (iattr->ia_valid & ATTR_MODE)
 		error = ext2_acl_chmod(inode);
+	mark_inode_dirty(inode);
+
 	return error;
 }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 5/5] fat: convert to use the new truncate convention
  2009-12-08  8:38 [patch 1/5] fs: truncate introduce new sequence Nick Piggin
                   ` (2 preceding siblings ...)
  2009-12-08  8:42 ` [patch 4/5] ext2: " Nick Piggin
@ 2009-12-08  8:43 ` Nick Piggin
  2009-12-09 15:08   ` Jan Kara
  2009-12-09 14:38 ` [patch 1/5] fs: truncate introduce new sequence Jan Kara
  4 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2009-12-08  8:43 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel; +Cc: OGAWA Hirofumi

Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 fs/fat/fat.h   |    3 ++-
 fs/fat/file.c  |   34 ++++++++++++++++++++++++++--------
 fs/fat/inode.c |   35 +++++++++++++++++++++++++++++------
 3 files changed, 57 insertions(+), 15 deletions(-)

Index: linux-2.6/fs/fat/fat.h
===================================================================
--- linux-2.6.orig/fs/fat/fat.h
+++ linux-2.6/fs/fat/fat.h
@@ -302,7 +302,8 @@ extern int fat_generic_ioctl(struct inod
 extern const struct file_operations fat_file_operations;
 extern const struct inode_operations fat_file_inode_operations;
 extern int fat_setattr(struct dentry * dentry, struct iattr * attr);
-extern void fat_truncate(struct inode *inode);
+extern int fat_setsize(struct inode *inode, loff_t offset);
+extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
 extern int fat_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		       struct kstat *stat);
 extern int fat_file_fsync(struct file *file, struct dentry *dentry,
Index: linux-2.6/fs/fat/file.c
===================================================================
--- linux-2.6.orig/fs/fat/file.c
+++ linux-2.6/fs/fat/file.c
@@ -270,7 +270,7 @@ static int fat_free(struct inode *inode,
 	return fat_free_clusters(inode, free_start);
 }
 
-void fat_truncate(struct inode *inode)
+void fat_truncate_blocks(struct inode *inode, loff_t offset)
 {
 	struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
 	const unsigned int cluster_size = sbi->cluster_size;
@@ -280,10 +280,10 @@ void fat_truncate(struct inode *inode)
 	 * This protects against truncating a file bigger than it was then
 	 * trying to write into the hole.
 	 */
-	if (MSDOS_I(inode)->mmu_private > inode->i_size)
-		MSDOS_I(inode)->mmu_private = inode->i_size;
+	if (MSDOS_I(inode)->mmu_private > offset)
+		MSDOS_I(inode)->mmu_private = offset;
 
-	nr_clusters = (inode->i_size + (cluster_size - 1)) >> sbi->cluster_bits;
+	nr_clusters = (offset + (cluster_size - 1)) >> sbi->cluster_bits;
 
 	fat_free(inode, nr_clusters);
 	fat_flush_inodes(inode->i_sb, inode, NULL);
@@ -351,6 +351,18 @@ static int fat_allow_set_time(struct msd
 	return 0;
 }
 
+int fat_setsize(struct inode *inode, loff_t offset)
+{
+	int error;
+
+	error = simple_setsize(inode, offset);
+	if (error)
+		return error;
+	fat_truncate_blocks(inode, offset);
+
+	return error;
+}
+
 #define TIMES_SET_FLAGS	(ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)
 /* valid file mode bits */
 #define FAT_VALID_MODE	(S_IFREG | S_IFDIR | S_IRWXUGO)
@@ -365,7 +377,8 @@ int fat_setattr(struct dentry *dentry, s
 	/*
 	 * Expand the file. Since inode_setattr() updates ->i_size
 	 * before calling the ->truncate(), but FAT needs to fill the
-	 * hole before it.
+	 * hole before it. XXX: this is no longer true with new truncate
+	 * sequence.
 	 */
 	if (attr->ia_valid & ATTR_SIZE) {
 		if (attr->ia_size > inode->i_size) {
@@ -414,15 +427,20 @@ int fat_setattr(struct dentry *dentry, s
 			attr->ia_valid &= ~ATTR_MODE;
 	}
 
-	if (attr->ia_valid)
-		error = inode_setattr(inode, attr);
+	if (attr->ia_valid & ATTR_SIZE) {
+		error = fat_setsize(inode, attr->ia_size);
+		if (error)
+			goto out;
+	}
+
+	generic_setattr(inode, attr);
+	mark_inode_dirty(inode);
 out:
 	return error;
 }
 EXPORT_SYMBOL_GPL(fat_setattr);
 
 const struct inode_operations fat_file_inode_operations = {
-	.truncate	= fat_truncate,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
 };
Index: linux-2.6/fs/fat/inode.c
===================================================================
--- linux-2.6.orig/fs/fat/inode.c
+++ linux-2.6/fs/fat/inode.c
@@ -142,14 +142,29 @@ static int fat_readpages(struct file *fi
 	return mpage_readpages(mapping, pages, nr_pages, fat_get_block);
 }
 
+static void fat_write_failed(struct address_space *mapping, loff_t to)
+{
+	struct inode *inode = mapping->host;
+
+	if (to > inode->i_size) {
+		truncate_pagecache(inode, to, inode->i_size);
+		fat_truncate_blocks(inode, inode->i_size);
+	}
+}
+
 static int fat_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata)
 {
+	int err;
+
 	*pagep = NULL;
-	return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
-				fat_get_block,
+	err = cont_write_begin_newtrunc(file, mapping, pos, len, flags,
+				pagep, fsdata, fat_get_block,
 				&MSDOS_I(mapping->host)->mmu_private);
+	if (err < 0)
+		fat_write_failed(mapping, pos + len);
+	return err;
 }
 
 static int fat_write_end(struct file *file, struct address_space *mapping,
@@ -159,6 +174,8 @@ static int fat_write_end(struct file *fi
 	struct inode *inode = mapping->host;
 	int err;
 	err = generic_write_end(file, mapping, pos, len, copied, pagep, fsdata);
+	if (err < len)
+		fat_write_failed(mapping, pos + len);
 	if (!(err < 0) && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {
 		inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
 		MSDOS_I(inode)->i_attrs |= ATTR_ARCH;
@@ -172,7 +189,9 @@ static ssize_t fat_direct_IO(int rw, str
 			     loff_t offset, unsigned long nr_segs)
 {
 	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
+	ssize_t ret;
 
 	if (rw == WRITE) {
 		/*
@@ -193,8 +212,12 @@ static ssize_t fat_direct_IO(int rw, str
 	 * FAT need to use the DIO_LOCKING for avoiding the race
 	 * condition of fat_get_block() and ->truncate().
 	 */
-	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-				  offset, nr_segs, fat_get_block, NULL);
+	ret = blockdev_direct_IO_newtrunc(rw, iocb, inode, inode->i_sb->s_bdev,
+				iov, offset, nr_segs, fat_get_block, NULL);
+	if (ret < 0 && (rw & WRITE))
+		fat_write_failed(mapping, offset + iov_length(iov, nr_segs));
+
+	return ret;
 }
 
 static sector_t _fat_bmap(struct address_space *mapping, sector_t block)
@@ -429,7 +452,7 @@ static void fat_delete_inode(struct inod
 {
 	truncate_inode_pages(&inode->i_data, 0);
 	inode->i_size = 0;
-	fat_truncate(inode);
+	fat_truncate_blocks(inode, 0);
 	clear_inode(inode);
 }
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 1/5] fs: truncate introduce new sequence
  2009-12-08  8:38 [patch 1/5] fs: truncate introduce new sequence Nick Piggin
                   ` (3 preceding siblings ...)
  2009-12-08  8:43 ` [patch 5/5] fat: " Nick Piggin
@ 2009-12-09 14:38 ` Jan Kara
  4 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2009-12-09 14:38 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel

  Hi,

On Tue 08-12-09 09:38:32, Nick Piggin wrote:
> Changes: avoid .truncate_kludge_to_be_killed hack by implementing additional
> fs helper functions that do not call vmtruncate.
> 
> So far have converted ext2 and fat, tested with fsx-linux and the recent
> writev truncate bug tester. The reset of the patches that were already
> made are pretty simple to convert but I will likely prefer to send those
> via their maintainers after this initial set is agreed and merged.
> 
> The new helpers, *_newtrunc, are a bit ugly, but we can do a rename back
> to the old names after all support for old sequence is removed.
  Yeah. I'm not sure what Al dislikes about the original temporary hack.
Maybe he's afraid that the hack will last longer than we'd like... What I
dislike about this approach is that we'll have to change every filesystem
in several places also in the second patch. Anyway I'd like to get this
change merged and I don't care too much about details of the intermediate
period...

> --
> Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
> setattr > vmtruncate > truncate, have filesystems call their truncate sequence
> from ->setattr if filesystem specific operations are required. vmtruncate is
> deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
> previously should be used.
> 
> simple_setattr is introduced for simple in-ram filesystems to implement
> the new truncate sequence. Eventually all filesystems should be converted
> to implement a setattr, and the default code in notify_change should go
> away.
> 
> simple_setsize is also introduced to perform just the ATTR_SIZE portion
> of simple_setattr (ie. changing i_size and trimming pagecache).
> 
> To implement the new truncate sequence:
> - filesystem specific manipulations (eg freeing blocks) must be done in
>   the setattr method rather than ->truncate.
> - vmtruncate can not be used by core code to trim blocks past i_size in
>   the event of write failure after allocation, so this must be performed
>   in the fs code.
> - convert usage of helpers block_write_begin, nobh_write_begin,
>   cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
>   variants. These avoid calling vmtruncate to trim blocks (see previous).
> - inode_setattr should not be used. generic_setattr is a new function
>   to be used to copy simple attributes into the generic inode.
> - make use of the better opportunity to handle errors with the new sequence.
> 
> Big problem with the previous calling sequence: the filesystem is not called
> until i_size has already changed.  This means it is not allowed to fail the
> call, and also it does not know what the previous i_size was. Also, generic
> code calling vmtruncate to truncate allocated blocks in case of error had
> no good way to return a meaningful error (or, for example, atomically handle
> block deallocation).
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
  The patch looks fine except for a typo below.
Acked-by: Jan Kara <jack@suse.cz>

									Honza
> ---
>  Documentation/filesystems/vfs.txt |    7 +-
>  fs/attr.c                         |   50 ++++++++++++---
>  fs/buffer.c                       |  123 ++++++++++++++++++++++++++++++--------
>  fs/direct-io.c                    |   85 ++++++++++++++++----------
>  fs/libfs.c                        |   83 +++++++++++++++++++++++++
>  include/linux/buffer_head.h       |    9 ++
>  include/linux/fs.h                |   36 ++++++++++-
>  mm/truncate.c                     |   10 +--
>  8 files changed, 328 insertions(+), 75 deletions(-)
> 
> Index: linux-2.6/Documentation/filesystems/vfs.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/filesystems/vfs.txt
> +++ linux-2.6/Documentation/filesystems/vfs.txt
> @@ -401,11 +401,16 @@ otherwise noted.
>    	started might not be in the page cache at the end of the
>    	walk).
>  
> -  truncate: called by the VFS to change the size of a file.  The
> +  truncate: Deprecated. This will not be called if ->setsize is defined.
> +	Called by the VFS to change the size of a file.  The
>   	i_size field of the inode is set to the desired size by the
>   	VFS before this method is called.  This method is called by
>   	the truncate(2) system call and related functionality.
>  
> +	Note: ->truncate and vmtruncate are deprecated. Do not add new
> +	instances/calls of these. Filesystems shoud be converted to do their
                                              ^^^^^ should
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 2/5] fs: convert simple fs to new truncate
  2009-12-08  8:39 ` [patch 2/5] fs: convert simple fs to new truncate Nick Piggin
@ 2009-12-09 14:39   ` Jan Kara
  2009-12-10  5:15     ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2009-12-09 14:39 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel

On Tue 08-12-09 09:39:57, Nick Piggin wrote:
> Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate
> sequence.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
  Looks good.
Acked-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 3/5] tmpfs: convert to use the new truncate convention
  2009-12-08  8:41 ` [patch 3/5] tmpfs: convert to use the new truncate convention Nick Piggin
@ 2009-12-09 14:45   ` Jan Kara
  2009-12-10  0:48     ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2009-12-09 14:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel

On Tue 08-12-09 09:41:09, Nick Piggin wrote:
> Cc: Christoph Hellwig <hch@lst.de>
> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
...
> @@ -821,11 +824,11 @@ static void shmem_delete_inode(struct in
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  
> -	if (inode->i_op->truncate == shmem_truncate) {
> +	if (inode->i_mapping->a_ops == &shmem_aops) {
  How about symlinks here? Originally we've truncated inode for them as
well... BTW the check is really strange. Shouldn't we rather make it
something like S_ISLNK || S_ISREG?

>  		truncate_inode_pages(inode->i_mapping, 0);
>  		shmem_unacct_size(info->flags, inode->i_size);
>  		inode->i_size = 0;
> -		shmem_truncate(inode);
> +		shmem_truncate_range(inode, 0, (loff_t)-1);
>  		if (!list_empty(&info->swaplist)) {
>  			mutex_lock(&shmem_swaplist_mutex);
>  			list_del_init(&info->swaplist);

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 4/5] ext2: convert to use the new truncate convention
  2009-12-08  8:42 ` [patch 4/5] ext2: " Nick Piggin
@ 2009-12-09 14:57   ` Jan Kara
  2009-12-10  1:11     ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2009-12-09 14:57 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel, linux-ext4

On Tue 08-12-09 09:42:09, Nick Piggin wrote:
> I also have commented a possible bug in existing ext2 code, marked with XXX.
> 
> Cc: linux-ext4@vger.kernel.org
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
...
> @@ -752,8 +764,8 @@ int __ext2_write_begin(struct file *file
>  		loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, void **fsdata)
>  {
> -	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
> -							ext2_get_block);
> +	return block_write_begin_newtrunc(file, mapping, pos, len, flags,
> +					pagep, fsdata, ext2_get_block);
>  }
  OK, but you should update the code in dir.c using __ext2_write_begin,
shouldn't you?

> +static int ext2_write_end(struct file *file, struct address_space *mapping,
> +			loff_t pos, unsigned len, unsigned copied,
> +			struct page *page, void *fsdata)
> +{
> +	int ret;
> +
> +	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> +	if (ret < len)
> +		ext2_write_failed(mapping, pos + len);
> +	return ret;
>  }
  OK, when doing this, please also update ext2_commit_chunk in dir.c...

> +static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
> +{
> +	/*
> +	 * XXX: it seems like a bug here that we don't allow
> +	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
> +	 * review and fix this.
> +	 *
> +	 * Also would be nice to be able to handle IO errors and such,
> +	 * but that's probably too much to ask.
> +	 */
> +	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> +	    S_ISLNK(inode->i_mode)))
> +		return;
> +	if (ext2_inode_is_fast_symlink(inode))
> +		return;
> +	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> +		return;
  Yes, I'd remove IS_APPEND check from here.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 5/5] fat: convert to use the new truncate convention
  2009-12-08  8:43 ` [patch 5/5] fat: " Nick Piggin
@ 2009-12-09 15:08   ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2009-12-09 15:08 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Al Viro, Christoph Hellwig, Jan Kara, linux-fsdevel,
	OGAWA Hirofumi

On Tue 08-12-09 09:43:02, Nick Piggin wrote:
> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
  Looks good except that the code in fat_setattr could be simplified I
think.
Acked-by: Jan Kara <jack@suse.cz>

> @@ -365,7 +377,8 @@ int fat_setattr(struct dentry *dentry, s
>  	/*
>  	 * Expand the file. Since inode_setattr() updates ->i_size
>  	 * before calling the ->truncate(), but FAT needs to fill the
> -	 * hole before it.
> +	 * hole before it. XXX: this is no longer true with new truncate
> +	 * sequence.
>  	 */
>  	if (attr->ia_valid & ATTR_SIZE) {
>  		if (attr->ia_size > inode->i_size) {
  I think the content of this condition can be moved inside the ATTR_SIZE
check you've added below and the comment can be removed...

> @@ -414,15 +427,20 @@ int fat_setattr(struct dentry *dentry, s
>  			attr->ia_valid &= ~ATTR_MODE;
>  	}
>  
> -	if (attr->ia_valid)
> -		error = inode_setattr(inode, attr);
> +	if (attr->ia_valid & ATTR_SIZE) {
> +		error = fat_setsize(inode, attr->ia_size);
> +		if (error)
> +			goto out;
> +	}
> +
> +	generic_setattr(inode, attr);
> +	mark_inode_dirty(inode);

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 3/5] tmpfs: convert to use the new truncate convention
  2009-12-09 14:45   ` Jan Kara
@ 2009-12-10  0:48     ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2009-12-10  0:48 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel

On Wed, Dec 09, 2009 at 03:45:59PM +0100, Jan Kara wrote:
> On Tue 08-12-09 09:41:09, Nick Piggin wrote:
> > Cc: Christoph Hellwig <hch@lst.de>
> > Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> ...
> > @@ -821,11 +824,11 @@ static void shmem_delete_inode(struct in
> >  {
> >  	struct shmem_inode_info *info = SHMEM_I(inode);
> >  
> > -	if (inode->i_op->truncate == shmem_truncate) {
> > +	if (inode->i_mapping->a_ops == &shmem_aops) {
>   How about symlinks here? Originally we've truncated inode for them as
> well... BTW the check is really strange. Shouldn't we rather make it
> something like S_ISLNK || S_ISREG?

Yes you're right. I don't see why we can't use the mode check,
Hugh?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 4/5] ext2: convert to use the new truncate convention
  2009-12-09 14:57   ` Jan Kara
@ 2009-12-10  1:11     ` Nick Piggin
  2009-12-10 10:20       ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2009-12-10  1:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel, linux-ext4

On Wed, Dec 09, 2009 at 03:57:13PM +0100, Jan Kara wrote:
> On Tue 08-12-09 09:42:09, Nick Piggin wrote:
> > I also have commented a possible bug in existing ext2 code, marked with XXX.
> > 
> > Cc: linux-ext4@vger.kernel.org
> > Cc: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> ...
> > @@ -752,8 +764,8 @@ int __ext2_write_begin(struct file *file
> >  		loff_t pos, unsigned len, unsigned flags,
> >  		struct page **pagep, void **fsdata)
> >  {
> > -	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
> > -							ext2_get_block);
> > +	return block_write_begin_newtrunc(file, mapping, pos, len, flags,
> > +					pagep, fsdata, ext2_get_block);
> >  }
>   OK, but you should update the code in dir.c using __ext2_write_begin,
> shouldn't you?

To trim off blocks past i_size on failure? Yes I guess it should.
In fact, the ext2_write_failed call could just go into here I think.
I'll take a look.


> > +static int ext2_write_end(struct file *file, struct address_space *mapping,
> > +			loff_t pos, unsigned len, unsigned copied,
> > +			struct page *page, void *fsdata)
> > +{
> > +	int ret;
> > +
> > +	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> > +	if (ret < len)
> > +		ext2_write_failed(mapping, pos + len);
> > +	return ret;
> >  }
>   OK, when doing this, please also update ext2_commit_chunk in dir.c...

I think commit_chunk is OK. Because block_write_end does not fail
and the only reason for checking failure here is in case of a short
copy (which won't happen in dir.c code).

 
> > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
> > +{
> > +	/*
> > +	 * XXX: it seems like a bug here that we don't allow
> > +	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
> > +	 * review and fix this.
> > +	 *
> > +	 * Also would be nice to be able to handle IO errors and such,
> > +	 * but that's probably too much to ask.
> > +	 */
> > +	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> > +	    S_ISLNK(inode->i_mode)))
> > +		return;
> > +	if (ext2_inode_is_fast_symlink(inode))
> > +		return;
> > +	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> > +		return;
>   Yes, I'd remove IS_APPEND check from here.

Didn't want to change that in this patch, but I'm happy for someone to
fix it (and in ext3/4 etc).

The checks probably should be done at a different level anyway: by the
time that this code gets called, the pagecache is truncated and i_size
modified anyway so it seems buggy if this made any difference.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 2/5] fs: convert simple fs to new truncate
  2009-12-09 14:39   ` Jan Kara
@ 2009-12-10  5:15     ` Nick Piggin
  2009-12-10  8:44       ` Boaz Harrosh
  0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2009-12-10  5:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel

On Wed, Dec 09, 2009 at 03:39:45PM +0100, Jan Kara wrote:
> On Tue 08-12-09 09:39:57, Nick Piggin wrote:
> > Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate
> > sequence.
> > 
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
>   Looks good.
> Acked-by: Jan Kara <jack@suse.cz>

Thanks.

Al, any chance to merge patches 1 and 2 upstream early in this round? They
should be fairly non intrusive and back compatible. Then I can get the
filesystem specific patches to their maintainers and you won't have to
bother with them.

ecryptfs is being fixed to use notify_change rather than vmtruncate (which
is anyway buggy today), so that won't be a problem.

I think we'll be able to get rid of old protocol completely within just
a couple of releases. Jan and Christoph have made pretty good progress
there.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 2/5] fs: convert simple fs to new truncate
  2009-12-10  8:44       ` Boaz Harrosh
@ 2009-12-10  7:57         ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2009-12-10  7:57 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jan Kara, Al Viro, Christoph Hellwig, linux-fsdevel

On Thu, Dec 10, 2009 at 10:44:08AM +0200, Boaz Harrosh wrote:
> On 12/10/2009 07:15 AM, Nick Piggin wrote:
> > 
> > Al, any chance to merge patches 1 and 2 upstream early in this round? They
> > should be fairly non intrusive and back compatible. Then I can get the
> > filesystem specific patches to their maintainers and you won't have to
> > bother with them.
> > 
> > ecryptfs is being fixed to use notify_change rather than vmtruncate (which
> > is anyway buggy today), so that won't be a problem.
> > 
> > I think we'll be able to get rid of old protocol completely within just
> > a couple of releases. Jan and Christoph have made pretty good progress
> > there.
> > 
> 
> Nick Hi.
> 
> I want to do the exofs conversion. Is there a public git-tree with all this that
> I can base my work, and run tests?
> 
> Thanks
> Boaz

Hi Boaz,

I don't have any git trees published. Apart from merging the patch by
hand, I should hope it goes upstream with Al's vfs merge.

If you've got patches based on the previous patchset in Al's tree,
then it is identical except that we don't switch those helper functions
with the .truncate_kludge = 1 flag but instead use _newtrunc variants.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 2/5] fs: convert simple fs to new truncate
  2009-12-10  5:15     ` Nick Piggin
@ 2009-12-10  8:44       ` Boaz Harrosh
  2009-12-10  7:57         ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Boaz Harrosh @ 2009-12-10  8:44 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jan Kara, Al Viro, Christoph Hellwig, linux-fsdevel

On 12/10/2009 07:15 AM, Nick Piggin wrote:
> 
> Al, any chance to merge patches 1 and 2 upstream early in this round? They
> should be fairly non intrusive and back compatible. Then I can get the
> filesystem specific patches to their maintainers and you won't have to
> bother with them.
> 
> ecryptfs is being fixed to use notify_change rather than vmtruncate (which
> is anyway buggy today), so that won't be a problem.
> 
> I think we'll be able to get rid of old protocol completely within just
> a couple of releases. Jan and Christoph have made pretty good progress
> there.
> 

Nick Hi.

I want to do the exofs conversion. Is there a public git-tree with all this that
I can base my work, and run tests?

Thanks
Boaz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 4/5] ext2: convert to use the new truncate convention
  2009-12-10  1:11     ` Nick Piggin
@ 2009-12-10 10:20       ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2009-12-10 10:20 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jan Kara, Al Viro, Christoph Hellwig, linux-fsdevel, linux-ext4

On Thu 10-12-09 12:11:03, Nick Piggin wrote:
> On Wed, Dec 09, 2009 at 03:57:13PM +0100, Jan Kara wrote:
> > On Tue 08-12-09 09:42:09, Nick Piggin wrote:
> > > I also have commented a possible bug in existing ext2 code, marked with XXX.
> > > 
> > > Cc: linux-ext4@vger.kernel.org
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ...
> > > @@ -752,8 +764,8 @@ int __ext2_write_begin(struct file *file
> > >  		loff_t pos, unsigned len, unsigned flags,
> > >  		struct page **pagep, void **fsdata)
> > >  {
> > > -	return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
> > > -							ext2_get_block);
> > > +	return block_write_begin_newtrunc(file, mapping, pos, len, flags,
> > > +					pagep, fsdata, ext2_get_block);
> > >  }
> >   OK, but you should update the code in dir.c using __ext2_write_begin,
> > shouldn't you?
> 
> To trim off blocks past i_size on failure? Yes I guess it should.
> In fact, the ext2_write_failed call could just go into here I think.
> I'll take a look.
  Yes, that would be probably the easiest.

> > > +static int ext2_write_end(struct file *file, struct address_space *mapping,
> > > +			loff_t pos, unsigned len, unsigned copied,
> > > +			struct page *page, void *fsdata)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> > > +	if (ret < len)
> > > +		ext2_write_failed(mapping, pos + len);
> > > +	return ret;
> > >  }
> >   OK, when doing this, please also update ext2_commit_chunk in dir.c...
> 
> I think commit_chunk is OK. Because block_write_end does not fail
> and the only reason for checking failure here is in case of a short
> copy (which won't happen in dir.c code).
  Ah, right.

> > > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
> > > +{
> > > +	/*
> > > +	 * XXX: it seems like a bug here that we don't allow
> > > +	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
> > > +	 * review and fix this.
> > > +	 *
> > > +	 * Also would be nice to be able to handle IO errors and such,
> > > +	 * but that's probably too much to ask.
> > > +	 */
> > > +	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> > > +	    S_ISLNK(inode->i_mode)))
> > > +		return;
> > > +	if (ext2_inode_is_fast_symlink(inode))
> > > +		return;
> > > +	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> > > +		return;
> >   Yes, I'd remove IS_APPEND check from here.
> 
> Didn't want to change that in this patch, but I'm happy for someone to
> fix it (and in ext3/4 etc).
  OK.

> The checks probably should be done at a different level anyway: by the
> time that this code gets called, the pagecache is truncated and i_size
> modified anyway so it seems buggy if this made any difference.
  It depends. The first if can be WARN_ON as well - we shouldn't really get
different inode types here. For fast symlinks we have nothing to truncate
so we want to immediately return doing nothing. For IMMUTABLE inode it is
again a bug if we get to this function and for APPEND inode the only
acceptable way to get here is the error recovery. We can clean that up
when your patch series goes in.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-12-11 20:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-08  8:38 [patch 1/5] fs: truncate introduce new sequence Nick Piggin
2009-12-08  8:39 ` [patch 2/5] fs: convert simple fs to new truncate Nick Piggin
2009-12-09 14:39   ` Jan Kara
2009-12-10  5:15     ` Nick Piggin
2009-12-10  8:44       ` Boaz Harrosh
2009-12-10  7:57         ` Nick Piggin
2009-12-08  8:41 ` [patch 3/5] tmpfs: convert to use the new truncate convention Nick Piggin
2009-12-09 14:45   ` Jan Kara
2009-12-10  0:48     ` Nick Piggin
2009-12-08  8:42 ` [patch 4/5] ext2: " Nick Piggin
2009-12-09 14:57   ` Jan Kara
2009-12-10  1:11     ` Nick Piggin
2009-12-10 10:20       ` Jan Kara
2009-12-08  8:43 ` [patch 5/5] fat: " Nick Piggin
2009-12-09 15:08   ` Jan Kara
2009-12-09 14:38 ` [patch 1/5] fs: truncate introduce new sequence Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).