linux-ext4.vger.kernel.org archive mirror
* [RFC] [PATCH 0/7] Improve VFS to handle better mmaps when blocksize < pagesize (v3)
@ 2009-09-17 15:21 Jan Kara
  2009-09-17 15:21 ` [PATCH 1/7] fs: buffer_head writepage no invalidate Jan Kara
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Jan Kara @ 2009-09-17 15:21 UTC
  To: linux-fsdevel; +Cc: LKML, linux-ext4, linux-mm, npiggin


  Hi,

  here is my next attempt to solve the problems arising with mmapped writes
when blocksize < pagesize. To recall the problem:

We'd like to use page_mkwrite() to allocate blocks under a page which is
becoming writably mmapped in some process address space. This allows a
filesystem to fail the page fault if there is not enough space available, the
user exceeds quota, or a similar problem happens, rather than silently
discarding data later when writepage is called.

On filesystems where blocksize < pagesize the situation is more complicated,
though. Consider, for example, blocksize = 1024, pagesize = 4096, and a process
that does:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';  ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  fsync(fd); ----> writepage() for index 0 is called

At the moment page_mkwrite() is called, the filesystem can allocate only one
block for the page because i_size == 1024. Otherwise it would create blocks
beyond i_size, which is generally undesirable. But later, at writepage() time,
we would like to have blocks allocated for the whole page (and in principle we
have to allocate them because the user could have filled the page with data
after the second ftruncate()).
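
For anyone who wants to replay the sequence above from userspace, here is a
minimal sketch with error checking. The function name, the /tmp path and the
use of PROT_READ | PROT_WRITE (instead of PROT_WRITE alone) are illustrative
assumptions, not part of the patch set. It only replays the system calls; it
cannot show which blocks the filesystem allocated, which is the actual issue.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Replay the example on a fresh temporary file (hypothetical helper).
 * On success returns 0 and reports the final file size and the first
 * byte of the file; returns -1 if any system call fails.
 */
int run_mmap_extend_scenario(off_t *final_size, char *first_byte)
{
	char path[] = "/tmp/mkwrite-demo-XXXXXX";	/* assumed location */
	char buf[1024];
	char *map;
	struct stat st;
	int ret = -1;
	int fd = mkstemp(path);		/* opened read-write */

	if (fd < 0)
		return -1;
	unlink(path);			/* file disappears once fd is closed */
	memset(buf, 'x', sizeof(buf));

	if (ftruncate(fd, 0) < 0)
		goto out;
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	map[0] = 'a';			/* write fault on page index 0 */
	if (ftruncate(fd, 10000) < 0)	/* extend i_size past the page */
		goto out_unmap;
	if (fsync(fd) < 0)		/* forces writeback of page index 0 */
		goto out_unmap;

	if (fstat(fd, &st) < 0)
		goto out_unmap;
	if (pread(fd, first_byte, 1, 0) != 1)
		goto out_unmap;
	*final_size = st.st_size;
	ret = 0;
out_unmap:
	munmap(map, 4096);
out:
	close(fd);
	return ret;
}
```

To exercise the blocksize < pagesize case, the temporary file would have to
live on a filesystem created with 1024-byte blocks; on other filesystems the
calls still succeed but the problematic path is not taken.
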
---

  The patches depend on Nick's truncate calling convention rewrite. The first
three patches in the patchset are just cleanups. The series converts the ext4
and ext2 filesystems just to give an idea of what conversion of a filesystem
will look like.
  A few notes on the changes the main patch (patch number 4) makes:
1) zeroing of the tail of the last block no longer happens in writepage (which
is racy anyway, as Nick pointed out) and foo_truncate_page, but rather when
i_size is about to be extended.
2) the writeback path does not care about i_size anymore; it uses buffer flags
instead. The exception is the nobh case, where we have to use i_size. Thus
filesystems not using nobh code can update i_size in write_end without holding
page_lock. Filesystems using nobh code still have to update i_size under
page_lock, since otherwise __mpage_writepage could come early, write just part
of the page, and clear all dirty bits, thus causing data loss.
3) converted filesystems have to make sure that the buffers with valid data
to write are either mapped or delay before they call block_write_full_page.
The idea is that they should use page_mkwrite() to set up the buffers.
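
As a rough illustration of note 1, the arithmetic for "the tail of the last
block" can be sketched in userspace as follows. This is one reading of the
note, not code from patch 4; the function name is hypothetical and blocksize
is assumed to be a power of two.

```c
#include <stdint.h>

/*
 * Illustrative only: when i_size is about to grow from old_size, the
 * bytes from old_size up to the end of the block containing it are the
 * ones that would need zeroing. Returns how many bytes that is, or 0
 * if old_size already sits on a block boundary.
 */
uint64_t tail_zero_len(uint64_t old_size, uint32_t blocksize)
{
	/* blocksize is assumed to be a power of two */
	uint64_t offset_in_block = old_size & ((uint64_t)blocksize - 1);

	if (offset_in_block == 0)
		return 0;
	return blocksize - offset_in_block;
}
```

With blocksize = 1024 and i_size == 1024 as in the example above, nothing
needs zeroing because the old size is block aligned; with i_size == 1000 the
24 bytes up to offset 1024 would be zeroed.
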

  Both ext2 and ext4 have survived some beating with fsx-linux so they should
be at least moderately safe to use :). Any comments?
									Honza


* [PATCH 5/7] ext4: Convert filesystem to the new truncate calling convention
@ 2009-09-22 17:42 Jan Kara
  2009-09-22 17:48 ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2009-09-22 17:42 UTC
  To: LKML; +Cc: npiggin, viro, linux-ext4, hch, Andrew Morton, Jan Kara, tytso

CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c  |    2 +-
 fs/ext4/inode.c |  166 ++++++++++++++++++++++++++++++++----------------------
 2 files changed, 99 insertions(+), 69 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 3f1873f..22f49d7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -198,7 +198,7 @@ const struct file_operations ext4_file_operations = {
 };
 
 const struct inode_operations ext4_file_inode_operations = {
-	.truncate	= ext4_truncate,
+	.new_truncate	= 1,
 	.setattr	= ext4_setattr,
 	.getattr	= ext4_getattr,
 #ifdef CONFIG_EXT4_FS_XATTR
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 58492ab..be25874 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4682,28 +4686,97 @@ int ext4_write_inode(struct inode *inode, int wait)
 }
 
 /*
- * ext4_setattr()
+ * ext4_setsize()
+ *
+ * This is a helper for ext4_setattr(). It sets i_size, truncates page cache
+ * and truncates inode blocks if they are over i_size.
  *
- * Called from notify_change.
+ * We take care of updating i_disksize and adding inode to the orphan list.
+ * That makes sure that we can guarantee that any commit will leave the blocks
+ * being truncated in an unused state on disk.  (On recovery, the inode will
+ * get truncated and the blocks will be freed, so we have a strong guarantee
+ * that no future commit will leave these blocks visible to the user.)
  *
- * We want to trap VFS attempts to truncate the file as soon as
- * possible.  In particular, we want to make sure that when the VFS
- * shrinks i_size, we put the inode on the orphan list and modify
- * i_disksize immediately, so that during the subsequent flushing of
- * dirty pages and freeing of disk blocks, we can guarantee that any
- * commit will leave the blocks being flushed in an unused state on
- * disk.  (On recovery, the inode will get truncated and the blocks will
- * be freed, so we have a strong guarantee that no future commit will
- * leave these blocks visible to the user.)
+ * Another thing we have to assure is that if we are in ordered mode and inode
+ * is still attached to the committing transaction, we must start writeout
+ * of all the dirty pages which are being truncated.  This way we are sure that
+ * all the data written in the previous transaction are already on disk
+ * (truncate waits for pages under writeback).
+ */
+static int ext4_setsize(struct inode *inode, loff_t newsize)
+{
+	int error = 0, rc;
+	loff_t oldsize = inode->i_size;
+	handle_t *handle;
+
+	error = inode_newsize_ok(inode, newsize);
+	if (error)
+		goto out;
+	/* VFS should have checked these and return error... */
+	WARN_ON(!S_ISREG(inode->i_mode) || IS_APPEND(inode) ||
+		IS_IMMUTABLE(inode));
+
+	if (newsize < oldsize) {
+		handle = ext4_journal_start(inode, 3);
+		if (IS_ERR(handle)) {
+			error = PTR_ERR(handle);
+			goto err_out;
+		}
+
+		error = ext4_orphan_add(handle, inode);
+		EXT4_I(inode)->i_disksize = newsize;
+		rc = ext4_mark_inode_dirty(handle, inode);
+		if (!error)
+			error = rc;
+		ext4_journal_stop(handle);
+
+		if (ext4_should_order_data(inode)) {
+			error = ext4_begin_ordered_truncate(inode, newsize);
+			if (error) {
+				/* Do as much error cleanup as possible */
+				handle = ext4_journal_start(inode, 3);
+				if (IS_ERR(handle)) {
+					ext4_orphan_del(NULL, inode);
+					goto err_out;
+				}
+				ext4_orphan_del(handle, inode);
+				ext4_journal_stop(handle);
+				goto err_out;
+			}
+		}
+	} else if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) {
+		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+		if (newsize > sbi->s_bitmap_maxbytes) {
+			error = -EFBIG;
+			goto out;
+		}
+	}
+
+	i_size_write(inode, newsize);
+	truncate_pagecache(inode, oldsize, newsize);
+	ext4_truncate(inode);
+
+	/*
+	 * If we failed to get a transaction handle at all, we need to clean up
+	 * the in-core orphan list manually.
+	 */
+	if (inode->i_nlink)
+		ext4_orphan_del(NULL, inode);
+err_out:
+	ext4_std_error(inode->i_sb, error);
+out:
+	return error;
+}
+
+
+/*
+ * ext4_setattr()
  *
- * Another thing we have to assure is that if we are in ordered mode
- * and inode is still attached to the committing transaction, we must
- * we start writeout of all the dirty pages which are being truncated.
- * This way we are sure that all the data written in the previous
- * transaction are already on disk (truncate waits for pages under
- * writeback).
+ * Handle special things ext4 needs for changing owner of the file, changing
+ * ACLs, or truncating file.
  *
- * Called with inode->i_mutex down.
+ * Called from notify_change with inode->i_mutex down.
  */
 int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 {
@@ -4743,61 +4816,18 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 
 	if (attr->ia_valid & ATTR_SIZE) {
-		if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) {
-			struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-
-			if (attr->ia_size > sbi->s_bitmap_maxbytes) {
-				error = -EFBIG;
-				goto err_out;
-			}
-		}
-	}
-
-	if (S_ISREG(inode->i_mode) &&
-	    attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
-		handle_t *handle;
-
-		handle = ext4_journal_start(inode, 3);
-		if (IS_ERR(handle)) {
-			error = PTR_ERR(handle);
-			goto err_out;
-		}
-
-		error = ext4_orphan_add(handle, inode);
-		EXT4_I(inode)->i_disksize = attr->ia_size;
-		rc = ext4_mark_inode_dirty(handle, inode);
-		if (!error)
-			error = rc;
-		ext4_journal_stop(handle);
-
-		if (ext4_should_order_data(inode)) {
-			error = ext4_begin_ordered_truncate(inode,
-							    attr->ia_size);
-			if (error) {
-				/* Do as much error cleanup as possible */
-				handle = ext4_journal_start(inode, 3);
-				if (IS_ERR(handle)) {
-					ext4_orphan_del(NULL, inode);
-					goto err_out;
-				}
-				ext4_orphan_del(handle, inode);
-				ext4_journal_stop(handle);
-				goto err_out;
-			}
-		}
+		error = ext4_setsize(inode, attr->ia_size);
+		if (error)
+			return error;
 	}
 
-	rc = inode_setattr(inode, attr);


Thread overview: 14+ messages
2009-09-17 15:21 [RFC] [PATCH 0/7] Improve VFS to handle better mmaps when blocksize < pagesize (v3) Jan Kara
2009-09-17 15:21 ` [PATCH 1/7] fs: buffer_head writepage no invalidate Jan Kara
2009-09-17 15:21 ` [PATCH 2/7] fs: Remove zeroing from nobh_writepage Jan Kara
2009-09-17 15:21 ` [PATCH 3/7] ext4: Deprecate nobh mount option Jan Kara
2009-09-17 15:21 ` [PATCH 4/7] vfs: Add better VFS support for page_mkwrite when blocksize < pagesize Jan Kara
2009-09-17 15:21 ` [PATCH 5/7] ext4: Convert filesystem to the new truncate calling convention Jan Kara
2009-09-22 14:36   ` Al Viro
2009-09-22 17:16     ` Jan Kara
2009-09-22 17:23       ` Al Viro
2009-09-22 17:37         ` Jan Kara
2009-09-17 15:21 ` [PATCH 6/7] ext4: Convert ext4 to new mkwrite code Jan Kara
2009-09-17 15:21 ` [PATCH 7/7] ext2: Convert ext2 " Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2009-09-22 17:42 [PATCH 5/7] ext4: Convert filesystem to the new truncate calling convention Jan Kara
2009-09-22 17:48 ` Jan Kara
