linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure
@ 2009-05-04  9:13 Aneesh Kumar K.V
  2009-05-04  9:13 ` [PATCH RESEND 2/2] ext4: truncate the file properly if we fail to copy data from userspace Aneesh Kumar K.V
  2009-05-11 11:45 ` [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure Theodore Tso
  0 siblings, 2 replies; 4+ messages in thread
From: Aneesh Kumar K.V @ 2009-05-04  9:13 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V, Jan Kara

We should add inode to the orphan list in the same transaction
as block allocation. This ensures that if we crash after a failed
block allocation and before we do a vmtruncate we don't leak block
(ie block marked as used in bitmap but not claimed by the inode).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC:  Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e90965..28e95ee 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1426,7 +1426,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 				struct page **pagep, void **fsdata)
 {
 	struct inode *inode = mapping->host;
-	int ret, needed_blocks = ext4_writepage_trans_blocks(inode);
+	int ret, needed_blocks;
 	handle_t *handle;
 	int retries = 0;
 	struct page *page;
@@ -1437,6 +1437,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 		   "dev %s ino %lu pos %llu len %u flags %u",
 		   inode->i_sb->s_id, inode->i_ino,
 		   (unsigned long long) pos, len, flags);
+	/*
+	 * Reserve one block more for addition to orphan list in case
+	 * we allocate blocks but write fails for some reason
+	 */
+	needed_blocks = ext4_writepage_trans_blocks(inode) + 1;
  	index = pos >> PAGE_CACHE_SHIFT;
 	from = pos & (PAGE_CACHE_SIZE - 1);
 	to = from + len;
@@ -1470,14 +1475,20 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 
 	if (ret) {
 		unlock_page(page);
-		ext4_journal_stop(handle);
 		page_cache_release(page);
 		/*
 		 * block_write_begin may have instantiated a few blocks
 		 * outside i_size.  Trim these off again. Don't need
 		 * i_size_read because we hold i_mutex.
+		 *
+		 * Add inode to orphan list in case we crash before
+		 * truncate finishes
 		 */
 		if (pos + len > inode->i_size)
+			ext4_orphan_add(handle, inode);
+
+		ext4_journal_stop(handle);
+		if (pos + len > inode->i_size)
 			vmtruncate(inode, inode->i_size);
 	}
 
-- 
1.6.3.rc4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH RESEND 2/2] ext4: truncate the file properly if we fail to copy data from userspace
  2009-05-04  9:13 [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure Aneesh Kumar K.V
@ 2009-05-04  9:13 ` Aneesh Kumar K.V
  2009-05-04 11:23   ` Jan Kara
  2009-05-11 11:45 ` [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure Theodore Tso
  1 sibling, 1 reply; 4+ messages in thread
From: Aneesh Kumar K.V @ 2009-05-04  9:13 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V, Jan Kara

In generic_perform_write if we fail to copy the user data we don't
update the inode->i_size. We should truncate the file in the above case
so that we don't have blocks allocated outside inode->i_size. Add
the inode to orphan list in the same transaction as block allocation
This ensures that if we crash in between the recovery would do the truncate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC:  Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |  103 +++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 77 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 28e95ee..43884e3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1507,6 +1507,52 @@ static int write_end_fn(handle_t *handle, struct buffer_head *bh)
 	return ext4_handle_dirty_metadata(handle, NULL, bh);
 }
 
+static int ext4_generic_write_end(struct file *file,
+				struct address_space *mapping,
+				loff_t pos, unsigned len, unsigned copied,
+				struct page *page, void *fsdata)
+{
+	int i_size_changed = 0;
+	struct inode *inode = mapping->host;
+	handle_t *handle = ext4_journal_current_handle();
+
+	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+
+	/*
+	 * No need to use i_size_read() here, the i_size
+	 * cannot change under us because we hold i_mutex.
+	 *
+	 * But it's important to update i_size while still holding page lock:
+	 * page writeout could otherwise come in and zero beyond i_size.
+	 */
+	if (pos + copied > inode->i_size) {
+		i_size_write(inode, pos + copied);
+		i_size_changed = 1;
+	}
+
+	if (pos + copied >  EXT4_I(inode)->i_disksize) {
+		/* We need to mark inode dirty even if
+		 * new_i_size is less that inode->i_size
+		 * bu greater than i_disksize.(hint delalloc)
+		 */
+		ext4_update_i_disksize(inode, (pos + copied));
+		i_size_changed = 1;
+	}
+	unlock_page(page);
+	page_cache_release(page);
+
+	/*
+	 * Don't mark the inode dirty under page lock. First, it unnecessarily
+	 * makes the holding time of page lock longer. Second, it forces lock
+	 * ordering of page lock and transaction start for journaling
+	 * filesystems.
+	 */
+	if (i_size_changed)
+		ext4_mark_inode_dirty(handle, inode);
+
+	return copied;
+}
+
 /*
  * We need to pick up the new inode size which generic_commit_write gave us
  * `file' can be NULL - eg, when called from page_symlink().
@@ -1530,21 +1576,15 @@ static int ext4_ordered_write_end(struct file *file,
 	ret = ext4_jbd2_file_inode(handle, inode);
 
 	if (ret == 0) {
-		loff_t new_i_size;
-
-		new_i_size = pos + copied;
-		if (new_i_size > EXT4_I(inode)->i_disksize) {
-			ext4_update_i_disksize(inode, new_i_size);
-			/* We need to mark inode dirty even if
-			 * new_i_size is less that inode->i_size
-			 * bu greater than i_disksize.(hint delalloc)
-			 */
-			ext4_mark_inode_dirty(handle, inode);
-		}
-
-		ret2 = generic_write_end(file, mapping, pos, len, copied,
+		ret2 = ext4_generic_write_end(file, mapping, pos, len, copied,
 							page, fsdata);
 		copied = ret2;
+		if (pos + len > inode->i_size)
+			/* if we have allocated more blocks and copied
+			 * less. We will have blocks allocated outside
+			 * inode->i_size. So truncate them
+			 */
+			ext4_orphan_add(handle, inode);
 		if (ret2 < 0)
 			ret = ret2;
 	}
@@ -1552,6 +1592,9 @@ static int ext4_ordered_write_end(struct file *file,
 	if (!ret)
 		ret = ret2;
 
+	if (pos + len > inode->i_size)
+		vmtruncate(inode, inode->i_size);
+
 	return ret ? ret : copied;
 }
 
@@ -1563,25 +1606,21 @@ static int ext4_writeback_write_end(struct file *file,
 	handle_t *handle = ext4_journal_current_handle();
 	struct inode *inode = mapping->host;
 	int ret = 0, ret2;
-	loff_t new_i_size;
 
 	trace_mark(ext4_writeback_write_end,
 		   "dev %s ino %lu pos %llu len %u copied %u",
 		   inode->i_sb->s_id, inode->i_ino,
 		   (unsigned long long) pos, len, copied);
-	new_i_size = pos + copied;
-	if (new_i_size > EXT4_I(inode)->i_disksize) {
-		ext4_update_i_disksize(inode, new_i_size);
-		/* We need to mark inode dirty even if
-		 * new_i_size is less that inode->i_size
-		 * bu greater than i_disksize.(hint delalloc)
-		 */
-		ext4_mark_inode_dirty(handle, inode);
-	}
-
-	ret2 = generic_write_end(file, mapping, pos, len, copied,
+	ret2 = ext4_generic_write_end(file, mapping, pos, len, copied,
 							page, fsdata);
 	copied = ret2;
+	if (pos + len > inode->i_size)
+		/* if we have allocated more blocks and copied
+		 * less. We will have blocks allocated outside
+		 * inode->i_size. So truncate them
+		 */
+		ext4_orphan_add(handle, inode);
+
 	if (ret2 < 0)
 		ret = ret2;
 
@@ -1589,6 +1628,9 @@ static int ext4_writeback_write_end(struct file *file,
 	if (!ret)
 		ret = ret2;
 
+	if (pos + len > inode->i_size)
+		vmtruncate(inode, inode->i_size);
+
 	return ret ? ret : copied;
 }
 
@@ -1633,10 +1675,19 @@ static int ext4_journalled_write_end(struct file *file,
 	}
 
 	unlock_page(page);
+	page_cache_release(page);
+	if (pos + len > inode->i_size)
+		/* if we have allocated more blocks and copied
+		 * less. We will have blocks allocated outside
+		 * inode->i_size. So truncate them
+		 */
+		ext4_orphan_add(handle, inode);
+
 	ret2 = ext4_journal_stop(handle);
 	if (!ret)
 		ret = ret2;
-	page_cache_release(page);
+	if (pos + len > inode->i_size)
+		vmtruncate(inode, inode->i_size);
 
 	return ret ? ret : copied;
 }
-- 
1.6.3.rc4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RESEND 2/2] ext4: truncate the file properly if we fail to copy data from userspace
  2009-05-04  9:13 ` [PATCH RESEND 2/2] ext4: truncate the file properly if we fail to copy data from userspace Aneesh Kumar K.V
@ 2009-05-04 11:23   ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2009-05-04 11:23 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, tytso, sandeen, linux-ext4, Jan Kara

On Mon 04-05-09 14:43:01, Aneesh Kumar K.V wrote:
> In generic_perform_write if we fail to copy the user data we don't
> update the inode->i_size. We should truncate the file in the above case
> so that we don't have blocks allocated outside inode->i_size. Add
> the inode to orphan list in the same transaction as block allocation
> This ensures that if we crash in between the recovery would do the truncate.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> CC:  Jan Kara <jack@suse.cz>
  This patch looks fine as well.
  Acked-by: Jan Kara <jack@suse.cz>

									Honza

> ---
>  fs/ext4/inode.c |  103 +++++++++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 77 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 28e95ee..43884e3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1507,6 +1507,52 @@ static int write_end_fn(handle_t *handle, struct buffer_head *bh)
>  	return ext4_handle_dirty_metadata(handle, NULL, bh);
>  }
>  
> +static int ext4_generic_write_end(struct file *file,
> +				struct address_space *mapping,
> +				loff_t pos, unsigned len, unsigned copied,
> +				struct page *page, void *fsdata)
> +{
> +	int i_size_changed = 0;
> +	struct inode *inode = mapping->host;
> +	handle_t *handle = ext4_journal_current_handle();
> +
> +	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> +
> +	/*
> +	 * No need to use i_size_read() here, the i_size
> +	 * cannot change under us because we hold i_mutex.
> +	 *
> +	 * But it's important to update i_size while still holding page lock:
> +	 * page writeout could otherwise come in and zero beyond i_size.
> +	 */
> +	if (pos + copied > inode->i_size) {
> +		i_size_write(inode, pos + copied);
> +		i_size_changed = 1;
> +	}
> +
> +	if (pos + copied >  EXT4_I(inode)->i_disksize) {
> +		/* We need to mark inode dirty even if
> +		 * new_i_size is less that inode->i_size
> +		 * bu greater than i_disksize.(hint delalloc)
> +		 */
> +		ext4_update_i_disksize(inode, (pos + copied));
> +		i_size_changed = 1;
> +	}
> +	unlock_page(page);
> +	page_cache_release(page);
> +
> +	/*
> +	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> +	 * makes the holding time of page lock longer. Second, it forces lock
> +	 * ordering of page lock and transaction start for journaling
> +	 * filesystems.
> +	 */
> +	if (i_size_changed)
> +		ext4_mark_inode_dirty(handle, inode);
> +
> +	return copied;
> +}
> +
>  /*
>   * We need to pick up the new inode size which generic_commit_write gave us
>   * `file' can be NULL - eg, when called from page_symlink().
> @@ -1530,21 +1576,15 @@ static int ext4_ordered_write_end(struct file *file,
>  	ret = ext4_jbd2_file_inode(handle, inode);
>  
>  	if (ret == 0) {
> -		loff_t new_i_size;
> -
> -		new_i_size = pos + copied;
> -		if (new_i_size > EXT4_I(inode)->i_disksize) {
> -			ext4_update_i_disksize(inode, new_i_size);
> -			/* We need to mark inode dirty even if
> -			 * new_i_size is less that inode->i_size
> -			 * bu greater than i_disksize.(hint delalloc)
> -			 */
> -			ext4_mark_inode_dirty(handle, inode);
> -		}
> -
> -		ret2 = generic_write_end(file, mapping, pos, len, copied,
> +		ret2 = ext4_generic_write_end(file, mapping, pos, len, copied,
>  							page, fsdata);
>  		copied = ret2;
> +		if (pos + len > inode->i_size)
> +			/* if we have allocated more blocks and copied
> +			 * less. We will have blocks allocated outside
> +			 * inode->i_size. So truncate them
> +			 */
> +			ext4_orphan_add(handle, inode);
>  		if (ret2 < 0)
>  			ret = ret2;
>  	}
> @@ -1552,6 +1592,9 @@ static int ext4_ordered_write_end(struct file *file,
>  	if (!ret)
>  		ret = ret2;
>  
> +	if (pos + len > inode->i_size)
> +		vmtruncate(inode, inode->i_size);
> +
>  	return ret ? ret : copied;
>  }
>  
> @@ -1563,25 +1606,21 @@ static int ext4_writeback_write_end(struct file *file,
>  	handle_t *handle = ext4_journal_current_handle();
>  	struct inode *inode = mapping->host;
>  	int ret = 0, ret2;
> -	loff_t new_i_size;
>  
>  	trace_mark(ext4_writeback_write_end,
>  		   "dev %s ino %lu pos %llu len %u copied %u",
>  		   inode->i_sb->s_id, inode->i_ino,
>  		   (unsigned long long) pos, len, copied);
> -	new_i_size = pos + copied;
> -	if (new_i_size > EXT4_I(inode)->i_disksize) {
> -		ext4_update_i_disksize(inode, new_i_size);
> -		/* We need to mark inode dirty even if
> -		 * new_i_size is less that inode->i_size
> -		 * bu greater than i_disksize.(hint delalloc)
> -		 */
> -		ext4_mark_inode_dirty(handle, inode);
> -	}
> -
> -	ret2 = generic_write_end(file, mapping, pos, len, copied,
> +	ret2 = ext4_generic_write_end(file, mapping, pos, len, copied,
>  							page, fsdata);
>  	copied = ret2;
> +	if (pos + len > inode->i_size)
> +		/* if we have allocated more blocks and copied
> +		 * less. We will have blocks allocated outside
> +		 * inode->i_size. So truncate them
> +		 */
> +		ext4_orphan_add(handle, inode);
> +
>  	if (ret2 < 0)
>  		ret = ret2;
>  
> @@ -1589,6 +1628,9 @@ static int ext4_writeback_write_end(struct file *file,
>  	if (!ret)
>  		ret = ret2;
>  
> +	if (pos + len > inode->i_size)
> +		vmtruncate(inode, inode->i_size);
> +
>  	return ret ? ret : copied;
>  }
>  
> @@ -1633,10 +1675,19 @@ static int ext4_journalled_write_end(struct file *file,
>  	}
>  
>  	unlock_page(page);
> +	page_cache_release(page);
> +	if (pos + len > inode->i_size)
> +		/* if we have allocated more blocks and copied
> +		 * less. We will have blocks allocated outside
> +		 * inode->i_size. So truncate them
> +		 */
> +		ext4_orphan_add(handle, inode);
> +
>  	ret2 = ext4_journal_stop(handle);
>  	if (!ret)
>  		ret = ret2;
> -	page_cache_release(page);
> +	if (pos + len > inode->i_size)
> +		vmtruncate(inode, inode->i_size);
>  
>  	return ret ? ret : copied;
>  }
> -- 
> 1.6.3.rc4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure
  2009-05-04  9:13 [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure Aneesh Kumar K.V
  2009-05-04  9:13 ` [PATCH RESEND 2/2] ext4: truncate the file properly if we fail to copy data from userspace Aneesh Kumar K.V
@ 2009-05-11 11:45 ` Theodore Tso
  1 sibling, 0 replies; 4+ messages in thread
From: Theodore Tso @ 2009-05-11 11:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4, Jan Kara

Did you see the comments I made to the original version of your patch
(e-mail dated April 6, 2008).

The comments can be easily found in the ext4 patchwork website:

    http://patchwork.ozlabs.org/patch/25380/
    http://patchwork.ozlabs.org/patch/25381/

My concern is about stray inodes on the orphan list causing a panic on
umount, and it looks like these patches increases the exposure of this
happening.

						- Ted

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-05-11 11:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-04  9:13 [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure Aneesh Kumar K.V
2009-05-04  9:13 ` [PATCH RESEND 2/2] ext4: truncate the file properly if we fail to copy data from userspace Aneesh Kumar K.V
2009-05-04 11:23   ` Jan Kara
2009-05-11 11:45 ` [PATCH RESEND 1/2] ext4: Add inode to the orphan list during block allocation failure Theodore Tso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).