From: "Theodore Ts'o" <tytso@mit.edu>
To: Linux Kernel Developers List <linux-kernel@vger.kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: [PATCH 03/49] ext4: Mark the unwritten buffer_head as mapped during write_begin
Date: Mon,  8 Jun 2009 15:22:21 -0400
Message-ID: <1244488987-32564-4-git-send-email-tytso@mit.edu>
In-Reply-To: <1244488987-32564-3-git-send-email-tytso@mit.edu>

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Marking BH_Unwritten buffer_heads as BH_Mapped avoids repeated
(unnecessary) calls to get_block() during a write(2) system call.
Doing so requires that the writepages() functions be able to handle
BH_Unwritten buffer_heads.
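
As a rough illustration of why the mapped bit matters, here is a
user-space sketch, not kernel code. It assumes that the write_begin path
calls get_block() only for buffers that are not marked mapped. All
names (struct model_bh, the MODEL_* bits, model_count_lookups) are
hypothetical stand-ins invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BH_* state bits; not the kernel's values. */
enum {
	MODEL_MAPPED    = 1u << 0,
	MODEL_NEW       = 1u << 1,
	MODEL_UNWRITTEN = 1u << 2,
};

struct model_bh {
	unsigned int state;
	int get_block_calls;	/* how often the lookup was invoked */
};

/* Models a lookup that finds preallocated (unwritten) space. */
static void model_get_block_prep(struct model_bh *bh, bool mark_mapped)
{
	bh->get_block_calls++;
	bh->state |= MODEL_UNWRITTEN | MODEL_NEW;
	if (mark_mapped)
		bh->state |= MODEL_MAPPED;	/* behaviour after this patch */
}

/* Models write_begin: the lookup is skipped once the buffer is mapped. */
static void model_write_begin(struct model_bh *bh, bool mark_mapped)
{
	if (!(bh->state & MODEL_MAPPED))
		model_get_block_prep(bh, mark_mapped);
}

/* Returns the number of lookups caused by repeated writes to one buffer. */
static int model_count_lookups(int writes, bool mark_mapped)
{
	struct model_bh bh = { 0, 0 };
	int i;

	assert(writes >= 0);
	for (i = 0; i < writes; i++)
		model_write_begin(&bh, mark_mapped);
	return bh.get_block_calls;
}
```

In this model, model_count_lookups(3, false) performs one lookup per
write, while model_count_lookups(3, true) performs a single lookup.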

After this commit, things work as follows:

ext4_ext_get_blocks() returns an unmapped, unwritten buffer_head when
called with create = 0 for preallocated space.  This ensures that the
read path and the non-delayed allocation case are handled correctly.
Even though the buffer_head is marked unmapped, it carries valid
b_blocknr and b_bdev values.

ext4_da_get_block_prep(), called for block reservation, will now return
a mapped, unwritten, new buffer_head for preallocated space.  This
avoids multiple calls to get_block() for writes to the same offset.  By
marking such buffers BH_New, we also ensure that sub-block zeroing of
buffered writes happens correctly.
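
The resulting buffer states can be sketched in user space as follows.
This is only a model under stated assumptions: the flag values, struct
model_bh, and the model_* helpers are hypothetical, and assert() stands
in for BUG_ON().  The predicate follows ext4_bh_unmapped_or_delay() and
the writeback step follows the mpage_put_bnr_to_bhs() hunk in this
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BH_* state bits; not the kernel's values. */
enum {
	MODEL_MAPPED    = 1u << 0,
	MODEL_NEW       = 1u << 1,
	MODEL_DELAY     = 1u << 2,
	MODEL_UNWRITTEN = 1u << 3,
	MODEL_DIRTY     = 1u << 4,
};

struct model_bh {
	unsigned int state;
	long blocknr;
};

/* State after write_begin on preallocated space: block number is known. */
static struct model_bh model_prealloc_bh(long pblock)
{
	struct model_bh bh = { MODEL_MAPPED | MODEL_NEW | MODEL_UNWRITTEN, pblock };
	return bh;
}

/* State after write_begin with delayed allocation: no block number yet. */
static struct model_bh model_delayed_bh(void)
{
	struct model_bh bh = { MODEL_MAPPED | MODEL_NEW | MODEL_DELAY, -1 };
	return bh;
}

/* A dirty buffer still needs block work if unmapped, delayed or unwritten. */
static bool model_bh_unmapped_or_delay(const struct model_bh *bh)
{
	bool pending = !(bh->state & MODEL_MAPPED) ||
		       (bh->state & (MODEL_DELAY | MODEL_UNWRITTEN));

	return pending && (bh->state & MODEL_DIRTY);
}

/* Writeback step: drop the delay or unwritten bit once pblock is final. */
static void model_put_bnr(struct model_bh *bh, long pblock)
{
	if (bh->state & MODEL_DELAY) {
		bh->state &= ~MODEL_DELAY;
		bh->blocknr = pblock;
	} else if (bh->state & MODEL_UNWRITTEN) {
		/* An unwritten buffer should already hold its block number. */
		bh->state &= ~MODEL_UNWRITTEN;
		assert(bh->blocknr == pblock);
	} else if (bh->state & MODEL_MAPPED) {
		assert(bh->blocknr == pblock);
	}
}
```

In this model a dirty preallocated buffer is picked up by the predicate
even though it is mapped, and it stops matching once model_put_bnr() has
cleared the unwritten bit.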

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/extents.c |    4 +-
 fs/ext4/inode.c   |   82 +++++++++++++++++++++++++++++++++-------------------
 2 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a953214..ea5c476 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2872,6 +2872,8 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 			if (create == EXT4_CREATE_UNINITIALIZED_EXT)
 				goto out;
 			if (!create) {
+				if (allocated > max_blocks)
+					allocated = max_blocks;
 				/*
 				 * We have blocks reserved already.  We
 				 * return allocated blocks so that delalloc
@@ -2879,8 +2881,6 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 				 * the buffer head will be unmapped so that
 				 * a read from the block returns 0s.
 				 */
-				if (allocated > max_blocks)
-					allocated = max_blocks;
 				set_buffer_unwritten(bh_result);
 				bh_result->b_bdev = inode->i_sb->s_bdev;
 				bh_result->b_blocknr = newblock;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d7ad0bb..96f3366 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1852,7 +1852,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd)
  * @logical - first logical block to start assignment with
  *
  * the function goes through all passed space and put actual disk
- * block numbers into buffer heads, dropping BH_Delay
+ * block numbers into buffer heads, dropping BH_Delay and BH_Unwritten
  */
 static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
 				 struct buffer_head *exbh)
@@ -1902,16 +1902,24 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
 			do {
 				if (cur_logical >= logical + blocks)
 					break;
-				if (buffer_delay(bh)) {
-					bh->b_blocknr = pblock;
-					clear_buffer_delay(bh);
-					bh->b_bdev = inode->i_sb->s_bdev;
-				} else if (buffer_unwritten(bh)) {
-					bh->b_blocknr = pblock;
-					clear_buffer_unwritten(bh);
-					set_buffer_mapped(bh);
-					set_buffer_new(bh);
-					bh->b_bdev = inode->i_sb->s_bdev;
+
+				if (buffer_delay(bh) ||
+						buffer_unwritten(bh)) {
+
+					BUG_ON(bh->b_bdev != inode->i_sb->s_bdev);
+
+					if (buffer_delay(bh)) {
+						clear_buffer_delay(bh);
+						bh->b_blocknr = pblock;
+					} else {
+						/*
+						 * unwritten already should have
+						 * blocknr assigned. Verify that
+						 */
+						clear_buffer_unwritten(bh);
+						BUG_ON(bh->b_blocknr != pblock);
+					}
+
 				} else if (buffer_mapped(bh))
 					BUG_ON(bh->b_blocknr != pblock);
 
@@ -2053,7 +2061,8 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
 	 * We consider only non-mapped and non-allocated blocks
 	 */
 	if ((mpd->b_state  & (1 << BH_Mapped)) &&
-	    !(mpd->b_state & (1 << BH_Delay)))
+		!(mpd->b_state & (1 << BH_Delay)) &&
+		!(mpd->b_state & (1 << BH_Unwritten)))
 		return 0;
 	/*
 	 * We need to make sure the BH_Delay flag is passed down to
@@ -2205,6 +2214,17 @@ flush_it:
 	return;
 }
 
+static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
+{
+	/*
+	 * unmapped buffer is possible for holes.
+	 * delay buffer is possible with delayed allocation.
+	 * We also need to consider unwritten buffer as unmapped.
+	 */
+	return (!buffer_mapped(bh) || buffer_delay(bh) ||
+				buffer_unwritten(bh)) && buffer_dirty(bh);
+}
+
 /*
  * __mpage_da_writepage - finds extent of pages and blocks
  *
@@ -2289,8 +2309,7 @@ static int __mpage_da_writepage(struct page *page,
 			 * Otherwise we won't make progress
 			 * with the page in ext4_da_writepage
 			 */
-			if (buffer_dirty(bh) &&
-			    (!buffer_mapped(bh) || buffer_delay(bh))) {
+			if (ext4_bh_unmapped_or_delay(NULL, bh)) {
 				mpage_add_bh_to_extent(mpd, logical,
 						       bh->b_size,
 						       bh->b_state);
@@ -2318,6 +2337,14 @@ static int __mpage_da_writepage(struct page *page,
 /*
  * this is a special callback for ->write_begin() only
  * it's intention is to return mapped block or reserve space
+ *
+ * For delayed buffer_head we have BH_Mapped, BH_New, BH_Delay set.
+ * We also have b_blocknr = -1 and b_bdev initialized properly
+ *
+ * For unwritten buffer_head we have BH_Mapped, BH_New, BH_Unwritten set.
+ * We also have b_blocknr = physicalblock mapping unwritten extent and b_bdev
+ * initialized properly.
+ *
  */
 static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 				  struct buffer_head *bh_result, int create)
@@ -2353,28 +2380,23 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
-		/*
-		 * With sub-block writes into unwritten extents
-		 * we also need to mark the buffer as new so that
-		 * the unwritten parts of the buffer gets correctly zeroed.
-		 */
-		if (buffer_unwritten(bh_result))
+		if (buffer_unwritten(bh_result)) {
+			/* A delayed write to unwritten bh should
+			 * be marked new and mapped.  Mapped ensures
+			 * that we don't do get_block multiple times
+			 * when we write to the same offset and new
+			 * ensures that we do proper zero out for
+			 * partial write.
+			 */
 			set_buffer_new(bh_result);
+			set_buffer_mapped(bh_result);
+		}
 		ret = 0;
 	}
 
 	return ret;
 }
 
-static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
-{
-	/*
-	 * unmapped buffer is possible for holes.
-	 * delay buffer is possible with delayed allocation
-	 */
-	return ((!buffer_mapped(bh) || buffer_delay(bh)) && buffer_dirty(bh));
-}
-
 static int ext4_normal_get_block_write(struct inode *inode, sector_t iblock,
 				   struct buffer_head *bh_result, int create)
 {
@@ -2828,7 +2850,7 @@ static int ext4_da_should_update_i_disksize(struct page *page,
 	for (i = 0; i < idx; i++)
 		bh = bh->b_this_page;
 
-	if (!buffer_mapped(bh) || (buffer_delay(bh)))
+	if (!buffer_mapped(bh) || (buffer_delay(bh)) || buffer_unwritten(bh))
 		return 0;
 	return 1;
 }
-- 
1.6.3.2.1.gb9f7d.dirty

