From: "Theodore Ts'o" <tytso@mit.edu>
To: Linux Kernel Developers List <linux-kernel@vger.kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
"Theodore Ts'o" <tytso@mit.edu>
Subject: [PATCH 03/49] ext4: Mark the unwritten buffer_head as mapped during write_begin
Date: Mon, 8 Jun 2009 15:22:21 -0400
Message-ID: <1244488987-32564-4-git-send-email-tytso@mit.edu>
In-Reply-To: <1244488987-32564-3-git-send-email-tytso@mit.edu>
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Marking BH_Unwritten buffer_heads as BH_Mapped avoids multiple
(unnecessary) calls to get_block() during a write(2) system call.
Doing so requires that the writepages() functions be able to handle
BH_Unwritten buffer_heads.
After this commit, things work as follows:
ext4_ext_get_blocks() returns an unmapped, unwritten buffer_head when
called with create = 0 for prealloc space. This makes sure we handle
the read path and the non-delayed allocation case correctly. Even though
the buffer_head is marked unmapped, it has valid b_blocknr and b_bdev
values.
ext4_da_get_block_prep(), called for block reservation, will now return
a mapped, unwritten, new buffer_head for prealloc space. This avoids
multiple calls to get_block() for writes to the same offset. By marking
such buffers as BH_New, we also ensure that sub-block zeroing of buffered
writes happens correctly.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
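Note (not part of the commit): the two buffer_head states described in the
commit message can be sketched as a small userspace model. This is only an
illustration under stated assumptions: struct model_bh, the MODEL_BH_* flags
and the model_* helpers are hypothetical names, not kernel definitions, and
model_needs_get_block() assumes that the generic write_begin code calls
get_block() only for buffers that are not mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's BH_* state bits. */
enum {
	MODEL_BH_MAPPED    = 1u << 0,
	MODEL_BH_NEW       = 1u << 1,
	MODEL_BH_DELAY     = 1u << 2,
	MODEL_BH_UNWRITTEN = 1u << 3,
	MODEL_BH_DIRTY     = 1u << 4,
};

/* Hypothetical, heavily reduced model of a buffer_head. */
struct model_bh {
	unsigned int state;
	uint64_t blocknr;
};

/* create == 0 lookup over prealloc space: unwritten and NOT mapped,
 * but the block number is still valid. */
static inline struct model_bh model_prealloc_lookup(uint64_t pblock)
{
	struct model_bh bh = { MODEL_BH_UNWRITTEN, pblock };
	return bh;
}

/* write_begin path over prealloc space: additionally new and mapped. */
static inline struct model_bh model_prealloc_write_begin(uint64_t pblock)
{
	struct model_bh bh = model_prealloc_lookup(pblock);
	bh.state |= MODEL_BH_NEW | MODEL_BH_MAPPED;
	return bh;
}

/* Same condition as ext4_bh_unmapped_or_delay() after this patch:
 * dirty, and either unmapped, delayed, or unwritten. */
static inline bool model_bh_unmapped_or_delay(const struct model_bh *bh)
{
	bool needs_mapping = !(bh->state & MODEL_BH_MAPPED) ||
			     (bh->state & MODEL_BH_DELAY) ||
			     (bh->state & MODEL_BH_UNWRITTEN);
	return needs_mapping && (bh->state & MODEL_BH_DIRTY);
}

/* Assumption: a repeated write calls get_block() only if !mapped. */
static inline bool model_needs_get_block(const struct model_bh *bh)
{
	return !(bh->state & MODEL_BH_MAPPED);
}
```

In this model a buffer from model_prealloc_write_begin() no longer needs
get_block() on a second write to the same offset, yet once it is dirtied
the writepages-side predicate still selects it because it is unwritten.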
fs/ext4/extents.c | 4 +-
fs/ext4/inode.c | 82 +++++++++++++++++++++++++++++++++-------------------
2 files changed, 54 insertions(+), 32 deletions(-)
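Note (not part of the patch): the reworked per-buffer logic in
mpage_put_bnr_to_bhs() can be sketched in userspace as follows. This is a
rough model, not the kernel code: struct wb_bh, the WB_BH_* flags and
wb_put_bnr() are hypothetical names, b_bdev is left out, and where the
kernel uses BUG_ON() the sketch returns -1 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's BH_* state bits. */
enum {
	WB_BH_MAPPED    = 1u << 0,
	WB_BH_NEW       = 1u << 1,
	WB_BH_DELAY     = 1u << 2,
	WB_BH_UNWRITTEN = 1u << 3,
};

/* Hypothetical, heavily reduced model of a buffer_head. */
struct wb_bh {
	unsigned int state;
	uint64_t blocknr;
};

/*
 * Apply the allocated physical block to one buffer.
 * Returns 0 on success, -1 where the kernel would hit BUG_ON().
 */
static inline int wb_put_bnr(struct wb_bh *bh, uint64_t pblock)
{
	if (bh->state & WB_BH_DELAY) {
		/* Delayed buffer: no real block yet, assign it now. */
		bh->state &= ~WB_BH_DELAY;
		bh->blocknr = pblock;
		return 0;
	}
	if (bh->state & WB_BH_UNWRITTEN) {
		/* Unwritten buffer: block number was set at write_begin,
		 * so only clear the flag and verify the number. */
		bh->state &= ~WB_BH_UNWRITTEN;
		return bh->blocknr == pblock ? 0 : -1;
	}
	if (bh->state & WB_BH_MAPPED)
		return bh->blocknr == pblock ? 0 : -1;
	return 0;
}
```

The point of the sketch is the asymmetry: a delayed buffer receives its
block number during writeback, while an unwritten buffer already carries
one and is only checked against the allocation result.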
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a953214..ea5c476 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2872,6 +2872,8 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
if (create == EXT4_CREATE_UNINITIALIZED_EXT)
goto out;
if (!create) {
+ if (allocated > max_blocks)
+ allocated = max_blocks;
/*
* We have blocks reserved already. We
* return allocated blocks so that delalloc
@@ -2879,8 +2881,6 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* the buffer head will be unmapped so that
* a read from the block returns 0s.
*/
- if (allocated > max_blocks)
- allocated = max_blocks;
set_buffer_unwritten(bh_result);
bh_result->b_bdev = inode->i_sb->s_bdev;
bh_result->b_blocknr = newblock;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d7ad0bb..96f3366 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1852,7 +1852,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd)
* @logical - first logical block to start assignment with
*
* the function goes through all passed space and put actual disk
- * block numbers into buffer heads, dropping BH_Delay
+ * block numbers into buffer heads, dropping BH_Delay and BH_Unwritten
*/
static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
struct buffer_head *exbh)
@@ -1902,16 +1902,24 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
do {
if (cur_logical >= logical + blocks)
break;
- if (buffer_delay(bh)) {
- bh->b_blocknr = pblock;
- clear_buffer_delay(bh);
- bh->b_bdev = inode->i_sb->s_bdev;
- } else if (buffer_unwritten(bh)) {
- bh->b_blocknr = pblock;
- clear_buffer_unwritten(bh);
- set_buffer_mapped(bh);
- set_buffer_new(bh);
- bh->b_bdev = inode->i_sb->s_bdev;
+
+ if (buffer_delay(bh) ||
+ buffer_unwritten(bh)) {
+
+ BUG_ON(bh->b_bdev != inode->i_sb->s_bdev);
+
+ if (buffer_delay(bh)) {
+ clear_buffer_delay(bh);
+ bh->b_blocknr = pblock;
+ } else {
+ /*
+ * unwritten already should have
+ * blocknr assigned. Verify that
+ */
+ clear_buffer_unwritten(bh);
+ BUG_ON(bh->b_blocknr != pblock);
+ }
+
} else if (buffer_mapped(bh))
BUG_ON(bh->b_blocknr != pblock);
@@ -2053,7 +2061,8 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
* We consider only non-mapped and non-allocated blocks
*/
if ((mpd->b_state & (1 << BH_Mapped)) &&
- !(mpd->b_state & (1 << BH_Delay)))
+ !(mpd->b_state & (1 << BH_Delay)) &&
+ !(mpd->b_state & (1 << BH_Unwritten)))
return 0;
/*
* We need to make sure the BH_Delay flag is passed down to
@@ -2205,6 +2214,17 @@ flush_it:
return;
}
+static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
+{
+ /*
+ * unmapped buffer is possible for holes.
+ * delay buffer is possible with delayed allocation.
+ * We also need to consider unwritten buffer as unmapped.
+ */
+ return (!buffer_mapped(bh) || buffer_delay(bh) ||
+ buffer_unwritten(bh)) && buffer_dirty(bh);
+}
+
/*
* __mpage_da_writepage - finds extent of pages and blocks
*
@@ -2289,8 +2309,7 @@ static int __mpage_da_writepage(struct page *page,
* Otherwise we won't make progress
* with the page in ext4_da_writepage
*/
- if (buffer_dirty(bh) &&
- (!buffer_mapped(bh) || buffer_delay(bh))) {
+ if (ext4_bh_unmapped_or_delay(NULL, bh)) {
mpage_add_bh_to_extent(mpd, logical,
bh->b_size,
bh->b_state);
@@ -2318,6 +2337,14 @@ static int __mpage_da_writepage(struct page *page,
/*
* this is a special callback for ->write_begin() only
* it's intention is to return mapped block or reserve space
+ *
+ * For delayed buffer_head we have BH_Mapped, BH_New, BH_Delay set.
+ * We also have b_blocknr = -1 and b_bdev initialized properly
+ *
+ * For unwritten buffer_head we have BH_Mapped, BH_New, BH_Unwritten set.
+ * We also have b_blocknr = physicalblock mapping unwritten extent and b_bdev
+ * initialized properly.
+ *
*/
static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
@@ -2353,28 +2380,23 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
set_buffer_delay(bh_result);
} else if (ret > 0) {
bh_result->b_size = (ret << inode->i_blkbits);
- /*
- * With sub-block writes into unwritten extents
- * we also need to mark the buffer as new so that
- * the unwritten parts of the buffer gets correctly zeroed.
- */
- if (buffer_unwritten(bh_result))
+ if (buffer_unwritten(bh_result)) {
+ /* A delayed write to unwritten bh should
+ * be marked new and mapped. Mapped ensures
+ * that we don't do get_block multiple times
+ * when we write to the same offset and new
+ * ensures that we do proper zero out for
+ * partial write.
+ */
set_buffer_new(bh_result);
+ set_buffer_mapped(bh_result);
+ }
ret = 0;
}
return ret;
}
-static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
-{
- /*
- * unmapped buffer is possible for holes.
- * delay buffer is possible with delayed allocation
- */
- return ((!buffer_mapped(bh) || buffer_delay(bh)) && buffer_dirty(bh));
-}
-
static int ext4_normal_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
@@ -2828,7 +2850,7 @@ static int ext4_da_should_update_i_disksize(struct page *page,
for (i = 0; i < idx; i++)
bh = bh->b_this_page;
- if (!buffer_mapped(bh) || (buffer_delay(bh)))
+ if (!buffer_mapped(bh) || (buffer_delay(bh)) || buffer_unwritten(bh))
return 0;
return 1;
}
--
1.6.3.2.1.gb9f7d.dirty