[PATCH 0/5] ext4: Several simplifications

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] ext4: Several simplifications
@ 2013-01-02 17:45 Jan Kara
  2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Jan Kara @ 2013-01-02 17:45 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

  Hi,

  this patch series contains several simplifications of ext4 code I
came across when thinking how to simplify handling of uninitialized
extents. Since they are mostly independent, I'm submitting them now
for people to comment / merge them.

								Honza

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout
  2013-01-02 17:45 [PATCH 0/5] ext4: Several simplifications Jan Kara
@ 2013-01-02 17:45 ` Jan Kara
  2013-01-03 14:18   ` Jan Kara
                     ` (2 more replies)
  2013-01-02 17:45 ` [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page() Jan Kara
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 19+ messages in thread
From: Jan Kara @ 2013-01-02 17:45 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

Currently we sometimes used block_write_full_page() and sometimes
ext4_bio_write_page() for writeback (depending on mount options and call
path). Let's always use ext4_bio_write_page() to simplify things a bit.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  |    1 -
 fs/ext4/inode.c |  128 ++++++-------------------------------------------------
 fs/ext4/super.c |    7 +--
 3 files changed, 16 insertions(+), 120 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8462eb3..70c7030 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -985,7 +985,6 @@ struct ext4_inode_info {
 #define EXT4_MOUNT_DIOREAD_NOLOCK	0x400000 /* Enable support for dio read nolocking */
 #define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000 /* Journal checksums */
 #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
-#define EXT4_MOUNT_MBLK_IO_SUBMIT	0x4000000 /* multi-block io submits */
 #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
 #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
 #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cb1c1ab..f95b511 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -134,8 +134,6 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
 static void ext4_invalidatepage(struct page *page, unsigned long offset);
 static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
 				   struct buffer_head *bh_result, int create);
-static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
-static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
 static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
 static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
@@ -808,11 +806,10 @@ int ext4_walk_page_buffers(handle_t *handle,
  * and the commit_write().  So doing the jbd2_journal_start at the start of
  * prepare_write() is the right place.
  *
- * Also, this function can nest inside ext4_writepage() ->
- * block_write_full_page(). In that case, we *know* that ext4_writepage()
- * has generated enough buffer credits to do the whole page.  So we won't
- * block on the journal in that case, which is good, because the caller may
- * be PF_MEMALLOC.
+ * Also, this function can nest inside ext4_writepage().  In that case, we
+ * *know* that ext4_writepage() has generated enough buffer credits to do the
+ * whole page.  So we won't block on the journal in that case, which is good,
+ * because the caller may be PF_MEMALLOC.
  *
  * By accident, ext4 can be reentered when a transaction is open via
  * quota file writes.  If we were to commit the transaction while thus
@@ -1463,18 +1460,9 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
 			 */
 			if (unlikely(journal_data && PageChecked(page)))
 				err = __ext4_journalled_writepage(page, len);
-			else if (test_opt(inode->i_sb, MBLK_IO_SUBMIT))
+			else
 				err = ext4_bio_write_page(&io_submit, page,
 							  len, mpd->wbc);
-			else if (buffer_uninit(page_bufs)) {
-				ext4_set_bh_endio(page_bufs, inode);
-				err = block_write_full_page_endio(page,
-					noalloc_get_block_write,
-					mpd->wbc, ext4_end_io_buffer_write);
-			} else
-				err = block_write_full_page(page,
-					noalloc_get_block_write, mpd->wbc);
-
 			if (!err)
 				mpd->pages_written++;
 			/*
@@ -1891,16 +1879,16 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 }
 
 /*
- * This function is used as a standard get_block_t calback function
- * when there is no desire to allocate any blocks.  It is used as a
- * callback function for block_write_begin() and block_write_full_page().
- * These functions should only try to map a single block at a time.
+ * This function is used as a standard get_block_t calback function when there
+ * is no desire to allocate any blocks.  It is used as a callback function for
+ * block_write_begin().  These functions should only try to map a single block
+ * at a time.
  *
  * Since this function doesn't do block allocations even if the caller
  * requests it by passing in create=1, it is critically important that
  * any caller checks to make sure that any buffer heads are returned
  * by this function are either all already mapped or marked for
- * delayed allocation before calling  block_write_full_page().  Otherwise,
+ * delayed allocation before calling ext4_bio_write_page().  Otherwise,
  * b_blocknr could be left unitialized, and the page write functions will
  * be taken by surprise.
  */
@@ -2040,6 +2028,7 @@ static int ext4_writepage(struct page *page,
 	unsigned int len;
 	struct buffer_head *page_bufs = NULL;
 	struct inode *inode = page->mapping->host;
+	struct ext4_io_submit io_submit;
 
 	trace_ext4_writepage(page);
 	size = i_size_read(inode);
@@ -2089,14 +2078,9 @@ static int ext4_writepage(struct page *page,
 		 */
 		return __ext4_journalled_writepage(page, len);
 
-	if (buffer_uninit(page_bufs)) {
-		ext4_set_bh_endio(page_bufs, inode);
-		ret = block_write_full_page_endio(page, noalloc_get_block_write,
-					    wbc, ext4_end_io_buffer_write);
-	} else
-		ret = block_write_full_page(page, noalloc_get_block_write,
-					    wbc);
-
+	memset(&io_submit, 0, sizeof(io_submit));
+	ret = ext4_bio_write_page(&io_submit, page, len, wbc);
+	ext4_io_submit(&io_submit);
 	return ret;
 }
 
@@ -2858,26 +2842,6 @@ ext4_readpages(struct file *file, struct address_space *mapping,
 	return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
 }
 
-static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offset)
-{
-	struct buffer_head *head, *bh;
-	unsigned int curr_off = 0;
-
-	if (!page_has_buffers(page))
-		return;
-	head = bh = page_buffers(page);
-	do {
-		if (offset <= curr_off && test_clear_buffer_uninit(bh)
-					&& bh->b_private) {
-			ext4_free_io_end(bh->b_private);
-			bh->b_private = NULL;
-			bh->b_end_io = NULL;
-		}
-		curr_off = curr_off + bh->b_size;
-		bh = bh->b_this_page;
-	} while (bh != head);
-}
-
 static void ext4_invalidatepage(struct page *page, unsigned long offset)
 {
 	journal_t *journal = EXT4_JOURNAL(page->mapping->host);
@@ -2885,11 +2849,6 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
 	trace_ext4_invalidatepage(page, offset);
 
 	/*
-	 * free any io_end structure allocated for buffers to be discarded
-	 */
-	if (ext4_should_dioread_nolock(page->mapping->host))
-		ext4_invalidatepage_free_endio(page, offset);
-	/*
 	 * If it's a full truncate we just forget about the pending dirtying
 	 */
 	if (offset == 0)
@@ -2977,65 +2936,6 @@ out:
 	ext4_add_complete_io(io_end);
 }
 
-static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
-{
-	ext4_io_end_t *io_end = bh->b_private;
-	struct inode *inode;
-
-	if (!test_clear_buffer_uninit(bh) || !io_end)
-		goto out;
-
-	if (!(io_end->inode->i_sb->s_flags & MS_ACTIVE)) {
-		ext4_msg(io_end->inode->i_sb, KERN_INFO,
-			 "sb umounted, discard end_io request for inode %lu",
-			 io_end->inode->i_ino);
-		ext4_free_io_end(io_end);
-		goto out;
-	}
-
-	/*
-	 * It may be over-defensive here to check EXT4_IO_END_UNWRITTEN now,
-	 * but being more careful is always safe for the future change.
-	 */
-	inode = io_end->inode;
-	ext4_set_io_unwritten_flag(inode, io_end);
-	ext4_add_complete_io(io_end);
-out:
-	bh->b_private = NULL;
-	bh->b_end_io = NULL;
-	clear_buffer_uninit(bh);
-	end_buffer_async_write(bh, uptodate);
-}
-
-static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode)
-{
-	ext4_io_end_t *io_end;
-	struct page *page = bh->b_page;
-	loff_t offset = (sector_t)page->index << PAGE_CACHE_SHIFT;
-	size_t size = bh->b_size;
-
-retry:
-	io_end = ext4_init_io_end(inode, GFP_ATOMIC);
-	if (!io_end) {
-		pr_warn_ratelimited("%s: allocation fail\n", __func__);
-		schedule();
-		goto retry;
-	}
-	io_end->offset = offset;
-	io_end->size = size;
-	/*
-	 * We need to hold a reference to the page to make sure it
-	 * doesn't get evicted before ext4_end_io_work() has a chance
-	 * to convert the extent from written to unwritten.
-	 */
-	io_end->page = page;
-	get_page(io_end->page);
-
-	bh->b_private = io_end;
-	bh->b_end_io = ext4_end_io_buffer_write;
-	return 0;
-}
-
 /*
  * For ext4 extent files, ext4 will do direct-io write to holes,
  * preallocated extents, and those write extend the file, no need to
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e09f7d1..afbe974 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1280,8 +1280,8 @@ static const match_table_t tokens = {
 	{Opt_stripe, "stripe=%u"},
 	{Opt_delalloc, "delalloc"},
 	{Opt_nodelalloc, "nodelalloc"},
-	{Opt_mblk_io_submit, "mblk_io_submit"},
-	{Opt_nomblk_io_submit, "nomblk_io_submit"},
+	{Opt_removed, "mblk_io_submit"},
+	{Opt_removed, "nomblk_io_submit"},
 	{Opt_block_validity, "block_validity"},
 	{Opt_noblock_validity, "noblock_validity"},
 	{Opt_inode_readahead_blks, "inode_readahead_blks=%u"},
@@ -1414,8 +1414,6 @@ static const struct mount_opts {
 	{Opt_bsd_df, EXT4_MOUNT_MINIX_DF, MOPT_CLEAR},
 	{Opt_grpid, EXT4_MOUNT_GRPID, MOPT_SET},
 	{Opt_nogrpid, EXT4_MOUNT_GRPID, MOPT_CLEAR},
-	{Opt_mblk_io_submit, EXT4_MOUNT_MBLK_IO_SUBMIT, MOPT_SET},
-	{Opt_nomblk_io_submit, EXT4_MOUNT_MBLK_IO_SUBMIT, MOPT_CLEAR},
 	{Opt_block_validity, EXT4_MOUNT_BLOCK_VALIDITY, MOPT_SET},
 	{Opt_noblock_validity, EXT4_MOUNT_BLOCK_VALIDITY, MOPT_CLEAR},
 	{Opt_dioread_nolock, EXT4_MOUNT_DIOREAD_NOLOCK, MOPT_SET},
@@ -3373,7 +3371,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
-	set_opt(sb, MBLK_IO_SUBMIT);
 	if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA)
 		set_opt(sb, JOURNAL_DATA);
 	else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page()
  2013-01-02 17:45 [PATCH 0/5] ext4: Several simplifications Jan Kara
  2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
@ 2013-01-02 17:45 ` Jan Kara
  2013-01-04  7:20   ` Zheng Liu
  2013-01-17 18:35   ` Theodore Ts'o
  2013-01-02 17:45 ` [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO Jan Kara
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Jan Kara @ 2013-01-02 17:45 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

When we cannot write a page we should use redirty_page_for_writepage()
instead of plain set_page_dirty(). That tells writeback code we have
problems, redirties only the page (redirtying buffers is not needed),
and updates mm accounting of failed page writes.

Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
moment we are sure buffer will be going to disk.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 0016fbc..8e5722a 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -23,6 +23,7 @@
 #include <linux/workqueue.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 #include "ext4_jbd2.h"
 #include "xattr.h"
@@ -436,7 +437,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 
 	io_page = kmem_cache_alloc(io_page_cachep, GFP_NOFS);
 	if (!io_page) {
-		set_page_dirty(page);
+		redirty_page_for_writepage(wbc, page);
 		unlock_page(page);
 		return -ENOMEM;
 	}
@@ -468,7 +469,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			set_buffer_uptodate(bh);
 			continue;
 		}
-		clear_buffer_dirty(bh);
 		ret = io_submit_add_bh(io, io_page, inode, wbc, bh);
 		if (ret) {
 			/*
@@ -476,9 +476,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			 * we can do but mark the page as dirty, and
 			 * better luck next time.
 			 */
-			set_page_dirty(page);
+			redirty_page_for_writepage(wbc, page);
 			break;
 		}
+		clear_buffer_dirty(bh);
 	}
 	unlock_page(page);
 	/*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO
  2013-01-02 17:45 [PATCH 0/5] ext4: Several simplifications Jan Kara
  2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
  2013-01-02 17:45 ` [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page() Jan Kara
@ 2013-01-02 17:45 ` Jan Kara
  2013-01-04  7:24   ` Zheng Liu
  2013-01-02 17:45 ` [PATCH 4/5] ext4: Disable merging of uninitialized extents Jan Kara
  2013-01-02 17:45 ` [PATCH 5/5] ext4: Remove unnecessary wait for extent conversion in ext4_fallocate() Jan Kara
  4 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2013-01-02 17:45 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

When using indirect blocks there is no possibility to have any unwritten
extents. So wait for them in ext4_ind_direct_IO() is just bogus.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/indirect.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 20862f9..993247c 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -807,11 +807,6 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
 
 retry:
 	if (rw == READ && ext4_should_dioread_nolock(inode)) {
-		if (unlikely(atomic_read(&EXT4_I(inode)->i_unwritten))) {
-			mutex_lock(&inode->i_mutex);
-			ext4_flush_unwritten_io(inode);
-			mutex_unlock(&inode->i_mutex);
-		}
 		/*
 		 * Nolock dioread optimization may be dynamically disabled
 		 * via ext4_inode_block_unlocked_dio(). Check inode's state
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] ext4: Disable merging of uninitialized extents
  2013-01-02 17:45 [PATCH 0/5] ext4: Several simplifications Jan Kara
                   ` (2 preceding siblings ...)
  2013-01-02 17:45 ` [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO Jan Kara
@ 2013-01-02 17:45 ` Jan Kara
  2013-01-04  7:25   ` Zheng Liu
  2013-01-02 17:45 ` [PATCH 5/5] ext4: Remove unnecessary wait for extent conversion in ext4_fallocate() Jan Kara
  4 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2013-01-02 17:45 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

Merging of uninitialized extents creates all sorts of interesting race
possibilities when writeback / DIO races with fallocate. Thus
ext4_convert_unwritten_extents_endio() has to deal with a case where
extent to be converted needs to be split out first. That isn't nice
for two reasons:

1) It may need allocation of extent tree block so ENOSPC is possible.
2) It complicates end_io handling code

So we disable merging of uninitialized extents which allows us to simplify
the code. Extents will get merged after they are converted to initialized
ones.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c |   61 +++++++++++++++-------------------------------------
 1 files changed, 18 insertions(+), 43 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 26af228..f1ce33a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -54,9 +54,6 @@
 #define EXT4_EXT_MARK_UNINIT1	0x2  /* mark first half uninitialized */
 #define EXT4_EXT_MARK_UNINIT2	0x4  /* mark second half uninitialized */
 
-#define EXT4_EXT_DATA_VALID1	0x8  /* first half contains valid data */
-#define EXT4_EXT_DATA_VALID2	0x10 /* second half contains valid data */
-
 static __le32 ext4_extent_block_csum(struct inode *inode,
 				     struct ext4_extent_header *eh)
 {
@@ -1579,20 +1576,17 @@ int
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
 				struct ext4_extent *ex2)
 {
-	unsigned short ext1_ee_len, ext2_ee_len, max_len;
+	unsigned ext1_ee_len, ext2_ee_len;
 
 	/*
-	 * Make sure that either both extents are uninitialized, or
-	 * both are _not_.
+	 * Make sure that both extents are initialized. We don't merge
+	 * uninitialized extents so that we can be sure that end_io code has
+	 * the extent that was written properly split out and conversion to
+	 * initialized is trivial.
 	 */
-	if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
+	if (ext4_ext_is_uninitialized(ex1) || ext4_ext_is_uninitialized(ex2))
 		return 0;
 
-	if (ext4_ext_is_uninitialized(ex1))
-		max_len = EXT_UNINIT_MAX_LEN;
-	else
-		max_len = EXT_INIT_MAX_LEN;
-
 	ext1_ee_len = ext4_ext_get_actual_len(ex1);
 	ext2_ee_len = ext4_ext_get_actual_len(ex2);
 
@@ -1605,7 +1599,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
 	 * as an RO_COMPAT feature, refuse to merge to extents if
 	 * this can result in the top bit of ee_len being set.
 	 */
-	if (ext1_ee_len + ext2_ee_len > max_len)
+	if (ext1_ee_len + ext2_ee_len > EXT_INIT_MAX_LEN)
 		return 0;
 #ifdef AGGRESSIVE_TEST
 	if (ext1_ee_len >= 4)
@@ -2959,9 +2953,6 @@ static int ext4_split_extent_at(handle_t *handle,
 	unsigned int ee_len, depth;
 	int err = 0;
 
-	BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) ==
-	       (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
-
 	ext_debug("ext4_split_extents_at: inode %lu, logical"
 		"block %llu\n", inode->i_ino, (unsigned long long)split);
 
@@ -3020,14 +3011,7 @@ static int ext4_split_extent_at(handle_t *handle,
 
 	err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
 	if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
-		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
-			if (split_flag & EXT4_EXT_DATA_VALID1)
-				err = ext4_ext_zeroout(inode, ex2);
-			else
-				err = ext4_ext_zeroout(inode, ex);
-		} else
-			err = ext4_ext_zeroout(inode, &orig_ex);
-
+		err = ext4_ext_zeroout(inode, &orig_ex);
 		if (err)
 			goto fix_extent_len;
 		/* update the extent length and mark as initialized */
@@ -3085,8 +3069,6 @@ static int ext4_split_extent(handle_t *handle,
 		if (uninitialized)
 			split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
 				       EXT4_EXT_MARK_UNINIT2;
-		if (split_flag & EXT4_EXT_DATA_VALID2)
-			split_flag1 |= EXT4_EXT_DATA_VALID1;
 		err = ext4_split_extent_at(handle, inode, path,
 				map->m_lblk + map->m_len, split_flag1, flags1);
 		if (err)
@@ -3099,8 +3081,7 @@ static int ext4_split_extent(handle_t *handle,
 		return PTR_ERR(path);
 
 	if (map->m_lblk >= ee_block) {
-		split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
-					    EXT4_EXT_DATA_VALID2);
+		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
 		if (uninitialized)
 			split_flag1 |= EXT4_EXT_MARK_UNINIT1;
 		if (split_flag & EXT4_EXT_MARK_UNINIT2)
@@ -3379,8 +3360,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,
 
 	split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
 	split_flag |= EXT4_EXT_MARK_UNINIT2;
-	if (flags & EXT4_GET_BLOCKS_CONVERT)
-		split_flag |= EXT4_EXT_DATA_VALID2;
+
 	flags |= EXT4_GET_BLOCKS_PRE_IO;
 	return ext4_split_extent(handle, inode, path, map, split_flag, flags);
 }
@@ -3405,20 +3385,15 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
 		"block %llu, max_blocks %u\n", inode->i_ino,
 		  (unsigned long long)ee_block, ee_len);
 
-	/* If extent is larger than requested then split is required */
+	/* Extent is larger than requested? */
 	if (ee_block != map->m_lblk || ee_len > map->m_len) {
-		err = ext4_split_unwritten_extents(handle, inode, map, path,
-						   EXT4_GET_BLOCKS_CONVERT);
-		if (err < 0)
-			goto out;
-		ext4_ext_drop_refs(path);
-		path = ext4_ext_find_extent(inode, map->m_lblk, path);
-		if (IS_ERR(path)) {
-			err = PTR_ERR(path);
-			goto out;
-		}
-		depth = ext_depth(inode);
-		ex = path[depth].p_ext;
+		EXT4_ERROR_INODE(inode, "Written extent modified before IO"
+			" finished: extent logical block %llu, len %u; IO"
+			" logical block %llu, len %u\n",
+			(unsigned long long)ee_block, ee_len,
+			(unsigned long long)map->m_lblk, map->m_len);
+		err = -EIO;
+		goto out;
 	}
 
 	err = ext4_ext_get_access(handle, inode, path + depth);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] ext4: Remove unnecessary wait for extent conversion in ext4_fallocate()
  2013-01-02 17:45 [PATCH 0/5] ext4: Several simplifications Jan Kara
                   ` (3 preceding siblings ...)
  2013-01-02 17:45 ` [PATCH 4/5] ext4: Disable merging of uninitialized extents Jan Kara
@ 2013-01-02 17:45 ` Jan Kara
  2013-01-04  7:26   ` Zheng Liu
  4 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2013-01-02 17:45 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

Now that we don't merge uninitialized extents anymore, ext4_fallocate()
is free to operate on the inode while there are still some extent
conversions pending - it won't disturb them in any way.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f1ce33a..5c7a46a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4406,8 +4406,6 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (len <= EXT_UNINIT_MAX_LEN << blkbits)
 		flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
 
-	/* Prevent race condition between unwritten */
-	ext4_flush_unwritten_io(inode);
 retry:
 	while (ret >= 0 && ret < max_blocks) {
 		map.m_lblk = map.m_lblk + ret;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout
  2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
@ 2013-01-03 14:18   ` Jan Kara
  2013-01-04  7:18   ` Zheng Liu
  2013-01-17 18:31   ` Theodore Ts'o
  2 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2013-01-03 14:18 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

On Wed 02-01-13 18:45:40, Jan Kara wrote:
> Currently we sometimes used block_write_full_page() and sometimes
> ext4_bio_write_page() for writeback (depending on mount options and call
> path). Let's always use ext4_bio_write_page() to simplify things a bit.
  I just found out that we can also drop ext4_io_end_t->page and its
handling (I can fix that in V2 but I'll wait for some feedback before
sending it). Actually one argument against this patch could be that
ext4_bio_write_page() is rather memory hungry - ext4_io_end is
over 1 KB. I don't think it's critical and it seems to be possible to trim
that down significantly but I wanted to mention it so that people are not
surprised.

								Honza

> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/ext4.h  |    1 -
>  fs/ext4/inode.c |  128 ++++++-------------------------------------------------
>  fs/ext4/super.c |    7 +--
>  3 files changed, 16 insertions(+), 120 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 8462eb3..70c7030 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -985,7 +985,6 @@ struct ext4_inode_info {
>  #define EXT4_MOUNT_DIOREAD_NOLOCK	0x400000 /* Enable support for dio read nolocking */
>  #define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000 /* Journal checksums */
>  #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
> -#define EXT4_MOUNT_MBLK_IO_SUBMIT	0x4000000 /* multi-block io submits */
>  #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
>  #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
>  #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cb1c1ab..f95b511 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -134,8 +134,6 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
>  static void ext4_invalidatepage(struct page *page, unsigned long offset);
>  static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
>  				   struct buffer_head *bh_result, int create);
> -static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
> -static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
>  static int __ext4_journalled_writepage(struct page *page, unsigned int len);
>  static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
>  static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
> @@ -808,11 +806,10 @@ int ext4_walk_page_buffers(handle_t *handle,
>   * and the commit_write().  So doing the jbd2_journal_start at the start of
>   * prepare_write() is the right place.
>   *
> - * Also, this function can nest inside ext4_writepage() ->
> - * block_write_full_page(). In that case, we *know* that ext4_writepage()
> - * has generated enough buffer credits to do the whole page.  So we won't
> - * block on the journal in that case, which is good, because the caller may
> - * be PF_MEMALLOC.
> + * Also, this function can nest inside ext4_writepage().  In that case, we
> + * *know* that ext4_writepage() has generated enough buffer credits to do the
> + * whole page.  So we won't block on the journal in that case, which is good,
> + * because the caller may be PF_MEMALLOC.
>   *
>   * By accident, ext4 can be reentered when a transaction is open via
>   * quota file writes.  If we were to commit the transaction while thus
> @@ -1463,18 +1460,9 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
>  			 */
>  			if (unlikely(journal_data && PageChecked(page)))
>  				err = __ext4_journalled_writepage(page, len);
> -			else if (test_opt(inode->i_sb, MBLK_IO_SUBMIT))
> +			else
>  				err = ext4_bio_write_page(&io_submit, page,
>  							  len, mpd->wbc);
> -			else if (buffer_uninit(page_bufs)) {
> -				ext4_set_bh_endio(page_bufs, inode);
> -				err = block_write_full_page_endio(page,
> -					noalloc_get_block_write,
> -					mpd->wbc, ext4_end_io_buffer_write);
> -			} else
> -				err = block_write_full_page(page,
> -					noalloc_get_block_write, mpd->wbc);
> -
>  			if (!err)
>  				mpd->pages_written++;
>  			/*
> @@ -1891,16 +1879,16 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  }
>  
>  /*
> - * This function is used as a standard get_block_t calback function
> - * when there is no desire to allocate any blocks.  It is used as a
> - * callback function for block_write_begin() and block_write_full_page().
> - * These functions should only try to map a single block at a time.
> + * This function is used as a standard get_block_t calback function when there
> + * is no desire to allocate any blocks.  It is used as a callback function for
> + * block_write_begin().  These functions should only try to map a single block
> + * at a time.
>   *
>   * Since this function doesn't do block allocations even if the caller
>   * requests it by passing in create=1, it is critically important that
>   * any caller checks to make sure that any buffer heads are returned
>   * by this function are either all already mapped or marked for
> - * delayed allocation before calling  block_write_full_page().  Otherwise,
> + * delayed allocation before calling ext4_bio_write_page().  Otherwise,
>   * b_blocknr could be left unitialized, and the page write functions will
>   * be taken by surprise.
>   */
> @@ -2040,6 +2028,7 @@ static int ext4_writepage(struct page *page,
>  	unsigned int len;
>  	struct buffer_head *page_bufs = NULL;
>  	struct inode *inode = page->mapping->host;
> +	struct ext4_io_submit io_submit;
>  
>  	trace_ext4_writepage(page);
>  	size = i_size_read(inode);
> @@ -2089,14 +2078,9 @@ static int ext4_writepage(struct page *page,
>  		 */
>  		return __ext4_journalled_writepage(page, len);
>  
> -	if (buffer_uninit(page_bufs)) {
> -		ext4_set_bh_endio(page_bufs, inode);
> -		ret = block_write_full_page_endio(page, noalloc_get_block_write,
> -					    wbc, ext4_end_io_buffer_write);
> -	} else
> -		ret = block_write_full_page(page, noalloc_get_block_write,
> -					    wbc);
> -
> +	memset(&io_submit, 0, sizeof(io_submit));
> +	ret = ext4_bio_write_page(&io_submit, page, len, wbc);
> +	ext4_io_submit(&io_submit);
>  	return ret;
>  }
>  
> @@ -2858,26 +2842,6 @@ ext4_readpages(struct file *file, struct address_space *mapping,
>  	return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
>  }
>  
> -static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offset)
> -{
> -	struct buffer_head *head, *bh;
> -	unsigned int curr_off = 0;
> -
> -	if (!page_has_buffers(page))
> -		return;
> -	head = bh = page_buffers(page);
> -	do {
> -		if (offset <= curr_off && test_clear_buffer_uninit(bh)
> -					&& bh->b_private) {
> -			ext4_free_io_end(bh->b_private);
> -			bh->b_private = NULL;
> -			bh->b_end_io = NULL;
> -		}
> -		curr_off = curr_off + bh->b_size;
> -		bh = bh->b_this_page;
> -	} while (bh != head);
> -}
> -
>  static void ext4_invalidatepage(struct page *page, unsigned long offset)
>  {
>  	journal_t *journal = EXT4_JOURNAL(page->mapping->host);
> @@ -2885,11 +2849,6 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
>  	trace_ext4_invalidatepage(page, offset);
>  
>  	/*
> -	 * free any io_end structure allocated for buffers to be discarded
> -	 */
> -	if (ext4_should_dioread_nolock(page->mapping->host))
> -		ext4_invalidatepage_free_endio(page, offset);
> -	/*
>  	 * If it's a full truncate we just forget about the pending dirtying
>  	 */
>  	if (offset == 0)
> @@ -2977,65 +2936,6 @@ out:
>  	ext4_add_complete_io(io_end);
>  }
>  
> -static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
> -{
> -	ext4_io_end_t *io_end = bh->b_private;
> -	struct inode *inode;
> -
> -	if (!test_clear_buffer_uninit(bh) || !io_end)
> -		goto out;
> -
> -	if (!(io_end->inode->i_sb->s_flags & MS_ACTIVE)) {
> -		ext4_msg(io_end->inode->i_sb, KERN_INFO,
> -			 "sb umounted, discard end_io request for inode %lu",
> -			 io_end->inode->i_ino);
> -		ext4_free_io_end(io_end);
> -		goto out;
> -	}
> -
> -	/*
> -	 * It may be over-defensive here to check EXT4_IO_END_UNWRITTEN now,
> -	 * but being more careful is always safe for the future change.
> -	 */
> -	inode = io_end->inode;
> -	ext4_set_io_unwritten_flag(inode, io_end);
> -	ext4_add_complete_io(io_end);
> -out:
> -	bh->b_private = NULL;
> -	bh->b_end_io = NULL;
> -	clear_buffer_uninit(bh);
> -	end_buffer_async_write(bh, uptodate);
> -}
> -
> -static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode)
> -{
> -	ext4_io_end_t *io_end;
> -	struct page *page = bh->b_page;
> -	loff_t offset = (sector_t)page->index << PAGE_CACHE_SHIFT;
> -	size_t size = bh->b_size;
> -
> -retry:
> -	io_end = ext4_init_io_end(inode, GFP_ATOMIC);
> -	if (!io_end) {
> -		pr_warn_ratelimited("%s: allocation fail\n", __func__);
> -		schedule();
> -		goto retry;
> -	}
> -	io_end->offset = offset;
> -	io_end->size = size;
> -	/*
> -	 * We need to hold a reference to the page to make sure it
> -	 * doesn't get evicted before ext4_end_io_work() has a chance
> -	 * to convert the extent from written to unwritten.
> -	 */
> -	io_end->page = page;
> -	get_page(io_end->page);
> -
> -	bh->b_private = io_end;
> -	bh->b_end_io = ext4_end_io_buffer_write;
> -	return 0;
> -}
> -
>  /*
>   * For ext4 extent files, ext4 will do direct-io write to holes,
>   * preallocated extents, and those write extend the file, no need to
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e09f7d1..afbe974 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1280,8 +1280,8 @@ static const match_table_t tokens = {
>  	{Opt_stripe, "stripe=%u"},
>  	{Opt_delalloc, "delalloc"},
>  	{Opt_nodelalloc, "nodelalloc"},
> -	{Opt_mblk_io_submit, "mblk_io_submit"},
> -	{Opt_nomblk_io_submit, "nomblk_io_submit"},
> +	{Opt_removed, "mblk_io_submit"},
> +	{Opt_removed, "nomblk_io_submit"},
>  	{Opt_block_validity, "block_validity"},
>  	{Opt_noblock_validity, "noblock_validity"},
>  	{Opt_inode_readahead_blks, "inode_readahead_blks=%u"},
> @@ -1414,8 +1414,6 @@ static const struct mount_opts {
>  	{Opt_bsd_df, EXT4_MOUNT_MINIX_DF, MOPT_CLEAR},
>  	{Opt_grpid, EXT4_MOUNT_GRPID, MOPT_SET},
>  	{Opt_nogrpid, EXT4_MOUNT_GRPID, MOPT_CLEAR},
> -	{Opt_mblk_io_submit, EXT4_MOUNT_MBLK_IO_SUBMIT, MOPT_SET},
> -	{Opt_nomblk_io_submit, EXT4_MOUNT_MBLK_IO_SUBMIT, MOPT_CLEAR},
>  	{Opt_block_validity, EXT4_MOUNT_BLOCK_VALIDITY, MOPT_SET},
>  	{Opt_noblock_validity, EXT4_MOUNT_BLOCK_VALIDITY, MOPT_CLEAR},
>  	{Opt_dioread_nolock, EXT4_MOUNT_DIOREAD_NOLOCK, MOPT_SET},
> @@ -3373,7 +3371,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  #ifdef CONFIG_EXT4_FS_POSIX_ACL
>  	set_opt(sb, POSIX_ACL);
>  #endif
> -	set_opt(sb, MBLK_IO_SUBMIT);
>  	if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA)
>  		set_opt(sb, JOURNAL_DATA);
>  	else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED)
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout
  2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
  2013-01-03 14:18   ` Jan Kara
@ 2013-01-04  7:18   ` Zheng Liu
  2013-01-17 18:31   ` Theodore Ts'o
  2 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2013-01-04  7:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On Wed, Jan 02, 2013 at 06:45:40PM +0100, Jan Kara wrote:
> Currently we sometimes used block_write_full_page() and sometimes
> ext4_bio_write_page() for writeback (depending on mount options and call
> path). Let's always use ext4_bio_write_page() to simplify things a bit.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

Regards,
                                                - Zheng

> ---
>  fs/ext4/ext4.h  |    1 -
>  fs/ext4/inode.c |  128 ++++++-------------------------------------------------
>  fs/ext4/super.c |    7 +--
>  3 files changed, 16 insertions(+), 120 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 8462eb3..70c7030 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -985,7 +985,6 @@ struct ext4_inode_info {
>  #define EXT4_MOUNT_DIOREAD_NOLOCK	0x400000 /* Enable support for dio read nolocking */
>  #define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000 /* Journal checksums */
>  #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
> -#define EXT4_MOUNT_MBLK_IO_SUBMIT	0x4000000 /* multi-block io submits */
>  #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
>  #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
>  #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cb1c1ab..f95b511 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -134,8 +134,6 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
>  static void ext4_invalidatepage(struct page *page, unsigned long offset);
>  static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
>  				   struct buffer_head *bh_result, int create);
> -static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
> -static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
>  static int __ext4_journalled_writepage(struct page *page, unsigned int len);
>  static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
>  static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
> @@ -808,11 +806,10 @@ int ext4_walk_page_buffers(handle_t *handle,
>   * and the commit_write().  So doing the jbd2_journal_start at the start of
>   * prepare_write() is the right place.
>   *
> - * Also, this function can nest inside ext4_writepage() ->
> - * block_write_full_page(). In that case, we *know* that ext4_writepage()
> - * has generated enough buffer credits to do the whole page.  So we won't
> - * block on the journal in that case, which is good, because the caller may
> - * be PF_MEMALLOC.
> + * Also, this function can nest inside ext4_writepage().  In that case, we
> + * *know* that ext4_writepage() has generated enough buffer credits to do the
> + * whole page.  So we won't block on the journal in that case, which is good,
> + * because the caller may be PF_MEMALLOC.
>   *
>   * By accident, ext4 can be reentered when a transaction is open via
>   * quota file writes.  If we were to commit the transaction while thus
> @@ -1463,18 +1460,9 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
>  			 */
>  			if (unlikely(journal_data && PageChecked(page)))
>  				err = __ext4_journalled_writepage(page, len);
> -			else if (test_opt(inode->i_sb, MBLK_IO_SUBMIT))
> +			else
>  				err = ext4_bio_write_page(&io_submit, page,
>  							  len, mpd->wbc);
> -			else if (buffer_uninit(page_bufs)) {
> -				ext4_set_bh_endio(page_bufs, inode);
> -				err = block_write_full_page_endio(page,
> -					noalloc_get_block_write,
> -					mpd->wbc, ext4_end_io_buffer_write);
> -			} else
> -				err = block_write_full_page(page,
> -					noalloc_get_block_write, mpd->wbc);
> -
>  			if (!err)
>  				mpd->pages_written++;
>  			/*
> @@ -1891,16 +1879,16 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  }
>  
>  /*
> - * This function is used as a standard get_block_t calback function
> - * when there is no desire to allocate any blocks.  It is used as a
> - * callback function for block_write_begin() and block_write_full_page().
> - * These functions should only try to map a single block at a time.
> + * This function is used as a standard get_block_t calback function when there
> + * is no desire to allocate any blocks.  It is used as a callback function for
> + * block_write_begin().  These functions should only try to map a single block
> + * at a time.
>   *
>   * Since this function doesn't do block allocations even if the caller
>   * requests it by passing in create=1, it is critically important that
>   * any caller checks to make sure that any buffer heads are returned
>   * by this function are either all already mapped or marked for
> - * delayed allocation before calling  block_write_full_page().  Otherwise,
> + * delayed allocation before calling ext4_bio_write_page().  Otherwise,
>   * b_blocknr could be left unitialized, and the page write functions will
>   * be taken by surprise.
>   */
> @@ -2040,6 +2028,7 @@ static int ext4_writepage(struct page *page,
>  	unsigned int len;
>  	struct buffer_head *page_bufs = NULL;
>  	struct inode *inode = page->mapping->host;
> +	struct ext4_io_submit io_submit;
>  
>  	trace_ext4_writepage(page);
>  	size = i_size_read(inode);
> @@ -2089,14 +2078,9 @@ static int ext4_writepage(struct page *page,
>  		 */
>  		return __ext4_journalled_writepage(page, len);
>  
> -	if (buffer_uninit(page_bufs)) {
> -		ext4_set_bh_endio(page_bufs, inode);
> -		ret = block_write_full_page_endio(page, noalloc_get_block_write,
> -					    wbc, ext4_end_io_buffer_write);
> -	} else
> -		ret = block_write_full_page(page, noalloc_get_block_write,
> -					    wbc);
> -
> +	memset(&io_submit, 0, sizeof(io_submit));
> +	ret = ext4_bio_write_page(&io_submit, page, len, wbc);
> +	ext4_io_submit(&io_submit);
>  	return ret;
>  }
>  
> @@ -2858,26 +2842,6 @@ ext4_readpages(struct file *file, struct address_space *mapping,
>  	return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
>  }
>  
> -static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offset)
> -{
> -	struct buffer_head *head, *bh;
> -	unsigned int curr_off = 0;
> -
> -	if (!page_has_buffers(page))
> -		return;
> -	head = bh = page_buffers(page);
> -	do {
> -		if (offset <= curr_off && test_clear_buffer_uninit(bh)
> -					&& bh->b_private) {
> -			ext4_free_io_end(bh->b_private);
> -			bh->b_private = NULL;
> -			bh->b_end_io = NULL;
> -		}
> -		curr_off = curr_off + bh->b_size;
> -		bh = bh->b_this_page;
> -	} while (bh != head);
> -}
> -
>  static void ext4_invalidatepage(struct page *page, unsigned long offset)
>  {
>  	journal_t *journal = EXT4_JOURNAL(page->mapping->host);
> @@ -2885,11 +2849,6 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
>  	trace_ext4_invalidatepage(page, offset);
>  
>  	/*
> -	 * free any io_end structure allocated for buffers to be discarded
> -	 */
> -	if (ext4_should_dioread_nolock(page->mapping->host))
> -		ext4_invalidatepage_free_endio(page, offset);
> -	/*
>  	 * If it's a full truncate we just forget about the pending dirtying
>  	 */
>  	if (offset == 0)
> @@ -2977,65 +2936,6 @@ out:
>  	ext4_add_complete_io(io_end);
>  }
>  
> -static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
> -{
> -	ext4_io_end_t *io_end = bh->b_private;
> -	struct inode *inode;
> -
> -	if (!test_clear_buffer_uninit(bh) || !io_end)
> -		goto out;
> -
> -	if (!(io_end->inode->i_sb->s_flags & MS_ACTIVE)) {
> -		ext4_msg(io_end->inode->i_sb, KERN_INFO,
> -			 "sb umounted, discard end_io request for inode %lu",
> -			 io_end->inode->i_ino);
> -		ext4_free_io_end(io_end);
> -		goto out;
> -	}
> -
> -	/*
> -	 * It may be over-defensive here to check EXT4_IO_END_UNWRITTEN now,
> -	 * but being more careful is always safe for the future change.
> -	 */
> -	inode = io_end->inode;
> -	ext4_set_io_unwritten_flag(inode, io_end);
> -	ext4_add_complete_io(io_end);
> -out:
> -	bh->b_private = NULL;
> -	bh->b_end_io = NULL;
> -	clear_buffer_uninit(bh);
> -	end_buffer_async_write(bh, uptodate);
> -}
> -
> -static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode)
> -{
> -	ext4_io_end_t *io_end;
> -	struct page *page = bh->b_page;
> -	loff_t offset = (sector_t)page->index << PAGE_CACHE_SHIFT;
> -	size_t size = bh->b_size;
> -
> -retry:
> -	io_end = ext4_init_io_end(inode, GFP_ATOMIC);
> -	if (!io_end) {
> -		pr_warn_ratelimited("%s: allocation fail\n", __func__);
> -		schedule();
> -		goto retry;
> -	}
> -	io_end->offset = offset;
> -	io_end->size = size;
> -	/*
> -	 * We need to hold a reference to the page to make sure it
> -	 * doesn't get evicted before ext4_end_io_work() has a chance
> -	 * to convert the extent from written to unwritten.
> -	 */
> -	io_end->page = page;
> -	get_page(io_end->page);
> -
> -	bh->b_private = io_end;
> -	bh->b_end_io = ext4_end_io_buffer_write;
> -	return 0;
> -}
> -
>  /*
>   * For ext4 extent files, ext4 will do direct-io write to holes,
>   * preallocated extents, and those write extend the file, no need to
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e09f7d1..afbe974 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1280,8 +1280,8 @@ static const match_table_t tokens = {
>  	{Opt_stripe, "stripe=%u"},
>  	{Opt_delalloc, "delalloc"},
>  	{Opt_nodelalloc, "nodelalloc"},
> -	{Opt_mblk_io_submit, "mblk_io_submit"},
> -	{Opt_nomblk_io_submit, "nomblk_io_submit"},
> +	{Opt_removed, "mblk_io_submit"},
> +	{Opt_removed, "nomblk_io_submit"},
>  	{Opt_block_validity, "block_validity"},
>  	{Opt_noblock_validity, "noblock_validity"},
>  	{Opt_inode_readahead_blks, "inode_readahead_blks=%u"},
> @@ -1414,8 +1414,6 @@ static const struct mount_opts {
>  	{Opt_bsd_df, EXT4_MOUNT_MINIX_DF, MOPT_CLEAR},
>  	{Opt_grpid, EXT4_MOUNT_GRPID, MOPT_SET},
>  	{Opt_nogrpid, EXT4_MOUNT_GRPID, MOPT_CLEAR},
> -	{Opt_mblk_io_submit, EXT4_MOUNT_MBLK_IO_SUBMIT, MOPT_SET},
> -	{Opt_nomblk_io_submit, EXT4_MOUNT_MBLK_IO_SUBMIT, MOPT_CLEAR},
>  	{Opt_block_validity, EXT4_MOUNT_BLOCK_VALIDITY, MOPT_SET},
>  	{Opt_noblock_validity, EXT4_MOUNT_BLOCK_VALIDITY, MOPT_CLEAR},
>  	{Opt_dioread_nolock, EXT4_MOUNT_DIOREAD_NOLOCK, MOPT_SET},
> @@ -3373,7 +3371,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  #ifdef CONFIG_EXT4_FS_POSIX_ACL
>  	set_opt(sb, POSIX_ACL);
>  #endif
> -	set_opt(sb, MBLK_IO_SUBMIT);
>  	if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA)
>  		set_opt(sb, JOURNAL_DATA);
>  	else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED)
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page()
  2013-01-02 17:45 ` [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page() Jan Kara
@ 2013-01-04  7:20   ` Zheng Liu
  2013-01-17 18:35   ` Theodore Ts'o
  1 sibling, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2013-01-04  7:20 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On Wed, Jan 02, 2013 at 06:45:41PM +0100, Jan Kara wrote:
> When we cannot write a page we should use redirty_page_for_writepage()
> instead of plain set_page_dirty(). That tells writeback code we have
> problems, redirties only the page (redirtying buffers is not needed),
> and updates mm accounting of failed page writes.
> 
> Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
> moment we are sure buffer will be going to disk.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Regards,
                                                - Zheng

> ---
>  fs/ext4/page-io.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 0016fbc..8e5722a 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -23,6 +23,7 @@
>  #include <linux/workqueue.h>
>  #include <linux/kernel.h>
>  #include <linux/slab.h>
> +#include <linux/mm.h>
>  
>  #include "ext4_jbd2.h"
>  #include "xattr.h"
> @@ -436,7 +437,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
>  
>  	io_page = kmem_cache_alloc(io_page_cachep, GFP_NOFS);
>  	if (!io_page) {
> -		set_page_dirty(page);
> +		redirty_page_for_writepage(wbc, page);
>  		unlock_page(page);
>  		return -ENOMEM;
>  	}
> @@ -468,7 +469,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
>  			set_buffer_uptodate(bh);
>  			continue;
>  		}
> -		clear_buffer_dirty(bh);
>  		ret = io_submit_add_bh(io, io_page, inode, wbc, bh);
>  		if (ret) {
>  			/*
> @@ -476,9 +476,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
>  			 * we can do but mark the page as dirty, and
>  			 * better luck next time.
>  			 */
> -			set_page_dirty(page);
> +			redirty_page_for_writepage(wbc, page);
>  			break;
>  		}
> +		clear_buffer_dirty(bh);
>  	}
>  	unlock_page(page);
>  	/*
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO
  2013-01-02 17:45 ` [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO Jan Kara
@ 2013-01-04  7:24   ` Zheng Liu
  2013-01-17 21:58     ` Theodore Ts'o
  0 siblings, 1 reply; 19+ messages in thread
From: Zheng Liu @ 2013-01-04  7:24 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On Wed, Jan 02, 2013 at 06:45:42PM +0100, Jan Kara wrote:
> When using indirect blocks there is no possibility to have any unwritten
> extents. So wait for them in ext4_ind_direct_IO() is just bogus.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Hi Jan,

Just for the note, this patch conflicts with my patch set of extent
status tree.  I guess your patch series will be applied before my patch
set.  So I will rebase my patch set against the latest kernel. :-)

Otherwise, the patch looks good to me.  You can add:
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

Regards,
                                                - Zheng

> ---
>  fs/ext4/indirect.c |    5 -----
>  1 files changed, 0 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 20862f9..993247c 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -807,11 +807,6 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
>  
>  retry:
>  	if (rw == READ && ext4_should_dioread_nolock(inode)) {
> -		if (unlikely(atomic_read(&EXT4_I(inode)->i_unwritten))) {
> -			mutex_lock(&inode->i_mutex);
> -			ext4_flush_unwritten_io(inode);
> -			mutex_unlock(&inode->i_mutex);
> -		}
>  		/*
>  		 * Nolock dioread optimization may be dynamically disabled
>  		 * via ext4_inode_block_unlocked_dio(). Check inode's state
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/5] ext4: Disable merging of uninitialized extents
  2013-01-02 17:45 ` [PATCH 4/5] ext4: Disable merging of uninitialized extents Jan Kara
@ 2013-01-04  7:25   ` Zheng Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2013-01-04  7:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On Wed, Jan 02, 2013 at 06:45:43PM +0100, Jan Kara wrote:
> Merging of uninitialized extents creates all sorts of interesting race
> possibilities when writeback / DIO races with fallocate. Thus
> ext4_convert_unwritten_extents_endio() has to deal with a case where
> extent to be converted needs to be split out first. That isn't nice
> for two reasons:
> 
> 1) It may need allocation of extent tree block so ENOSPC is possible.
> 2) It complicates end_io handling code
> 
> So we disable merging of uninitialized extents which allows us to simplify
> the code. Extents will get merged after they are converted to initialized
> ones.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Regards,
                                                - Zheng

> ---
>  fs/ext4/extents.c |   61 +++++++++++++++-------------------------------------
>  1 files changed, 18 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 26af228..f1ce33a 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -54,9 +54,6 @@
>  #define EXT4_EXT_MARK_UNINIT1	0x2  /* mark first half uninitialized */
>  #define EXT4_EXT_MARK_UNINIT2	0x4  /* mark second half uninitialized */
>  
> -#define EXT4_EXT_DATA_VALID1	0x8  /* first half contains valid data */
> -#define EXT4_EXT_DATA_VALID2	0x10 /* second half contains valid data */
> -
>  static __le32 ext4_extent_block_csum(struct inode *inode,
>  				     struct ext4_extent_header *eh)
>  {
> @@ -1579,20 +1576,17 @@ int
>  ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
>  				struct ext4_extent *ex2)
>  {
> -	unsigned short ext1_ee_len, ext2_ee_len, max_len;
> +	unsigned ext1_ee_len, ext2_ee_len;
>  
>  	/*
> -	 * Make sure that either both extents are uninitialized, or
> -	 * both are _not_.
> +	 * Make sure that both extents are initialized. We don't merge
> +	 * uninitialized extents so that we can be sure that end_io code has
> +	 * the extent that was written properly split out and conversion to
> +	 * initialized is trivial.
>  	 */
> -	if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
> +	if (ext4_ext_is_uninitialized(ex1) || ext4_ext_is_uninitialized(ex2))
>  		return 0;
>  
> -	if (ext4_ext_is_uninitialized(ex1))
> -		max_len = EXT_UNINIT_MAX_LEN;
> -	else
> -		max_len = EXT_INIT_MAX_LEN;
> -
>  	ext1_ee_len = ext4_ext_get_actual_len(ex1);
>  	ext2_ee_len = ext4_ext_get_actual_len(ex2);
>  
> @@ -1605,7 +1599,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
>  	 * as an RO_COMPAT feature, refuse to merge to extents if
>  	 * this can result in the top bit of ee_len being set.
>  	 */
> -	if (ext1_ee_len + ext2_ee_len > max_len)
> +	if (ext1_ee_len + ext2_ee_len > EXT_INIT_MAX_LEN)
>  		return 0;
>  #ifdef AGGRESSIVE_TEST
>  	if (ext1_ee_len >= 4)
> @@ -2959,9 +2953,6 @@ static int ext4_split_extent_at(handle_t *handle,
>  	unsigned int ee_len, depth;
>  	int err = 0;
>  
> -	BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) ==
> -	       (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
> -
>  	ext_debug("ext4_split_extents_at: inode %lu, logical"
>  		"block %llu\n", inode->i_ino, (unsigned long long)split);
>  
> @@ -3020,14 +3011,7 @@ static int ext4_split_extent_at(handle_t *handle,
>  
>  	err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
>  	if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
> -		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
> -			if (split_flag & EXT4_EXT_DATA_VALID1)
> -				err = ext4_ext_zeroout(inode, ex2);
> -			else
> -				err = ext4_ext_zeroout(inode, ex);
> -		} else
> -			err = ext4_ext_zeroout(inode, &orig_ex);
> -
> +		err = ext4_ext_zeroout(inode, &orig_ex);
>  		if (err)
>  			goto fix_extent_len;
>  		/* update the extent length and mark as initialized */
> @@ -3085,8 +3069,6 @@ static int ext4_split_extent(handle_t *handle,
>  		if (uninitialized)
>  			split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
>  				       EXT4_EXT_MARK_UNINIT2;
> -		if (split_flag & EXT4_EXT_DATA_VALID2)
> -			split_flag1 |= EXT4_EXT_DATA_VALID1;
>  		err = ext4_split_extent_at(handle, inode, path,
>  				map->m_lblk + map->m_len, split_flag1, flags1);
>  		if (err)
> @@ -3099,8 +3081,7 @@ static int ext4_split_extent(handle_t *handle,
>  		return PTR_ERR(path);
>  
>  	if (map->m_lblk >= ee_block) {
> -		split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
> -					    EXT4_EXT_DATA_VALID2);
> +		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
>  		if (uninitialized)
>  			split_flag1 |= EXT4_EXT_MARK_UNINIT1;
>  		if (split_flag & EXT4_EXT_MARK_UNINIT2)
> @@ -3379,8 +3360,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>  
>  	split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>  	split_flag |= EXT4_EXT_MARK_UNINIT2;
> -	if (flags & EXT4_GET_BLOCKS_CONVERT)
> -		split_flag |= EXT4_EXT_DATA_VALID2;
> +
>  	flags |= EXT4_GET_BLOCKS_PRE_IO;
>  	return ext4_split_extent(handle, inode, path, map, split_flag, flags);
>  }
> @@ -3405,20 +3385,15 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>  		"block %llu, max_blocks %u\n", inode->i_ino,
>  		  (unsigned long long)ee_block, ee_len);
>  
> -	/* If extent is larger than requested then split is required */
> +	/* Extent is larger than requested? */
>  	if (ee_block != map->m_lblk || ee_len > map->m_len) {
> -		err = ext4_split_unwritten_extents(handle, inode, map, path,
> -						   EXT4_GET_BLOCKS_CONVERT);
> -		if (err < 0)
> -			goto out;
> -		ext4_ext_drop_refs(path);
> -		path = ext4_ext_find_extent(inode, map->m_lblk, path);
> -		if (IS_ERR(path)) {
> -			err = PTR_ERR(path);
> -			goto out;
> -		}
> -		depth = ext_depth(inode);
> -		ex = path[depth].p_ext;
> +		EXT4_ERROR_INODE(inode, "Written extent modified before IO"
> +			" finished: extent logical block %llu, len %u; IO"
> +			" logical block %llu, len %u\n",
> +			(unsigned long long)ee_block, ee_len,
> +			(unsigned long long)map->m_lblk, map->m_len);
> +		err = -EIO;
> +		goto out;
>  	}
>  
>  	err = ext4_ext_get_access(handle, inode, path + depth);
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] ext4: Remove unnecessary wait for extent conversion in ext4_fallocate()
  2013-01-02 17:45 ` [PATCH 5/5] ext4: Remove unnecessary wait for extent conversion in ext4_fallocate() Jan Kara
@ 2013-01-04  7:26   ` Zheng Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2013-01-04  7:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On Wed, Jan 02, 2013 at 06:45:44PM +0100, Jan Kara wrote:
> Now that we don't merge uninitialized extents anymore, ext4_fallocate()
> is free to operate on the inode while there are still some extent
> conversions pending - it won't disturb them in any way.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Regards,
                                                - Zheng

> ---
>  fs/ext4/extents.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f1ce33a..5c7a46a 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4406,8 +4406,6 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (len <= EXT_UNINIT_MAX_LEN << blkbits)
>  		flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
>  
> -	/* Prevent race condition between unwritten */
> -	ext4_flush_unwritten_io(inode);
>  retry:
>  	while (ret >= 0 && ret < max_blocks) {
>  		map.m_lblk = map.m_lblk + ret;
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout
  2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
  2013-01-03 14:18   ` Jan Kara
  2013-01-04  7:18   ` Zheng Liu
@ 2013-01-17 18:31   ` Theodore Ts'o
  2013-01-17 21:30     ` Jan Kara
  2 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2013-01-17 18:31 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Wed, Jan 02, 2013 at 06:45:40PM +0100, Jan Kara wrote:
> Currently we sometimes used block_write_full_page() and sometimes
> ext4_bio_write_page() for writeback (depending on mount options and call
> path). Let's always use ext4_bio_write_page() to simplify things a bit.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

I left in block_write_full_page() deliberately because it was
page_io.c was very tricky to get right.  When you make changes, if you
aren't careful, you can end up dereferencing a data structure after
it's been freed, which sometimes doesn't become visible until you do
some very serious stress testing.  (We had one case where it only
showed up if you were using xfstests in a VM with the memory cranked
down to a ridiculously low amount of memory.)  So having a way to
disable the page_io.c code path was useful in trying to debug things.

Fortunately, we haven't had a bug in that part of the ext4 code base
in quite a while, so maybe it's time for us to get rid of this
alternate code path.

> @@ -2885,11 +2849,6 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
>  	trace_ext4_invalidatepage(page, offset);
>  
>  	/*
> -	 * free any io_end structure allocated for buffers to be discarded
> -	 */
> -	if (ext4_should_dioread_nolock(page->mapping->host))
> -		ext4_invalidatepage_free_endio(page, offset);

What does this have to with always using ext4_bio_write_page()?  It
looks like this was a change that leaked from one of your follow-on
commits?

There was other removals of other functions, such as
ext4_set_bh_endio(), which I think should be broken out into another
commit.

					- Ted

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page()
  2013-01-02 17:45 ` [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page() Jan Kara
  2013-01-04  7:20   ` Zheng Liu
@ 2013-01-17 18:35   ` Theodore Ts'o
  2013-01-17 21:24     ` Jan Kara
  1 sibling, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2013-01-17 18:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Wed, Jan 02, 2013 at 06:45:41PM +0100, Jan Kara wrote:
> When we cannot write a page we should use redirty_page_for_writepage()
> instead of plain set_page_dirty(). That tells writeback code we have
> problems, redirties only the page (redirtying buffers is not needed),
> and updates mm accounting of failed page writes.
> 
> Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
> moment we are sure buffer will be going to disk.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

This sounds like a bug fix, not just a clean up.  Do you think the
impact of not using redirty_page_for_writeback() is bad enough that we
should add a cc: stable@vger.kernel.org tag?  As I recall, not doing
this would end up leaving the page flags and radix tree to be
inconsistent, right?

	    	       	   	      - Ted

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page()
  2013-01-17 18:35   ` Theodore Ts'o
@ 2013-01-17 21:24     ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2013-01-17 21:24 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4

On Thu 17-01-13 13:35:01, Ted Tso wrote:
> On Wed, Jan 02, 2013 at 06:45:41PM +0100, Jan Kara wrote:
> > When we cannot write a page we should use redirty_page_for_writepage()
> > instead of plain set_page_dirty(). That tells writeback code we have
> > problems, redirties only the page (redirtying buffers is not needed),
> > and updates mm accounting of failed page writes.
> > 
> > Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
> > moment we are sure buffer will be going to disk.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> This sounds like a bug fix, not just a clean up.  Do you think the
  Yes, it's a bug fix.

> impact of not using redirty_page_for_writeback() is bad enough that we
> should add a cc: stable@vger.kernel.org tag?  As I recall, not doing
> this would end up leaving the page flags and radix tree to be
> inconsistent, right?
  No, it shouldn't cause any such serious harm. Just writeback may retry
writing the inode immediately (it backs off if it sees skipped pages) which
could result in needless spinning. Also some mm/bdi statistics will get
slightly off. But all in all I don't think it's a stable material.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout
  2013-01-17 18:31   ` Theodore Ts'o
@ 2013-01-17 21:30     ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2013-01-17 21:30 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4

On Thu 17-01-13 13:31:37, Ted Tso wrote:
> On Wed, Jan 02, 2013 at 06:45:40PM +0100, Jan Kara wrote:
> > Currently we sometimes used block_write_full_page() and sometimes
> > ext4_bio_write_page() for writeback (depending on mount options and call
> > path). Let's always use ext4_bio_write_page() to simplify things a bit.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> I left in block_write_full_page() deliberately because it was
> page_io.c was very tricky to get right.  When you make changes, if you
> aren't careful, you can end up dereferencing a data structure after
> it's been freed, which sometimes doesn't become visible until you do
> some very serious stress testing.  (We had one case where it only
> showed up if you were using xfstests in a VM with the memory cranked
> down to a ridiculously low amount of memory.)  So having a way to
> disable the page_io.c code path was useful in trying to debug things.
  Yes, I understand this. Just I thought the time has come we may get rid
of the old code.

> Fortunately, we haven't had a bug in that part of the ext4 code base
> in quite a while, so maybe it's time for us to get rid of this
> alternate code path.
> 
> > @@ -2885,11 +2849,6 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
> >  	trace_ext4_invalidatepage(page, offset);
> >  
> >  	/*
> > -	 * free any io_end structure allocated for buffers to be discarded
> > -	 */
> > -	if (ext4_should_dioread_nolock(page->mapping->host))
> > -		ext4_invalidatepage_free_endio(page, offset);
> 
> What does this have to with always using ext4_bio_write_page()?  It
> looks like this was a change that leaked from one of your follow-on
> commits?
> 
> There was other removals of other functions, such as
> ext4_set_bh_endio(), which I think should be broken out into another
> commit.
  Once we stop using block_write_full_page(), all the removed functions
(including the change to ext4_invalidatepage() - that is there only for
nomblk_io_submit,dioread_nolock case) become unused so I removed them.

If you think removing of unused functions should be in a separate patch I
can do that but frankly I don't see a point. So do you mean it?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO
  2013-01-04  7:24   ` Zheng Liu
@ 2013-01-17 21:58     ` Theodore Ts'o
  2013-01-17 23:02       ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2013-01-17 21:58 UTC (permalink / raw)
  To: Jan Kara, linux-ext4

On Fri, Jan 04, 2013 at 03:24:37PM +0800, Zheng Liu wrote:
> On Wed, Jan 02, 2013 at 06:45:42PM +0100, Jan Kara wrote:
> > When using indirect blocks there is no possibility to have any unwritten
> > extents. So wait for them in ext4_ind_direct_IO() is just bogus.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Hi Jan,
> 
> Just for the note, this patch conflicts with my patch set of extent
> status tree.  I guess your patch series will be applied before my patch
> set.  So I will rebase my patch set against the latest kernel. :-)

Actually, the extent status tree patches are already in my tree,
although I'm still testing and reviewing them. so they haven't been
finalized yet (which is why I haven't sent an e-mail ack).  If the
conflict is minor, I'll take care of it.  If it's non-trivial, I'll
yell for help.  :-)

					- Ted

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO
  2013-01-17 21:58     ` Theodore Ts'o
@ 2013-01-17 23:02       ` Jan Kara
  2013-01-18  2:15         ` Zheng Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2013-01-17 23:02 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4

On Thu 17-01-13 16:58:14, Ted Tso wrote:
> On Fri, Jan 04, 2013 at 03:24:37PM +0800, Zheng Liu wrote:
> > On Wed, Jan 02, 2013 at 06:45:42PM +0100, Jan Kara wrote:
> > > When using indirect blocks there is no possibility to have any unwritten
> > > extents. So wait for them in ext4_ind_direct_IO() is just bogus.
> > > 
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > Hi Jan,
> > 
> > Just for the note, this patch conflicts with my patch set of extent
> > status tree.  I guess your patch series will be applied before my patch
> > set.  So I will rebase my patch set against the latest kernel. :-)
> 
> Actually, the extent status tree patches are already in my tree,
> although I'm still testing and reviewing them. so they haven't been
> finalized yet (which is why I haven't sent an e-mail ack).  If the
> conflict is minor, I'll take care of it.  If it's non-trivial, I'll
> yell for help.  :-)
  This patch actually isn't in Zheng's latest submission so there shouldn't
be any conflict.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO
  2013-01-17 23:02       ` Jan Kara
@ 2013-01-18  2:15         ` Zheng Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2013-01-18  2:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: Theodore Ts'o, linux-ext4

On Fri, Jan 18, 2013 at 12:02:39AM +0100, Jan Kara wrote:
> On Thu 17-01-13 16:58:14, Ted Tso wrote:
> > On Fri, Jan 04, 2013 at 03:24:37PM +0800, Zheng Liu wrote:
> > > On Wed, Jan 02, 2013 at 06:45:42PM +0100, Jan Kara wrote:
> > > > When using indirect blocks there is no possibility to have any unwritten
> > > > extents. So wait for them in ext4_ind_direct_IO() is just bogus.
> > > > 
> > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > 
> > > Hi Jan,
> > > 
> > > Just for the note, this patch conflicts with my patch set of extent
> > > status tree.  I guess your patch series will be applied before my patch
> > > set.  So I will rebase my patch set against the latest kernel. :-)
> > 
> > Actually, the extent status tree patches are already in my tree,
> > although I'm still testing and reviewing them. so they haven't been
> > finalized yet (which is why I haven't sent an e-mail ack).  If the
> > conflict is minor, I'll take care of it.  If it's non-trivial, I'll
> > yell for help.  :-)
>   This patch actually isn't in Zheng's latest submission so there shouldn't
> be any conflict.

Hi Ted,

Sorry for the delay reply because of travelling.  As Jan said above, I
have dropped the patch of unwritten extent conversion from the patch set
of extent status tree.  So there isn't any conflict.

Thanks,
                                                - Zheng

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-01-18  2:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-02 17:45 [PATCH 0/5] ext4: Several simplifications Jan Kara
2013-01-02 17:45 ` [PATCH 1/5] ext4: Always use ext4_bio_write_page() for writeout Jan Kara
2013-01-03 14:18   ` Jan Kara
2013-01-04  7:18   ` Zheng Liu
2013-01-17 18:31   ` Theodore Ts'o
2013-01-17 21:30     ` Jan Kara
2013-01-02 17:45 ` [PATCH 2/5] ext4: Use redirty_page_for_writepage() in ext4_bio_write_page() Jan Kara
2013-01-04  7:20   ` Zheng Liu
2013-01-17 18:35   ` Theodore Ts'o
2013-01-17 21:24     ` Jan Kara
2013-01-02 17:45 ` [PATCH 3/5] ext4: Remove bogus wait for unwritten extents in ext4_ind_direct_IO Jan Kara
2013-01-04  7:24   ` Zheng Liu
2013-01-17 21:58     ` Theodore Ts'o
2013-01-17 23:02       ` Jan Kara
2013-01-18  2:15         ` Zheng Liu
2013-01-02 17:45 ` [PATCH 4/5] ext4: Disable merging of uninitialized extents Jan Kara
2013-01-04  7:25   ` Zheng Liu
2013-01-02 17:45 ` [PATCH 5/5] ext4: Remove unnecessary wait for extent conversion in ext4_fallocate() Jan Kara
2013-01-04  7:26   ` Zheng Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).