linux-ext4.vger.kernel.org archive mirror
* [RFC][PATCH v2 0/3] ext4: dio overwrite nolock
@ 2012-06-14  3:32 Zheng Liu
  2012-06-14  3:32 ` [RFC][PATCH v2 1/3] ext4: split ext4_file_write into buffered IO and direct IO Zheng Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Zheng Liu @ 2012-06-14  3:32 UTC (permalink / raw)
  To: linux-ext4; +Cc: Zheng Liu, Tao Ma, Eric Sandeen

Hello list,

Here is the second version of dio overwrite nolock.  In this version, I rework
the patches as Eric suggested in order to avoid copying almost all of
__generic_file_aio_write back into ext4.  I also fix some problems pointed out
in Tao's reply.

This patch set can improve the performance of ext4 when the user does a dio
overwrite because, when a dio overwrite occurs, we don't need to take the
i_mutex lock under some conditions.  The conditions are that the size of the
file does not change, there is no buffered I/O, and the AIO is aligned.  So dio
writes can be parallelized under these conditions.

In patch 1, ext4_file_dio_write is defined to split buffered I/O and direct I/O
in ext4_file_write so that code can be added later to check whether we can do a
dio overwrite without the i_mutex lock.

In patch 2, a new flag called EXT4_GET_BLOCKS_NO_LOCK and a new get_block
function named ext4_get_block_write_nolock are defined.  The function does a
lookup to tell us whether the extent of the file at this offset has been
initialized, because we need to know whether a dio overwrite needs to modify
the metadata of the file or not.

In patch 3, we implement dio overwrite nolock.  In ext4_file_dio_write, we check
whether we can do a dio overwrite without the lock.  Then we use 'iocb->private'
to store this flag and let ext4_ext_direct_IO handle it, because
file_update_time will start a new journal and it will cause a deadlock.  So we
need to finish updating the file time with the i_mutex lock held, and release
the lock in ext4_ext_direct_IO.

v2 <- v1:
 * rebase to 3.5
 * rework ext4_file_dio_write to avoid copying vfs code back into ext4
 * add some comments to explain how to determine whether we can do a nolock
   overwrite dio

The first version of the patch set is in this thread [1].
1. http://www.spinics.net/lists/linux-ext4/msg31859.html

Regards,
Zheng

Zheng Liu (3):
      ext4: split ext4_file_write into buffered IO and direct IO
      ext4: add a new flag for ext4_map_blocks
      ext4: add dio overwrite nolock

 fs/ext4/ext4.h  |    2 +
 fs/ext4/file.c  |  109 ++++++++++++++++++++++++++++++++++++++++++++-----------
 fs/ext4/inode.c |   86 ++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 165 insertions(+), 32 deletions(-)


* [RFC][PATCH v2 1/3] ext4: split ext4_file_write into buffered IO and direct IO
  2012-06-14  3:32 [RFC][PATCH v2 0/3] ext4: dio overwrite nolock Zheng Liu
@ 2012-06-14  3:32 ` Zheng Liu
  2012-06-14  3:32 ` [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks Zheng Liu
  2012-06-14  3:32 ` [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock Zheng Liu
  2 siblings, 0 replies; 7+ messages in thread
From: Zheng Liu @ 2012-06-14  3:32 UTC (permalink / raw)
  To: linux-ext4; +Cc: Tao Ma, Eric Sandeen, Zheng Liu

From: Zheng Liu <wenqing.lz@taobao.com>

ext4_file_dio_write is defined in order to split buffered IO and
direct IO in ext4.  This patch just refactors the write path.

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/file.c |   60 +++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8c7642a..a10dc77 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -90,34 +90,16 @@ ext4_unaligned_aio(struct inode *inode, const struct iovec *iov,
 }
 
 static ssize_t
-ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
-		unsigned long nr_segs, loff_t pos)
+ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
+		    unsigned long nr_segs, loff_t pos)
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int unaligned_aio = 0;
 	ssize_t ret;
 
-	/*
-	 * If we have encountered a bitmap-format file, the size limit
-	 * is smaller than s_maxbytes, which is for extent-mapped files.
-	 */
-
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
-		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-		size_t length = iov_length(iov, nr_segs);
-
-		if ((pos > sbi->s_bitmap_maxbytes ||
-		    (pos == sbi->s_bitmap_maxbytes && length > 0)))
-			return -EFBIG;
-
-		if (pos + length > sbi->s_bitmap_maxbytes) {
-			nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
-					      sbi->s_bitmap_maxbytes - pos);
-		}
-	} else if (unlikely((iocb->ki_filp->f_flags & O_DIRECT) &&
-		   !is_sync_kiocb(iocb))) {
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
+	    !is_sync_kiocb(iocb))
 		unaligned_aio = ext4_unaligned_aio(inode, iov, nr_segs, pos);
-	}
 
 	/* Unaligned direct AIO must be serialized; see comment above */
 	if (unaligned_aio) {
@@ -141,6 +123,40 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 	return ret;
 }
 
+static ssize_t
+ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
+		unsigned long nr_segs, loff_t pos)
+{
+	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+	ssize_t ret;
+
+	/*
+	 * If we have encountered a bitmap-format file, the size limit
+	 * is smaller than s_maxbytes, which is for extent-mapped files.
+	 */
+
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
+		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+		size_t length = iov_length(iov, nr_segs);
+
+		if ((pos > sbi->s_bitmap_maxbytes ||
+		    (pos == sbi->s_bitmap_maxbytes && length > 0)))
+			return -EFBIG;
+
+		if (pos + length > sbi->s_bitmap_maxbytes) {
+			nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
+					      sbi->s_bitmap_maxbytes - pos);
+		}
+	}
+
+	if (unlikely(iocb->ki_filp->f_flags & O_DIRECT))
+		ret = ext4_file_dio_write(iocb, iov, nr_segs, pos);
+	else
+		ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+
+	return ret;
+}
+
 static const struct vm_operations_struct ext4_file_vm_ops = {
 	.fault		= filemap_fault,
 	.page_mkwrite   = ext4_page_mkwrite,
-- 
1.7.4.1



* [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks
  2012-06-14  3:32 [RFC][PATCH v2 0/3] ext4: dio overwrite nolock Zheng Liu
  2012-06-14  3:32 ` [RFC][PATCH v2 1/3] ext4: split ext4_file_write into buffered IO and direct IO Zheng Liu
@ 2012-06-14  3:32 ` Zheng Liu
  2012-06-15  9:29   ` Robin Dong
  2012-06-14  3:32 ` [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock Zheng Liu
  2 siblings, 1 reply; 7+ messages in thread
From: Zheng Liu @ 2012-06-14  3:32 UTC (permalink / raw)
  To: linux-ext4; +Cc: Tao Ma, Eric Sandeen, Zheng Liu

From: Zheng Liu <wenqing.lz@taobao.com>

The EXT4_GET_BLOCKS_NO_LOCK flag is added to indicate that we don't need to
acquire the i_data_sem lock in ext4_map_blocks.  It also keeps _ext4_get_block
from starting a new journal because, when we do an overwrite dio, there is no
metadata that needs to be modified.

We define a new function called ext4_get_block_write_nolock, which is used in
dio overwrite nolock.  This function doesn't try to acquire the i_data_sem
lock and doesn't start a new journal, as it only does a lookup.
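
The flag-controlled locking described above can be sketched in user space as
follows.  This is only a toy model under stated assumptions: struct toy_inode,
toy_map_block and TOY_GET_BLOCKS_NO_LOCK are hypothetical names, the lock is a
plain reader counter rather than a real rw_semaphore, and the sketch assumes
that a caller passing the flag already holds i_data_sem (as patch 3 appears to
arrange in ext4_ext_direct_IO).

```c
#include <assert.h>

/* Same value the patch gives EXT4_GET_BLOCKS_NO_LOCK; the name is made up. */
#define TOY_GET_BLOCKS_NO_LOCK 0x0100

struct toy_inode {
	int data_sem_readers;       /* models readers holding i_data_sem */
	unsigned long sem_acquired; /* times the lookup itself took the lock */
	unsigned long first_pblk;   /* toy mapping: pblk = first_pblk + lblk */
};

/* Look up the physical block for lblk, locking only when asked to. */
static unsigned long toy_map_block(struct toy_inode *inode,
				   unsigned long lblk, int flags)
{
	int take_lock = !(flags & TOY_GET_BLOCKS_NO_LOCK);
	unsigned long pblk;

	if (take_lock) {
		inode->data_sem_readers++;      /* stands in for down_read() */
		inode->sem_acquired++;
	} else {
		/* Assumed contract: the caller already holds the lock. */
		assert(inode->data_sem_readers > 0);
	}

	pblk = inode->first_pblk + lblk;        /* the lookup itself */

	if (take_lock)
		inode->data_sem_readers--;      /* stands in for up_read() */
	return pblk;
}
```

The counter is not thread safe; it exists only to make the control flow
visible in a single-threaded run.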

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/ext4.h  |    2 +
 fs/ext4/inode.c |   59 +++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cfc4e01..d1a2b1e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -571,6 +571,8 @@ enum {
 #define EXT4_GET_BLOCKS_NO_NORMALIZE		0x0040
 	/* Request will not result in inode size update (user for fallocate) */
 #define EXT4_GET_BLOCKS_KEEP_SIZE		0x0080
+	/* Do not take i_data_sem locking in ext4_map_blocks */
+#define EXT4_GET_BLOCKS_NO_LOCK			0x0100
 
 /*
  * Flags used by ext4_free_blocks
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 02bc8cb..9a714ff 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -544,7 +544,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * Try to see if we can get the block without requesting a new
 	 * file system block.
 	 */
-	down_read((&EXT4_I(inode)->i_data_sem));
+	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
+		down_read((&EXT4_I(inode)->i_data_sem));
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
 		retval = ext4_ext_map_blocks(handle, inode, map, flags &
 					     EXT4_GET_BLOCKS_KEEP_SIZE);
@@ -552,7 +553,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		retval = ext4_ind_map_blocks(handle, inode, map, flags &
 					     EXT4_GET_BLOCKS_KEEP_SIZE);
 	}
-	up_read((&EXT4_I(inode)->i_data_sem));
+	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
+		up_read((&EXT4_I(inode)->i_data_sem));
 
 	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
 		int ret = check_block_validity(inode, map);
@@ -2818,6 +2820,32 @@ static int ext4_get_block_write(struct inode *inode, sector_t iblock,
 			       EXT4_GET_BLOCKS_IO_CREATE_EXT);
 }
 
+static int ext4_get_block_write_nolock(struct inode *inode, sector_t iblock,
+		   struct buffer_head *bh_result, int create)
+{
+	handle_t *handle = ext4_journal_current_handle();
+	struct ext4_map_blocks map;
+	int ret = 0;
+
+	ext4_debug("ext4_get_block_write_nolock: inode %lu, create flag %d\n",
+		   inode->i_ino, create);
+
+	create = EXT4_GET_BLOCKS_NO_LOCK;
+
+	map.m_lblk = iblock;
+	map.m_len = bh_result->b_size >> inode->i_blkbits;
+
+	ret = ext4_map_blocks(handle, inode, &map, create);
+	if (ret > 0) {
+		map_bh(bh_result, inode->i_sb, map.m_pblk);
+		bh_result->b_state = (bh_result->b_state & ~EXT4_MAP_FLAGS) |
+					map.m_flags;
+		bh_result->b_size = inode->i_sb->s_blocksize * map.m_len;
+		ret = 0;
+	}
+	return ret;
+}
+
 static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 			    ssize_t size, void *private, int ret,
 			    bool is_async)
@@ -2966,6 +2994,8 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 
 	loff_t final_size = offset + count;
 	if (rw == WRITE && final_size <= inode->i_size) {
+		int overwrite = 0;
+
 		/*
  		 * We could direct write to holes and fallocate.
 		 *
@@ -3005,13 +3035,22 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 			EXT4_I(inode)->cur_aio_dio = iocb->private;
 		}
 
-		ret = __blockdev_direct_IO(rw, iocb, inode,
-					 inode->i_sb->s_bdev, iov,
-					 offset, nr_segs,
-					 ext4_get_block_write,
-					 ext4_end_io_dio,
-					 NULL,
-					 DIO_LOCKING);
+		if (overwrite)
+			ret = __blockdev_direct_IO(rw, iocb, inode,
+						 inode->i_sb->s_bdev, iov,
+						 offset, nr_segs,
+						 ext4_get_block_write_nolock,
+						 ext4_end_io_dio,
+						 NULL,
+						 0);
+		else
+			ret = __blockdev_direct_IO(rw, iocb, inode,
+						 inode->i_sb->s_bdev, iov,
+						 offset, nr_segs,
+						 ext4_get_block_write,
+						 ext4_end_io_dio,
+						 NULL,
+						 DIO_LOCKING);
 		if (iocb->private)
 			EXT4_I(inode)->cur_aio_dio = NULL;
 		/*
@@ -3031,7 +3070,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 		if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) {
 			ext4_free_io_end(iocb->private);
 			iocb->private = NULL;
-		} else if (ret > 0 && ext4_test_inode_state(inode,
+		} else if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
 						EXT4_STATE_DIO_UNWRITTEN)) {
 			int err;
 			/*
-- 
1.7.4.1



* [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock
  2012-06-14  3:32 [RFC][PATCH v2 0/3] ext4: dio overwrite nolock Zheng Liu
  2012-06-14  3:32 ` [RFC][PATCH v2 1/3] ext4: split ext4_file_write into buffered IO and direct IO Zheng Liu
  2012-06-14  3:32 ` [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks Zheng Liu
@ 2012-06-14  3:32 ` Zheng Liu
  2012-06-15 10:16   ` Robin Dong
  2 siblings, 1 reply; 7+ messages in thread
From: Zheng Liu @ 2012-06-14  3:32 UTC (permalink / raw)
  To: linux-ext4; +Cc: Tao Ma, Eric Sandeen, Zheng Liu

From: Zheng Liu <wenqing.lz@taobao.com>

Aligned overwrite direct I/O can be parallelized.  In ext4_file_dio_write,
we first check whether these conditions are satisfied.  If so, we take
i_data_sem and release the i_mutex lock directly.  Meanwhile, iocb->private is
set to indicate that this is a dio overwrite, and it will be handled in
ext4_ext_direct_IO.
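
The lock hand-off described above can be modelled in user space as in the
sketch below.  It is a toy under stated assumptions, not the kernel
implementation: the lock state is kept in plain fields, and toy_inode_locks,
toy_io_snapshot and toy_direct_io are hypothetical names.  The ordering it
models is: enter with i_mutex held, take i_data_sem for reading, drop i_mutex,
do the I/O, then drop i_data_sem and retake i_mutex before returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock state; the kernel uses struct mutex and struct rw_semaphore. */
struct toy_inode_locks {
	bool i_mutex_held;      /* models inode->i_mutex */
	int i_data_sem_readers; /* models readers of i_data_sem */
};

/* Lock state observed while the simulated I/O is in flight. */
struct toy_io_snapshot {
	bool i_mutex_held;
	int i_data_sem_readers;
};

/*
 * The caller enters and leaves with i_mutex held.  cached_pages stands in
 * for file->f_mapping->nrpages.  Returns true if the nolock path was used.
 */
static bool toy_direct_io(struct toy_inode_locks *locks, bool overwrite,
			  unsigned long cached_pages,
			  struct toy_io_snapshot *during_io)
{
	assert(locks->i_mutex_held);

	if (overwrite) {
		/* Take i_data_sem first, then drop i_mutex. */
		locks->i_data_sem_readers++;
		locks->i_mutex_held = false;
	}

	/* Buffered pages are present: fall back to the i_mutex path. */
	if (overwrite && cached_pages != 0) {
		overwrite = false;
		locks->i_data_sem_readers--;
		locks->i_mutex_held = true;
	}

	/* The I/O itself would be submitted here. */
	during_io->i_mutex_held = locks->i_mutex_held;
	during_io->i_data_sem_readers = locks->i_data_sem_readers;

	if (overwrite) {
		/* Restore the state the caller expects. */
		locks->i_data_sem_readers--;
		locks->i_mutex_held = true;
	}
	return overwrite;
}
```

In this model the caller always gets i_mutex back, so the unlock in the caller
stays balanced whichever path was taken.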

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/file.c  |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext4/inode.c |   27 +++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index a10dc77..812358f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -93,9 +93,13 @@ static ssize_t
 ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
 		    unsigned long nr_segs, loff_t pos)
 {
-	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct blk_plug plug;
 	int unaligned_aio = 0;
 	ssize_t ret;
+	int overwrite = 0;
+	size_t length = iov_length(iov, nr_segs);
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
 	    !is_sync_kiocb(iocb))
@@ -115,7 +119,52 @@ ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
 		ext4_aiodio_wait(inode);
 	}
 
-	ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+	BUG_ON(iocb->ki_pos != pos);
+
+	mutex_lock(&inode->i_mutex);
+	blk_start_plug(&plug);
+
+	iocb->private = &overwrite;
+
+	/* check whether we do a DIO overwrite or not */
+	if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
+	    !file->f_mapping->nrpages && pos + length <= i_size_read(inode)) {
+		struct ext4_map_blocks map;
+		unsigned int blkbits = inode->i_blkbits;
+		int err, len;
+
+		map.m_lblk = pos >> blkbits;
+		map.m_len = (EXT4_BLOCK_ALIGN(pos + length, blkbits) >> blkbits)
+			- map.m_lblk;
+		map.m_flags &= ~EXT4_MAP_FLAGS;
+		len = map.m_len;
+
+		err = ext4_map_blocks(NULL, inode, &map, 0);
+		/*
+		 * 'err==len' means that all of blocks has been preallocated no
+		 * matter they are initialized or not.  For excluding
+		 * uninitialized extents, we need to check m_flags.  There are
+		 * two conditions that indicate for initialized extents.
+		 * 1) If we hit extent cache, EXT4_MAP_MAPPED flag is returned;
+		 * 2) If we do a real lookup, non-flags are returned.
+		 * So we should check these two conditions.
+		 */
+		if (err == len && (!map.m_flags ||
+				   map.m_flags & EXT4_MAP_MAPPED))
+			overwrite = 1;
+	}
+
+	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
+	mutex_unlock(&inode->i_mutex);
+
+	if (ret > 0 || ret == -EIOCBQUEUED) {
+		ssize_t err;
+
+		err = generic_write_sync(file, pos, ret);
+		if (err < 0 && ret > 0)
+			ret = err;
+	}
+	blk_finish_plug(&plug);
 
 	if (unaligned_aio)
 		mutex_unlock(ext4_aio_mutex(inode));
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9a714ff..98e9096 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2996,6 +2996,26 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 	if (rw == WRITE && final_size <= inode->i_size) {
 		int overwrite = 0;
 
+		BUG_ON(iocb->private == NULL);
+
+		/* If we do a overwrite dio, i_mutex locking can be released */
+		overwrite = *((int *)iocb->private);
+
+		if (overwrite) {
+			down_read(&EXT4_I(inode)->i_data_sem);
+			mutex_unlock(&inode->i_mutex);
+		}
+
+		/*
+		 * If there are still some buffered I/O, we should fall back
+		 * to take i_mutex locking.
+		 */
+		if (overwrite && file->f_mapping->nrpages) {
+			overwrite = 0;
+			up_read(&EXT4_I(inode)->i_data_sem);
+			mutex_lock(&inode->i_mutex);
+		}
+
 		/*
  		 * We could direct write to holes and fallocate.
 		 *
@@ -3083,6 +3103,13 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 				ret = err;
 			ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
 		}
+
+		/* take i_mutex locking again if we do a ovewrite dio */
+		if (overwrite) {
+			up_read(&EXT4_I(inode)->i_data_sem);
+			mutex_lock(&inode->i_mutex);
+		}
+
 		return ret;
 	}
 
-- 
1.7.4.1



* Re: [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks
  2012-06-14  3:32 ` [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks Zheng Liu
@ 2012-06-15  9:29   ` Robin Dong
  0 siblings, 0 replies; 7+ messages in thread
From: Robin Dong @ 2012-06-15  9:29 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, Tao Ma, Eric Sandeen, Zheng Liu

2012/6/14 Zheng Liu <gnehzuil.liu@gmail.com>:
> From: Zheng Liu <wenqing.lz@taobao.com>
>
> EXT4_GET_BLOCKS_NO_LOCK flag is added to indicate that we don't need to acquire
> i_data_sem lock in ext4_map_blocks.  Meanwhile, it lets _ext4_get_block do not
> start a new journal because when we do a overwrite dio, there is no any
> metadata that needs to be modified.
>
> We define a new function called ext4_get_block_write_nolock, which is used in
> dio overwrite nolock.  In this function, it doesn't try to acquire i_data_sem
> lock and doesn't start a new journal as it does a lookup.
>
> CC: Tao Ma <tm@tao.ma>
> CC: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/ext4.h  |    2 +
>  fs/ext4/inode.c |   59 +++++++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 51 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index cfc4e01..d1a2b1e 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -571,6 +571,8 @@ enum {
>  #define EXT4_GET_BLOCKS_NO_NORMALIZE           0x0040
>        /* Request will not result in inode size update (user for fallocate) */
>  #define EXT4_GET_BLOCKS_KEEP_SIZE              0x0080
> +       /* Do not take i_data_sem locking in ext4_map_blocks */
> +#define EXT4_GET_BLOCKS_NO_LOCK                        0x0100
>
>  /*
>  * Flags used by ext4_free_blocks
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 02bc8cb..9a714ff 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -544,7 +544,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>         * Try to see if we can get the block without requesting a new
>         * file system block.
>         */
> -       down_read((&EXT4_I(inode)->i_data_sem));
> +       if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> +               down_read((&EXT4_I(inode)->i_data_sem));
>        if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
>                retval = ext4_ext_map_blocks(handle, inode, map, flags &
>                                             EXT4_GET_BLOCKS_KEEP_SIZE);
> @@ -552,7 +553,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>                retval = ext4_ind_map_blocks(handle, inode, map, flags &
>                                             EXT4_GET_BLOCKS_KEEP_SIZE);
>        }
> -       up_read((&EXT4_I(inode)->i_data_sem));
> +       if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> +               up_read((&EXT4_I(inode)->i_data_sem));
>
>        if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
>                int ret = check_block_validity(inode, map);
> @@ -2818,6 +2820,32 @@ static int ext4_get_block_write(struct inode *inode, sector_t iblock,
>                               EXT4_GET_BLOCKS_IO_CREATE_EXT);
>  }
>
> +static int ext4_get_block_write_nolock(struct inode *inode, sector_t iblock,
> +                  struct buffer_head *bh_result, int create)
> +{
> +       handle_t *handle = ext4_journal_current_handle();
> +       struct ext4_map_blocks map;
> +       int ret = 0;
> +
> +       ext4_debug("ext4_get_block_write_nolock: inode %lu, create flag %d\n",
> +                  inode->i_ino, create);
> +
> +       create = EXT4_GET_BLOCKS_NO_LOCK;

It may be better to rename the variable "create" to "flags".

> +
> +       map.m_lblk = iblock;
> +       map.m_len = bh_result->b_size >> inode->i_blkbits;
> +
> +       ret = ext4_map_blocks(handle, inode, &map, create);
> +       if (ret > 0) {
> +               map_bh(bh_result, inode->i_sb, map.m_pblk);
> +               bh_result->b_state = (bh_result->b_state & ~EXT4_MAP_FLAGS) |
> +                                       map.m_flags;
> +               bh_result->b_size = inode->i_sb->s_blocksize * map.m_len;
> +               ret = 0;
> +       }
> +       return ret;
> +}
> +
>  static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
>                            ssize_t size, void *private, int ret,
>                            bool is_async)
> @@ -2966,6 +2994,8 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
>
>        loff_t final_size = offset + count;
>        if (rw == WRITE && final_size <= inode->i_size) {
> +               int overwrite = 0;
> +
>                /*
>                 * We could direct write to holes and fallocate.
>                 *
> @@ -3005,13 +3035,22 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
>                        EXT4_I(inode)->cur_aio_dio = iocb->private;
>                }
>
> -               ret = __blockdev_direct_IO(rw, iocb, inode,
> -                                        inode->i_sb->s_bdev, iov,
> -                                        offset, nr_segs,
> -                                        ext4_get_block_write,
> -                                        ext4_end_io_dio,
> -                                        NULL,
> -                                        DIO_LOCKING);
> +               if (overwrite)
> +                       ret = __blockdev_direct_IO(rw, iocb, inode,
> +                                                inode->i_sb->s_bdev, iov,
> +                                                offset, nr_segs,
> +                                                ext4_get_block_write_nolock,
> +                                                ext4_end_io_dio,
> +                                                NULL,
> +                                                0);
> +               else
> +                       ret = __blockdev_direct_IO(rw, iocb, inode,
> +                                                inode->i_sb->s_bdev, iov,
> +                                                offset, nr_segs,
> +                                                ext4_get_block_write,
> +                                                ext4_end_io_dio,
> +                                                NULL,
> +                                                DIO_LOCKING);
>                if (iocb->private)
>                        EXT4_I(inode)->cur_aio_dio = NULL;
>                /*
> @@ -3031,7 +3070,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
>                if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) {
>                        ext4_free_io_end(iocb->private);
>                        iocb->private = NULL;
> -               } else if (ret > 0 && ext4_test_inode_state(inode,
> +               } else if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
>                                                EXT4_STATE_DIO_UNWRITTEN)) {
>                        int err;
>                        /*
> --
> 1.7.4.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
--
Best Regards
Robin Dong


* Re: [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock
  2012-06-14  3:32 ` [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock Zheng Liu
@ 2012-06-15 10:16   ` Robin Dong
  2012-06-15 11:02     ` Zheng Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Dong @ 2012-06-15 10:16 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, Tao Ma, Eric Sandeen, Zheng Liu

2012/6/14 Zheng Liu <gnehzuil.liu@gmail.com>:
> From: Zheng Liu <wenqing.lz@taobao.com>
>
> Aligned and overwrite direct I/O can be parallelized.  In ext4_file_dio_write,
> we first check whether these conditions are satisfied or not.  If so, we
> take i_data_sem and release i_mutex lock directly.  Meanwhile iocb->private is
> set to indicate that this is a dio overwrite, and it will be handled in
> ext4_ext_direct_IO.
>
> CC: Tao Ma <tm@tao.ma>
> CC: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/file.c  |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/ext4/inode.c |   27 +++++++++++++++++++++++++++
>  2 files changed, 78 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index a10dc77..812358f 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -93,9 +93,13 @@ static ssize_t
>  ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
>                    unsigned long nr_segs, loff_t pos)
>  {
> -       struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
> +       struct file *file = iocb->ki_filp;
> +       struct inode *inode = file->f_mapping->host;
> +       struct blk_plug plug;
>        int unaligned_aio = 0;
>        ssize_t ret;
> +       int overwrite = 0;
> +       size_t length = iov_length(iov, nr_segs);
>
>        if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
>            !is_sync_kiocb(iocb))
> @@ -115,7 +119,52 @@ ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
>                ext4_aiodio_wait(inode);
>        }
>
> -       ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
> +       BUG_ON(iocb->ki_pos != pos);
> +
> +       mutex_lock(&inode->i_mutex);
> +       blk_start_plug(&plug);
> +
> +       iocb->private = &overwrite;
> +
> +       /* check whether we do a DIO overwrite or not */
> +       if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
> +           !file->f_mapping->nrpages && pos + length <= i_size_read(inode)) {
> +               struct ext4_map_blocks map;
> +               unsigned int blkbits = inode->i_blkbits;
> +               int err, len;
> +
> +               map.m_lblk = pos >> blkbits;
> +               map.m_len = (EXT4_BLOCK_ALIGN(pos + length, blkbits) >> blkbits)
> +                       - map.m_lblk;
> +               map.m_flags &= ~EXT4_MAP_FLAGS;
> +               len = map.m_len;
> +
> +               err = ext4_map_blocks(NULL, inode, &map, 0);

Nitpick:
It may be better to rename the variable "err" to "ret".

> +               /*
> +                * 'err==len' means that all of blocks has been preallocated no
> +                * matter they are initialized or not.  For excluding
> +                * uninitialized extents, we need to check m_flags.  There are
> +                * two conditions that indicate for initialized extents.
> +                * 1) If we hit extent cache, EXT4_MAP_MAPPED flag is returned;
> +                * 2) If we do a real lookup, non-flags are returned.
> +                * So we should check these two conditions.
> +                */
> +               if (err == len && (!map.m_flags ||
> +                                  map.m_flags & EXT4_MAP_MAPPED))

If we do a real lookup in ext4_map_blocks, it also returns with the
EXT4_MAP_MAPPED flag set, so the condition should be:

                     if (err == len && (map.m_flags & EXT4_MAP_MAPPED))

> +                       overwrite = 1;
> +       }
> +
> +       ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
> +       mutex_unlock(&inode->i_mutex);
> +
> +       if (ret > 0 || ret == -EIOCBQUEUED) {
> +               ssize_t err;
> +
> +               err = generic_write_sync(file, pos, ret);
> +               if (err < 0 && ret > 0)
> +                       ret = err;
> +       }
> +       blk_finish_plug(&plug);
>
>        if (unaligned_aio)
>                mutex_unlock(ext4_aio_mutex(inode));
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9a714ff..98e9096 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2996,6 +2996,26 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
>        if (rw == WRITE && final_size <= inode->i_size) {
>                int overwrite = 0;
>
> +               BUG_ON(iocb->private == NULL);
> +
> +               /* If we do a overwrite dio, i_mutex locking can be released */
> +               overwrite = *((int *)iocb->private);
> +
> +               if (overwrite) {
> +                       down_read(&EXT4_I(inode)->i_data_sem);
> +                       mutex_unlock(&inode->i_mutex);
> +               }
> +
> +               /*
> +                * If there are still some buffered I/O, we should fall back
> +                * to take i_mutex locking.
> +                */
> +               if (overwrite && file->f_mapping->nrpages) {
> +                       overwrite = 0;
> +                       up_read(&EXT4_I(inode)->i_data_sem);
> +                       mutex_lock(&inode->i_mutex);
> +               }
> +
>                /*
>                 * We could direct write to holes and fallocate.
>                 *
> @@ -3083,6 +3103,13 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
>                                ret = err;
>                        ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
>                }
> +
> +               /* take i_mutex locking again if we do a ovewrite dio */
> +               if (overwrite) {
> +                       up_read(&EXT4_I(inode)->i_data_sem);
> +                       mutex_lock(&inode->i_mutex);
> +               }
> +
>                return ret;
>        }
>
> --
> 1.7.4.1
>



-- 
--
Best Regards
Robin Dong


* Re: [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock
  2012-06-15 10:16   ` Robin Dong
@ 2012-06-15 11:02     ` Zheng Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Zheng Liu @ 2012-06-15 11:02 UTC (permalink / raw)
  To: Robin Dong; +Cc: linux-ext4, Tao Ma, Eric Sandeen, Zheng Liu

On Fri, Jun 15, 2012 at 06:16:29PM +0800, Robin Dong wrote:
> > +               /*
> > +                * 'err==len' means that all of blocks has been preallocated no
> > +                * matter they are initialized or not.  For excluding
> > +                * uninitialized extents, we need to check m_flags.  There are
> > +                * two conditions that indicate for initialized extents.
> > +                * 1) If we hit extent cache, EXT4_MAP_MAPPED flag is returned;
> > +                * 2) If we do a real lookup, non-flags are returned.
> > +                * So we should check these two conditions.
> > +                */
> > +               if (err == len && (!map.m_flags ||
> > +                                  map.m_flags & EXT4_MAP_MAPPED))
> 
> If we do a real lookup in ext4_map_blocks, it also return with
> EXT4_MAP_MAPPED flag, the condition should be:
> 
>                      if (err == len && (map.m_flags & EXT4_MAP_MAPPED))

Yes, you are right.  I will fix it in the next version.

Regards,
Zheng
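
For reference, the block-range arithmetic and the corrected check discussed
above can be sketched in user space as follows.  This is a hedged
illustration: block_align, blocks_spanned, is_initialized_overwrite and
TOY_MAP_MAPPED are hypothetical names, and the flag value is a stand-in rather
than the kernel's EXT4_MAP_MAPPED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAP_MAPPED 0x1u /* hypothetical stand-in for EXT4_MAP_MAPPED */

/* Round value up to a multiple of the block size (1 << blkbits). */
static uint64_t block_align(uint64_t value, unsigned int blkbits)
{
	uint64_t mask = ((uint64_t)1 << blkbits) - 1;

	return (value + mask) & ~mask;
}

/* Number of blocks touched by a write of length bytes at offset pos. */
static uint32_t blocks_spanned(uint64_t pos, uint64_t length,
			       unsigned int blkbits)
{
	uint64_t first, end;

	assert(blkbits < 32);
	first = pos >> blkbits;
	end = block_align(pos + length, blkbits) >> blkbits;
	return (uint32_t)(end - first);
}

/*
 * Corrected check: every requested block was found, and the lookup
 * reported the range as mapped.  An empty flag word no longer qualifies.
 */
static bool is_initialized_overwrite(int mapped, uint32_t wanted,
				     unsigned int m_flags)
{
	return mapped > 0 && (uint32_t)mapped == wanted &&
	       (m_flags & TOY_MAP_MAPPED) != 0;
}
```

With 4096-byte blocks (blkbits = 12), a write of 8192 bytes at offset 4096
spans two blocks, and so does a 5000-byte write at offset 1000, because the
end offset is rounded up to the next block boundary.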

