* [RFC][PATCH v2 0/3] ext4: dio overwrite nolock
From: Zheng Liu @ 2012-06-14 3:32 UTC
To: linux-ext4; +Cc: Zheng Liu, Tao Ma, Eric Sandeen

Hello list,

Here is the second version of dio overwrite nolock. In this version, I have reworked the code as Eric suggested, to avoid copying almost all of __generic_file_aio_write back into ext4. I have also fixed some problems pointed out in Tao's reply.

This patch set can improve ext4 performance for dio overwrites, because under certain conditions a dio overwrite does not need to take the i_mutex lock: the file size does not change, there is no buffered I/O, and the AIO is aligned. Under these conditions, dio writes can be parallelized.

In patch 1, ext4_file_dio_write is defined to split buffered I/O and direct I/O in ext4_file_write, so that code can later be added to check whether a dio overwrite can be done without the i_mutex lock.

In patch 2, a new flag called EXT4_GET_BLOCKS_NO_LOCK and a new get_block function named ext4_get_block_write_nolock are defined. They do a lookup that tells us whether the extent at this file offset has been initialized, because we need to know whether a dio overwrite has to modify the file's metadata.

In patch 3, dio overwrite nolock is implemented. ext4_file_dio_write checks whether a dio overwrite can be done without the lock, then stores this flag in 'iocb->private' for ext4_ext_direct_IO to handle. This is necessary because file_update_time starts a new journal handle, which would cause a deadlock. So the file time is updated while i_mutex is held, and the lock is released in ext4_ext_direct_IO.
v2 <- v1:
 * rebase to 3.5
 * rework ext4_file_dio_write to avoid copying VFS code back into ext4
 * add some comments to explain how to determine whether we can do a
   nolock overwrite dio

The first version of this patch set is in this thread [1].

1. http://www.spinics.net/lists/linux-ext4/msg31859.html

Regards,
Zheng

Zheng Liu (3):
  ext4: split ext4_file_write into buffered IO and direct IO
  ext4: add a new flag for ext4_map_blocks
  ext4: add dio overwrite nolock

 fs/ext4/ext4.h  |    2 +
 fs/ext4/file.c  |  109 ++++++++++++++++++++++++++++++++++++++++++++-----------
 fs/ext4/inode.c |   86 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 165 insertions(+), 32 deletions(-)
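The eligibility conditions above, together with the extra checks that patch 3 performs in ext4_file_dio_write (dioread_nolock in effect, and every block in the range already mapped to an initialized extent), can be read as one predicate. The following is a minimal user-space sketch in C, assuming a hypothetical struct dio_write_req that flattens those inputs into plain fields; it is not kernel code and models only the decision, not the locking.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of one direct write request. These field
 * names are illustrative only; they are not ext4 or VFS structures.
 */
struct dio_write_req {
	long long pos;           /* file offset of the write */
	size_t length;           /* total bytes in the iovec */
	long long i_size;        /* current file size */
	unsigned long nrpages;   /* page-cache pages of the file */
	bool unaligned_aio;      /* unaligned asynchronous direct I/O */
	bool dioread_nolock;     /* dioread_nolock is in effect */
	bool blocks_initialized; /* whole range maps to initialized extents */
};

/* Sketch: may this write run without holding i_mutex? */
static bool can_overwrite_nolock(const struct dio_write_req *r)
{
	if (!r->dioread_nolock || r->unaligned_aio)
		return false;
	if (r->nrpages != 0)            /* buffered I/O is present */
		return false;
	if (r->pos + (long long)r->length > r->i_size)
		return false;           /* the write would change i_size */
	return r->blocks_initialized;   /* no metadata update needed */
}
```

Any single failed condition sends the write down the ordinary path that keeps i_mutex held.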
* [RFC][PATCH v2 1/3] ext4: split ext4_file_write into buffered IO and direct IO
From: Zheng Liu @ 2012-06-14 3:32 UTC
To: linux-ext4; +Cc: Tao Ma, Eric Sandeen, Zheng Liu

From: Zheng Liu <wenqing.lz@taobao.com>

ext4_file_dio_write is defined in order to split buffered IO and direct IO in
ext4. This patch only refactors some code in the write path.

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/file.c |   60 +++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8c7642a..a10dc77 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -90,34 +90,16 @@ ext4_unaligned_aio(struct inode *inode, const struct iovec *iov,
 }
 
 static ssize_t
-ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
-		unsigned long nr_segs, loff_t pos)
+ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
+		    unsigned long nr_segs, loff_t pos)
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int unaligned_aio = 0;
 	ssize_t ret;
 
-	/*
-	 * If we have encountered a bitmap-format file, the size limit
-	 * is smaller than s_maxbytes, which is for extent-mapped files.
-	 */
-
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
-		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-		size_t length = iov_length(iov, nr_segs);
-
-		if ((pos > sbi->s_bitmap_maxbytes ||
-		    (pos == sbi->s_bitmap_maxbytes && length > 0)))
-			return -EFBIG;
-
-		if (pos + length > sbi->s_bitmap_maxbytes) {
-			nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
-					      sbi->s_bitmap_maxbytes - pos);
-		}
-	} else if (unlikely((iocb->ki_filp->f_flags & O_DIRECT) &&
-		   !is_sync_kiocb(iocb))) {
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
+	    !is_sync_kiocb(iocb))
 		unaligned_aio = ext4_unaligned_aio(inode, iov, nr_segs, pos);
-	}
 
 	/* Unaligned direct AIO must be serialized; see comment above */
 	if (unaligned_aio) {
@@ -141,6 +123,40 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 	return ret;
 }
 
+static ssize_t
+ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
+		unsigned long nr_segs, loff_t pos)
+{
+	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+	ssize_t ret;
+
+	/*
+	 * If we have encountered a bitmap-format file, the size limit
+	 * is smaller than s_maxbytes, which is for extent-mapped files.
+	 */
+
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
+		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+		size_t length = iov_length(iov, nr_segs);
+
+		if ((pos > sbi->s_bitmap_maxbytes ||
+		    (pos == sbi->s_bitmap_maxbytes && length > 0)))
+			return -EFBIG;
+
+		if (pos + length > sbi->s_bitmap_maxbytes) {
+			nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
+					      sbi->s_bitmap_maxbytes - pos);
+		}
+	}
+
+	if (unlikely(iocb->ki_filp->f_flags & O_DIRECT))
+		ret = ext4_file_dio_write(iocb, iov, nr_segs, pos);
+	else
+		ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+
+	return ret;
+}
+
 static const struct vm_operations_struct ext4_file_vm_ops = {
 	.fault = filemap_fault,
 	.page_mkwrite = ext4_page_mkwrite,
-- 
1.7.4.1
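After this split, the s_bitmap_maxbytes limit for non-extent (bitmap-format) files is still enforced in ext4_file_write, before the buffered/direct dispatch, so it applies to both paths. A small sketch of that arithmetic, using a hypothetical helper bitmap_write_limit() that returns how many bytes may be written at pos (the length iov_shorten() would trim the iovec to), or -EFBIG:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper modelling the s_bitmap_maxbytes check kept in
 * ext4_file_write(). Returns the number of bytes that may be written
 * at pos, or -EFBIG if pos is already at or beyond the limit.
 */
static long long bitmap_write_limit(long long pos, size_t length,
				    long long maxbytes)
{
	if (pos > maxbytes || (pos == maxbytes && length > 0))
		return -EFBIG;
	if (pos + (long long)length > maxbytes)
		return maxbytes - pos;   /* the iovec is shortened to this */
	return (long long)length;
}
```

A zero-length write exactly at the limit is allowed; any non-empty write starting there fails with -EFBIG.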
* [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks
From: Zheng Liu @ 2012-06-14 3:32 UTC
To: linux-ext4; +Cc: Tao Ma, Eric Sandeen, Zheng Liu

From: Zheng Liu <wenqing.lz@taobao.com>

The EXT4_GET_BLOCKS_NO_LOCK flag is added to indicate that ext4_map_blocks
does not need to acquire the i_data_sem lock. It also lets _ext4_get_block
avoid starting a new journal handle, because an overwrite dio does not
modify any metadata.

A new function called ext4_get_block_write_nolock is defined for use by dio
overwrite nolock. Since it only does a lookup, it neither acquires i_data_sem
nor starts a new journal handle.
CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/ext4.h  |    2 +
 fs/ext4/inode.c |   59 +++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cfc4e01..d1a2b1e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -571,6 +571,8 @@ enum {
 #define EXT4_GET_BLOCKS_NO_NORMALIZE 0x0040
 	/* Request will not result in inode size update (user for fallocate) */
 #define EXT4_GET_BLOCKS_KEEP_SIZE 0x0080
+	/* Do not take i_data_sem locking in ext4_map_blocks */
+#define EXT4_GET_BLOCKS_NO_LOCK 0x0100
 
 /*
  * Flags used by ext4_free_blocks
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 02bc8cb..9a714ff 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -544,7 +544,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * Try to see if we can get the block without requesting a new
 	 * file system block.
 	 */
-	down_read((&EXT4_I(inode)->i_data_sem));
+	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
+		down_read((&EXT4_I(inode)->i_data_sem));
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
 		retval = ext4_ext_map_blocks(handle, inode, map, flags &
 					     EXT4_GET_BLOCKS_KEEP_SIZE);
@@ -552,7 +553,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		retval = ext4_ind_map_blocks(handle, inode, map, flags &
 					     EXT4_GET_BLOCKS_KEEP_SIZE);
 	}
-	up_read((&EXT4_I(inode)->i_data_sem));
+	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
+		up_read((&EXT4_I(inode)->i_data_sem));
 
 	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
 		int ret = check_block_validity(inode, map);
@@ -2818,6 +2820,32 @@ static int ext4_get_block_write(struct inode *inode, sector_t iblock,
 			       EXT4_GET_BLOCKS_IO_CREATE_EXT);
 }
 
+static int ext4_get_block_write_nolock(struct inode *inode, sector_t iblock,
+		   struct buffer_head *bh_result, int create)
+{
+	handle_t *handle = ext4_journal_current_handle();
+	struct ext4_map_blocks map;
+	int ret = 0;
+
+	ext4_debug("ext4_get_block_write_nolock: inode %lu, create flag %d\n",
+		   inode->i_ino, create);
+
+	create = EXT4_GET_BLOCKS_NO_LOCK;
+
+	map.m_lblk = iblock;
+	map.m_len = bh_result->b_size >> inode->i_blkbits;
+
+	ret = ext4_map_blocks(handle, inode, &map, create);
+	if (ret > 0) {
+		map_bh(bh_result, inode->i_sb, map.m_pblk);
+		bh_result->b_state = (bh_result->b_state & ~EXT4_MAP_FLAGS) |
+					map.m_flags;
+		bh_result->b_size = inode->i_sb->s_blocksize * map.m_len;
+		ret = 0;
+	}
+	return ret;
+}
+
 static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 			    ssize_t size, void *private, int ret,
 			    bool is_async)
@@ -2966,6 +2994,8 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 
 	loff_t final_size = offset + count;
 	if (rw == WRITE && final_size <= inode->i_size) {
+		int overwrite = 0;
+
 		/*
 		 * We could direct write to holes and fallocate.
 		 *
@@ -3005,13 +3035,22 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 			EXT4_I(inode)->cur_aio_dio = iocb->private;
 		}
 
-		ret = __blockdev_direct_IO(rw, iocb, inode,
-					 inode->i_sb->s_bdev, iov,
-					 offset, nr_segs,
-					 ext4_get_block_write,
-					 ext4_end_io_dio,
-					 NULL,
-					 DIO_LOCKING);
+		if (overwrite)
+			ret = __blockdev_direct_IO(rw, iocb, inode,
+						 inode->i_sb->s_bdev, iov,
+						 offset, nr_segs,
+						 ext4_get_block_write_nolock,
+						 ext4_end_io_dio,
+						 NULL,
+						 0);
+		else
+			ret = __blockdev_direct_IO(rw, iocb, inode,
+						 inode->i_sb->s_bdev, iov,
+						 offset, nr_segs,
+						 ext4_get_block_write,
+						 ext4_end_io_dio,
+						 NULL,
+						 DIO_LOCKING);
 		if (iocb->private)
 			EXT4_I(inode)->cur_aio_dio = NULL;
 		/*
@@ -3031,7 +3070,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 		if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) {
 			ext4_free_io_end(iocb->private);
 			iocb->private = NULL;
-		} else if (ret > 0 && ext4_test_inode_state(inode,
+		} else if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
 						EXT4_STATE_DIO_UNWRITTEN)) {
 			int err;
 			/*
-- 
1.7.4.1
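In ext4_map_blocks, the down_read() and up_read() on i_data_sem are both guarded by the same EXT4_GET_BLOCKS_NO_LOCK test, so they always pair up. The flag is intended for a caller that already holds i_data_sem, as ext4_ext_direct_IO does for an overwrite in patch 3. A user-space sketch of that pattern, assuming a hypothetical struct fake_rwsem that only counts readers in place of the kernel rw_semaphore:

```c
/* Same value as the flag added to fs/ext4/ext4.h in this patch. */
#define EXT4_GET_BLOCKS_NO_LOCK 0x0100

/* Hypothetical stand-in for i_data_sem: only counts readers. */
struct fake_rwsem {
	int readers;
};

/*
 * Sketch of the lookup path in ext4_map_blocks(): returns how many
 * readers hold the semaphore while the (omitted) lookup runs.
 */
static int lookup_with_flags(struct fake_rwsem *sem, int flags)
{
	int held;

	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
		sem->readers++;          /* down_read(&i_data_sem) */
	held = sem->readers;             /* extent/indirect lookup runs here */
	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
		sem->readers--;          /* up_read(&i_data_sem) */
	return held;
}
```

With the flag set, a caller that already holds the semaphore sees no second acquisition; without it, the function takes and drops its own read reference.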
* Re: [RFC][PATCH v2 2/3] ext4: add a new flag for ext4_map_blocks
From: Robin Dong @ 2012-06-15 9:29 UTC
To: Zheng Liu; +Cc: linux-ext4, Tao Ma, Eric Sandeen, Zheng Liu

2012/6/14 Zheng Liu <gnehzuil.liu@gmail.com>:
[...]
> +static int ext4_get_block_write_nolock(struct inode *inode, sector_t iblock,
> +		   struct buffer_head *bh_result, int create)
> +{
> +	handle_t *handle = ext4_journal_current_handle();
> +	struct ext4_map_blocks map;
> +	int ret = 0;
> +
> +	ext4_debug("ext4_get_block_write_nolock: inode %lu, create flag %d\n",
> +		   inode->i_ino, create);
> +
> +	create = EXT4_GET_BLOCKS_NO_LOCK;

It may be better to rename the variable "create" to "flags".

> +
> +	map.m_lblk = iblock;
> +	map.m_len = bh_result->b_size >> inode->i_blkbits;
> +
> +	ret = ext4_map_blocks(handle, inode, &map, create);
[...]

-- 
--
Best Regard
Robin Dong
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
* [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock
From: Zheng Liu @ 2012-06-14 3:32 UTC
To: linux-ext4; +Cc: Tao Ma, Eric Sandeen, Zheng Liu

From: Zheng Liu <wenqing.lz@taobao.com>

Aligned overwrite direct I/O can be parallelized. In ext4_file_dio_write, we
first check whether these conditions are satisfied. If so, iocb->private is
set to indicate that this is a dio overwrite, and ext4_ext_direct_IO handles
it by taking i_data_sem and releasing the i_mutex lock directly.

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/file.c  |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ext4/inode.c |   27 +++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index a10dc77..812358f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -93,9 +93,13 @@ static ssize_t
 ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
 		    unsigned long nr_segs, loff_t pos)
 {
-	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct blk_plug plug;
 	int unaligned_aio = 0;
 	ssize_t ret;
+	int overwrite = 0;
+	size_t length = iov_length(iov, nr_segs);
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
 	    !is_sync_kiocb(iocb))
@@ -115,7 +119,52 @@ ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
 		ext4_aiodio_wait(inode);
 	}
 
-	ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+	BUG_ON(iocb->ki_pos != pos);
+
+	mutex_lock(&inode->i_mutex);
+	blk_start_plug(&plug);
+
+	iocb->private = &overwrite;
+
+	/* check whether we do a DIO overwrite or not */
+	if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
+	    !file->f_mapping->nrpages && pos + length <= i_size_read(inode)) {
+		struct ext4_map_blocks map;
+		unsigned int blkbits = inode->i_blkbits;
+		int err, len;
+
+		map.m_lblk = pos >> blkbits;
+		map.m_len = (EXT4_BLOCK_ALIGN(pos + length, blkbits) >> blkbits)
+			- map.m_lblk;
+		map.m_flags &= ~EXT4_MAP_FLAGS;
+		len = map.m_len;
+
+		err = ext4_map_blocks(NULL, inode, &map, 0);
+		/*
+		 * 'err==len' means that all of blocks has been preallocated no
+		 * matter they are initialized or not. For excluding
+		 * uninitialized extents, we need to check m_flags. There are
+		 * two conditions that indicate for initialized extents.
+		 * 1) If we hit extent cache, EXT4_MAP_MAPPED flag is returned;
+		 * 2) If we do a real lookup, non-flags are returned.
+		 * So we should check these two conditions.
+		 */
+		if (err == len && (!map.m_flags ||
+				   map.m_flags & EXT4_MAP_MAPPED))
+			overwrite = 1;
+	}
+
+	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
+	mutex_unlock(&inode->i_mutex);
+
+	if (ret > 0 || ret == -EIOCBQUEUED) {
+		ssize_t err;
+
+		err = generic_write_sync(file, pos, ret);
+		if (err < 0 && ret > 0)
+			ret = err;
+	}
+	blk_finish_plug(&plug);
 
 	if (unaligned_aio)
 		mutex_unlock(ext4_aio_mutex(inode));
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9a714ff..98e9096 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2996,6 +2996,26 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 	if (rw == WRITE && final_size <= inode->i_size) {
 		int overwrite = 0;
 
+		BUG_ON(iocb->private == NULL);
+
+		/* If we do a overwrite dio, i_mutex locking can be released */
+		overwrite = *((int *)iocb->private);
+
+		if (overwrite) {
+			down_read(&EXT4_I(inode)->i_data_sem);
+			mutex_unlock(&inode->i_mutex);
+		}
+
+		/*
+		 * If there are still some buffered I/O, we should fall back
+		 * to take i_mutex locking.
+		 */
+		if (overwrite && file->f_mapping->nrpages) {
+			overwrite = 0;
+			up_read(&EXT4_I(inode)->i_data_sem);
+			mutex_lock(&inode->i_mutex);
+		}
+
 		/*
 		 * We could direct write to holes and fallocate.
 		 *
@@ -3083,6 +3103,13 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 				ret = err;
 			ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
 		}
+
+		/* take i_mutex locking again if we do a ovewrite dio */
+		if (overwrite) {
+			up_read(&EXT4_I(inode)->i_data_sem);
+			mutex_lock(&inode->i_mutex);
+		}
+
 		return ret;
 	}
 
-- 
1.7.4.1
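The block range passed to ext4_map_blocks() in ext4_file_dio_write covers every logical block touched by the byte range [pos, pos + length). Assuming EXT4_BLOCK_ALIGN() rounds its argument up to a multiple of the block size (1 << blkbits), the computation can be checked in isolation. For example, with 4 KiB blocks (blkbits = 12), pos = 1000 and length = 5000 end at byte 5999, so m_lblk = 0 and m_len = 2. The macro, struct and helper names below are hypothetical:

```c
#include <stdint.h>

/* Round size up to a multiple of (1 << blkbits); assumed EXT4_BLOCK_ALIGN semantics. */
#define SKETCH_BLOCK_ALIGN(size, blkbits) \
	(((size) + (1ULL << (blkbits)) - 1) & ~((1ULL << (blkbits)) - 1))

/* Hypothetical pair mirroring map.m_lblk and map.m_len. */
struct blk_range {
	uint64_t lblk;   /* first logical block touched */
	uint64_t len;    /* number of blocks touched */
};

/* Logical blocks covered by the byte range [pos, pos + length). */
static struct blk_range dio_block_range(uint64_t pos, uint64_t length,
					unsigned int blkbits)
{
	struct blk_range r;

	r.lblk = pos >> blkbits;
	r.len = (SKETCH_BLOCK_ALIGN(pos + length, blkbits) >> blkbits) - r.lblk;
	return r;
}
```

When ext4_map_blocks() returns exactly m_len (the 'err == len' test), the whole write therefore falls inside blocks that are already allocated.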
* Re: [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock
From: Robin Dong @ 2012-06-15 10:16 UTC
To: Zheng Liu; +Cc: linux-ext4, Tao Ma, Eric Sandeen, Zheng Liu

2012/6/14 Zheng Liu <gnehzuil.liu@gmail.com>:
[...]
> +		err = ext4_map_blocks(NULL, inode, &map, 0);

Nitpick: it may be better to rename the variable "err" to "ret".

> +		/*
> +		 * 'err==len' means that all of blocks has been preallocated no
> +		 * matter they are initialized or not. For excluding
> +		 * uninitialized extents, we need to check m_flags. There are
> +		 * two conditions that indicate for initialized extents.
> +		 * 1) If we hit extent cache, EXT4_MAP_MAPPED flag is returned;
> +		 * 2) If we do a real lookup, non-flags are returned.
> +		 * So we should check these two conditions.
> +		 */
> +		if (err == len && (!map.m_flags ||
> +				   map.m_flags & EXT4_MAP_MAPPED))

If we do a real lookup in ext4_map_blocks, it also returns with the
EXT4_MAP_MAPPED flag, so the condition should be:

	if (err == len && (map.m_flags & EXT4_MAP_MAPPED))

[...]

-- 
--
Best Regard
Robin Dong
* Re: [RFC][PATCH v2 3/3] ext4: add dio overwrite nolock
From: Zheng Liu @ 2012-06-15 11:02 UTC
To: Robin Dong; +Cc: linux-ext4, Tao Ma, Eric Sandeen, Zheng Liu

On Fri, Jun 15, 2012 at 06:16:29PM +0800, Robin Dong wrote:
> > +		/*
> > +		 * 'err==len' means that all of blocks has been preallocated no
> > +		 * matter they are initialized or not. For excluding
> > +		 * uninitialized extents, we need to check m_flags. There are
> > +		 * two conditions that indicate for initialized extents.
> > +		 * 1) If we hit extent cache, EXT4_MAP_MAPPED flag is returned;
> > +		 * 2) If we do a real lookup, non-flags are returned.
> > +		 * So we should check these two conditions.
> > +		 */
> > +		if (err == len && (!map.m_flags ||
> > +				   map.m_flags & EXT4_MAP_MAPPED))
>
> If we do a real lookup in ext4_map_blocks, it also returns with the
> EXT4_MAP_MAPPED flag, so the condition should be:
>
> 	if (err == len && (map.m_flags & EXT4_MAP_MAPPED))

Yes, you are right. I will fix it in the next version.

Regards,
Zheng
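The agreed change can be illustrated by evaluating both conditions side by side. The flag values below are hypothetical placeholders, not the kernel's EXT4_MAP_* constants, and err/len stand for the return value of ext4_map_blocks() and the requested block count. The two checks differ only when the full length is returned with no flags set: the v2 condition accepts that as an overwrite, while the reviewed condition requires EXT4_MAP_MAPPED, which, per the review, a real lookup also returns.

```c
#include <stdbool.h>

/* Placeholder flag values for this sketch only. */
#define MAP_MAPPED    (1u << 0)
#define MAP_UNWRITTEN (1u << 1)

/* Condition as posted in v2 of patch 3. */
static bool overwrite_v2(int err, int len, unsigned int m_flags)
{
	return err == len && (!m_flags || (m_flags & MAP_MAPPED));
}

/* Condition suggested in the review and accepted for the next version. */
static bool overwrite_reviewed(int err, int len, unsigned int m_flags)
{
	return err == len && (m_flags & MAP_MAPPED);
}
```

Both forms reject a range that maps only to unwritten (uninitialized) extents, and both reject a partial mapping where err is less than len.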