Re: [RFC][PATCH 3/3] ext4: add dio overwrite nolock

From: Eric Sandeen <sandeen@redhat.com>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: linux-ext4@vger.kernel.org, Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [RFC][PATCH 3/3] ext4: add dio overwrite nolock
Date: Wed, 02 May 2012 10:05:33 -0500
Message-ID: <4FA14D3D.40903@redhat.com>
In-Reply-To: <1335584346-8070-4-git-send-email-wenqing.lz@taobao.com>

On 4/27/12 10:39 PM, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> Aligned and overwrite direct IO can be parallelized.  In ext4_file_dio_write,
> we first check whether these conditions are satisfied or not.  If so, we unlock
> the i_mutex and acquire i_data_sem directly.  Meanwhile iocb->private is set to
> indicate that this is an overwrite dio, and it will be processed in
> ext4_ext_direct_IO.

This copies almost 100 lines of generic_file_aio_write() back into
ext4.  Do we really need to do this?  Copying core code into the
fs can be a maintenance nightmare...

I'll have to think more about the big picture and whether or not it's
possible, but my first reaction is to find a way to leverage or modify
existing IO code rather than pasting it all into ext4 with changes...

-Eric

> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/file.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 137 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index e5d6be3..8a5f713 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -100,9 +100,21 @@ static ssize_t
>  ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
>  		    unsigned long nr_segs, loff_t pos)
>  {
> -	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
> -	int unaligned_aio = 0;
> +	struct file *file = iocb->ki_filp;
> +	struct address_space * mapping = file->f_mapping;
> +	struct inode *inode = file->f_path.dentry->d_inode;
> +	struct blk_plug plug;
>  	ssize_t ret;
> +	ssize_t written, written_buffered;
> +	size_t length = iov_length(iov, nr_segs);
> +	size_t ocount;		/* original count */
> +	size_t count;		/* after file limit checks */
> +	int unaligned_aio = 0;
> +	int overwrite = 0;
> +	loff_t *ppos = &iocb->ki_pos;
> +	loff_t endbyte;
> +
> +	BUG_ON(iocb->ki_pos != pos);
>  
>  	if (!is_sync_kiocb(iocb))
>  		unaligned_aio = ext4_unaligned_aio(inode, iov, nr_segs, pos);
> @@ -121,7 +133,129 @@ ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
>  		ext4_aiodio_wait(inode);
>  	}
>  
> -	ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
> +	mutex_lock(&inode->i_mutex);
> +	blk_start_plug(&plug);
> +
> +	ocount = 0;
> +	ret = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
> +	if (ret)
> +		goto unlock_out;
> +
> +	count = ocount;
> +	pos = *ppos;
> +
> +	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> +
> +	/* We can write back this queue in page reclaim */
> +	current->backing_dev_info = mapping->backing_dev_info;
> +	written = 0;
> +
> +	ret = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
> +	if (ret)
> +		goto out;
> +
> +	if (count == 0)
> +		goto out;
> +
> +	ret = file_remove_suid(file);
> +	if (ret)
> +		goto out;
> +
> +	file_update_time(file);
> +
> +	iocb->private = NULL;
> +
> +	if (!unaligned_aio && !file->f_mapping->nrpages &&
> +	    pos + length < i_size_read(inode) &&
> +	    ext4_should_dioread_nolock(inode)) {
> +		struct ext4_map_blocks map;
> +		unsigned int blkbits = inode->i_blkbits;
> +		int err;
> +		int len;
> +
> +		map.m_lblk = pos >> blkbits;
> +		map.m_len = (EXT4_BLOCK_ALIGN(pos + length, blkbits) >> blkbits)
> +			- map.m_lblk;
> +		len = map.m_len;
> +
> +		err = ext4_map_blocks(NULL, inode, &map, 0);
> +		if (err == len && (!map.m_flags ||
> +		    map.m_flags & EXT4_MAP_MAPPED)) {
> +			overwrite = 1;
> +			iocb->private = &overwrite;
> +			mutex_unlock(&inode->i_mutex);
> +			down_read(&EXT4_I(inode)->i_data_sem);
> +		}
> +	}
> +
> +	if (file->f_mapping->nrpages && overwrite) {
> +		overwrite = 0;
> +		up_read(&EXT4_I(inode)->i_data_sem);
> +		mutex_lock(&inode->i_mutex);
> +	}
> +
> +	written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
> +						ppos, count, ocount);
> +	if (written < 0 || written == count)
> +		goto out;
> +	/*
> +	 * direct-io write to a hole: fall through to buffered I/O
> +	 * for completing the rest of the request.
> +	 */
> +	pos += written;
> +	count -= written;
> +	written_buffered = generic_file_buffered_write(iocb, iov,
> +					nr_segs, pos, ppos, count,
> +					written);
> +	/*
> +	 * If generic_file_buffered_write() retuned a synchronous error
> +	 * then we want to return the number of bytes which were
> +	 * direct-written, or the error code if that was zero.  Note
> +	 * that this differs from normal direct-io semantics, which
> +	 * will return -EFOO even if some bytes were written.
> +	 */
> +	if (written_buffered < 0) {
> +		ret = written_buffered;
> +		goto out;
> +	}
> +
> +	/*
> +	 * We need to ensure that the page cache pages are written to
> +	 * disk and invalidated to preserve the expected O_DIRECT
> +	 * semantics.
> +	 */
> +	endbyte = pos + written_buffered - written - 1;
> +	ret = filemap_write_and_wait_range(file->f_mapping, pos, endbyte);
> +	if (ret == 0) {
> +		written = written_buffered;
> +		invalidate_mapping_pages(mapping,
> +					 pos >> PAGE_CACHE_SHIFT,
> +					 endbyte >> PAGE_CACHE_SHIFT);
> +	} else {
> +		/*
> +		 * We don't know how much we wrote, so just return
> +		 * the number of bytes which were direct-written
> +		 */
> +	}
> +
> +out:
> +	current->backing_dev_info = NULL;
> +	ret = written ? written : ret;
> +
> +unlock_out:
> +	if (overwrite)
> +		up_read(&EXT4_I(inode)->i_data_sem);
> +	else
> +		mutex_unlock(&inode->i_mutex);
> +
> +	if (ret > 0 || ret == -EIOCBQUEUED) {
> +		ssize_t err;
> +
> +		err = generic_write_sync(file, pos, ret);
> +		if (err < 0 && ret > 0)
> +			ret = err;
> +	}
> +	blk_finish_plug(&plug);
>  
>  	if (unaligned_aio)
>  		mutex_unlock(ext4_aio_mutex(inode));
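
For readers following the overwrite check in the quoted patch, here is a minimal user-space sketch of the block-range arithmetic and the eligibility test it feeds. It is not kernel code. The names block_range, block_align_up, write_to_block_range and may_skip_exclusive_lock are hypothetical, and mapped_blocks stands in for the value ext4_map_blocks() returns in the patch. The m_flags test and the ext4_should_dioread_nolock() check are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the m_lblk/m_len pair filled in by the patch. */
struct block_range {
	uint64_t first_lblk;	/* first logical block touched by the write */
	uint64_t nr_blocks;	/* number of logical blocks touched */
};

/* Round off up to the next multiple of the block size (1 << blkbits). */
static uint64_t block_align_up(uint64_t off, unsigned int blkbits)
{
	uint64_t mask;

	assert(blkbits < 64);
	mask = ((uint64_t)1 << blkbits) - 1;
	return (off + mask) & ~mask;
}

/* Same arithmetic as the patch: start block, then aligned end minus start. */
static struct block_range write_to_block_range(uint64_t pos, uint64_t length,
					       unsigned int blkbits)
{
	struct block_range r;

	r.first_lblk = pos >> blkbits;
	r.nr_blocks = (block_align_up(pos + length, blkbits) >> blkbits)
		      - r.first_lblk;
	return r;
}

/*
 * Simplified model of the patch's condition for dropping i_mutex:
 * an aligned write, nothing in the page cache, the write ends strictly
 * inside i_size, and every block in the range is already mapped.
 */
static bool may_skip_exclusive_lock(bool unaligned_aio,
				    unsigned long cached_pages,
				    uint64_t pos, uint64_t length,
				    uint64_t i_size, uint64_t mapped_blocks,
				    const struct block_range *r)
{
	return !unaligned_aio && cached_pages == 0 &&
	       pos + length < i_size && mapped_blocks == r->nr_blocks;
}
```

With 4 KiB blocks (blkbits = 12), an 8192-byte write at offset 4096 covers 2 blocks starting at logical block 1. A 5000-byte write at offset 100 also covers 2 blocks, starting at block 0, because the end offset is rounded up to the next block boundary.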

Thread overview: 9+ messages
2012-04-28  3:39 [RFC][PATCH 0/3] ext4: dio overwrite nolock Zheng Liu
2012-04-28  3:39 ` [RFC][PATCH 1/3] ext4: split ext4_file_write into buffered IO and direct IO Zheng Liu
2012-05-02  4:11   ` Tao Ma
2012-05-02  5:50     ` Zheng Liu
2012-04-28  3:39 ` [RFC][PATCH 2/3] ext4: add a new flag for ext4_map_blocks Zheng Liu
2012-04-28  3:39 ` [RFC][PATCH 3/3] ext4: add dio overwrite nolock Zheng Liu
2012-05-02  6:59   ` Tao Ma
2012-05-02  8:16     ` Zheng Liu
2012-05-02 15:05   ` Eric Sandeen [this message]
