linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Miao Xie <miaox@cn.fujitsu.com>
To: bo.li.liu@oracle.com, Josef Bacik <jbacik@fusionio.com>
Cc: Linux Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
Date: Fri, 01 Feb 2013 15:39:18 +0800	[thread overview]
Message-ID: <510B7126.6070607@cn.fujitsu.com> (raw)
In-Reply-To: <510B3FB9.6050707@cn.fujitsu.com>

On 	fri, 01 Feb 2013 12:08:25 +0800, Miao Xie wrote:
> On 	fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
>> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
>>> This idea is from ext4. By this patch, we can make the dio write parallel,
>>> and improve the performance.
>>
>> Interesting, AFAIK, ext4 can only do nolock dio write on some
>> conditions(should be a overwrite, file size remains unchanged,
>> no aligned/buffer io in flight), btrfs is ok without any conditions?
> 
> ext4 don't have extent lock, it can not avoid 2 AIO  threads are at work on the same
> unwritten block, so it can not use unlocked dio write for unaligned dio/aio. But btrfs
> has extent lock, it can avoid this problem.

Besides that, btrfs doesn't allow doing a unaligned dio/aio.

I read the code again, found there is a race that several tasks may update i_size at
the same time. There are two methods to fix this problem:
1. just like ext4, don't do unlocked write dio if it is beyond the end of the file
2. use a spin lock to protect i_size update

I want to choose the 2nd one.

Thanks
Miao

> 
> And ext4 need take write lock of ->i_data_sem, when it allocate the free space,
> but in order to avoid truncation and hole punch during dio, it need take the read
> lock of ->i_data_sem before it release ->i_mutex, that is if it isn't a overwrite,
> deadlock will happen, so the unlocked dio of ext4 should be a overwrite. But btrfs
> doesn't have such limitation.
> 
> Thanks
> Miao
> 
>>
>> thanks,
>> liubo
>>
>>>
>>> We needn't worry about the race between dio write and truncate, because the
>>> truncate need wait untill all the dio write end.
>>>
>>> And we also needn't worry about the race between dio write and punch hole,
>>> because we have extent lock to protect our operation.
>>>
>>> I ran fio to test the performance of this feature.
>>>
>>> == Hardware ==
>>> CPU: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
>>> Mem: 2GB
>>> SSD: Intel X25-M 120GB (Test Partition: 60GB)
>>>
>>> == config file ==
>>> [global]
>>> ioengine=psync
>>> direct=1
>>> bs=4k
>>> size=32G
>>> runtime=60
>>> directory=/mnt/btrfs/
>>> filename=testfile
>>> group_reporting
>>> thread
>>>
>>> [file1]
>>> numjobs=1 # 2 4
>>> rw=randwrite
>>>
>>> == result (KBps) ==
>>> write	1	2	4
>>> lock	24936	24738	24726
>>> nolock	24962	30866	32101
>>>
>>> == result (iops) ==
>>> write	1	2	4
>>> lock	6234	6184	6181
>>> nolock	6240	7716	8025
>>>
>>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/inode.c | 24 +++++++++++++-----------
>>>  1 file changed, 13 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index d17a04b..091593a 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
>>>  	struct file *file = iocb->ki_filp;
>>>  	struct inode *inode = file->f_mapping->host;
>>>  	int flags = 0;
>>> -	bool wakeup = false;
>>> +	bool wakeup = true;
>>>  	int ret;
>>>  
>>>  	if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov,
>>>  			    offset, nr_segs))
>>>  		return 0;
>>>  
>>> -	if (rw == READ) {
>>> -		atomic_inc(&inode->i_dio_count);
>>> -		smp_mb__after_atomic_inc();
>>> -		if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
>>> -				      &BTRFS_I(inode)->runtime_flags))) {
>>> -			inode_dio_done(inode);
>>> -			flags = DIO_LOCKING | DIO_SKIP_HOLES;
>>> -		} else {
>>> -			wakeup = true;
>>> -		}
>>> +	atomic_inc(&inode->i_dio_count);
>>> +	smp_mb__after_atomic_inc();
>>> +	if (rw == WRITE) {
>>> +		mutex_unlock(&inode->i_mutex);
>>> +	} else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
>>> +				     &BTRFS_I(inode)->runtime_flags))) {
>>> +		inode_dio_done(inode);
>>> +		flags = DIO_LOCKING | DIO_SKIP_HOLES;
>>> +		wakeup = false;
>>>  	}
>>>  
>>>  	ret = __blockdev_direct_IO(rw, iocb, inode,
>>>  			BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
>>>  			iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
>>>  			btrfs_submit_direct, flags);
>>> +
>>>  	if (wakeup)
>>>  		inode_dio_done(inode);
>>> +	if (rw == WRITE)
>>> +		mutex_lock(&inode->i_mutex);
>>>  	return ret;
>>>  }
>>>  
>>> -- 
>>> 1.7.11.7
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


      reply	other threads:[~2013-02-01  7:38 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-31  9:39 [RFC][PATCH 2/2] Btrfs: implement unlocked dio write Miao Xie
2013-01-31 16:42 ` Josef Bacik
2013-02-01  2:53 ` Liu Bo
2013-02-01  4:08   ` Miao Xie
2013-02-01  7:39     ` Miao Xie [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=510B7126.6070607@cn.fujitsu.com \
    --to=miaox@cn.fujitsu.com \
    --cc=bo.li.liu@oracle.com \
    --cc=jbacik@fusionio.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).