linux-btrfs.vger.kernel.org archive mirror
* [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
@ 2013-01-31  9:39 Miao Xie
  2013-01-31 16:42 ` Josef Bacik
  2013-02-01  2:53 ` Liu Bo
  0 siblings, 2 replies; 5+ messages in thread
From: Miao Xie @ 2013-01-31  9:39 UTC
  To: Linux Btrfs; +Cc: Josef Bacik

This idea is from ext4. By this patch, we can make the dio write parallel,
and improve the performance.

We needn't worry about the race between dio write and truncate, because
truncate needs to wait until all the dio writes end.

We also needn't worry about the race between dio write and punch hole,
because the extent lock protects our operation.

I ran fio to test the performance of this feature.

== Hardware ==
CPU: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
Mem: 2GB
SSD: Intel X25-M 120GB (Test Partition: 60GB)

== config file ==
[global]
ioengine=psync
direct=1
bs=4k
size=32G
runtime=60
directory=/mnt/btrfs/
filename=testfile
group_reporting
thread

[file1]
numjobs=1 # 2 4
rw=randwrite

== result (KBps) ==
write	1	2	4
lock	24936	24738	24726
nolock	24962	30866	32101

== result (iops) ==
write	1	2	4
lock	6234	6184	6181
nolock	6240	7716	8025
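
The two tables agree with each other (with bs=4k, 6234 iops x 4 KB = 24936 KBps), so the gain can be read off either one. A minimal sketch of that arithmetic follows; the helper names `gain_percent` and `print_gains` are made up for illustration, and the figures are copied from the KBps table above. It works out to roughly +0.1%, +24.8% and +29.8% for 1, 2 and 4 jobs.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage gain of "nolock" over "lock"; 30.0 means 30% faster. */
static double gain_percent(double lock_kbps, double nolock_kbps)
{
	assert(lock_kbps > 0.0);
	return (nolock_kbps / lock_kbps - 1.0) * 100.0;
}

/* Print the gain for each numjobs column of the KBps table. */
static void print_gains(void)
{
	static const int jobs[] = { 1, 2, 4 };
	static const double lock[] = { 24936, 24738, 24726 };
	static const double nolock[] = { 24962, 30866, 32101 };

	for (int i = 0; i < 3; i++)
		printf("numjobs=%d: %+.1f%%\n", jobs[i],
		       gain_percent(lock[i], nolock[i]));
}
```

Calling `print_gains()` from a `main()` prints one line per numjobs value.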

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d17a04b..091593a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 	int flags = 0;
-	bool wakeup = false;
+	bool wakeup = true;
 	int ret;
 
 	if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov,
 			    offset, nr_segs))
 		return 0;
 
-	if (rw == READ) {
-		atomic_inc(&inode->i_dio_count);
-		smp_mb__after_atomic_inc();
-		if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
-				      &BTRFS_I(inode)->runtime_flags))) {
-			inode_dio_done(inode);
-			flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		} else {
-			wakeup = true;
-		}
+	atomic_inc(&inode->i_dio_count);
+	smp_mb__after_atomic_inc();
+	if (rw == WRITE) {
+		mutex_unlock(&inode->i_mutex);
+	} else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+				     &BTRFS_I(inode)->runtime_flags))) {
+		inode_dio_done(inode);
+		flags = DIO_LOCKING | DIO_SKIP_HOLES;
+		wakeup = false;
 	}
 
 	ret = __blockdev_direct_IO(rw, iocb, inode,
 			BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
 			iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
 			btrfs_submit_direct, flags);
+
 	if (wakeup)
 		inode_dio_done(inode);
+	if (rw == WRITE)
+		mutex_lock(&inode->i_mutex);
 	return ret;
 }
 
-- 
1.7.11.7
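
The ordering in the patch above (bump i_dio_count first, only then drop i_mutex for writes) can be modeled in user space. This is only a sketch under assumptions: `struct model_inode` and the `model_*` helpers are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the truncate side is reduced to a predicate instead of actually sleeping until the count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of the in-flight dio counter. */
struct model_inode {
	atomic_int dio_count;	/* stands in for inode->i_dio_count */
};

static void model_inode_init(struct model_inode *inode)
{
	atomic_init(&inode->dio_count, 0);
}

/*
 * Called while the caller would still hold i_mutex: the write is counted
 * as in flight *before* the mutex is dropped, so truncate never sees a
 * zero count for a write that is about to start.
 */
static void model_dio_begin(struct model_inode *inode)
{
	atomic_fetch_add(&inode->dio_count, 1);
}

/* Called after the direct I/O call returns. */
static void model_dio_end(struct model_inode *inode)
{
	int old = atomic_fetch_sub(&inode->dio_count, 1);

	assert(old > 0);	/* every end must pair with a begin */
	(void)old;
}

/* Truncate may only go ahead once no direct I/O is in flight. */
static bool model_truncate_may_proceed(struct model_inode *inode)
{
	return atomic_load(&inode->dio_count) == 0;
}
```

With two writers in flight, `model_truncate_may_proceed()` stays false until both have called `model_dio_end()`.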


* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
  2013-01-31  9:39 [RFC][PATCH 2/2] Btrfs: implement unlocked dio write Miao Xie
@ 2013-01-31 16:42 ` Josef Bacik
  2013-02-01  2:53 ` Liu Bo
  1 sibling, 0 replies; 5+ messages in thread
From: Josef Bacik @ 2013-01-31 16:42 UTC
  To: Miao Xie; +Cc: Linux Btrfs, Josef Bacik

On Thu, Jan 31, 2013 at 02:39:03AM -0700, Miao Xie wrote:
> This idea is from ext4. By this patch, we can make the dio write parallel,
> and improve the performance.
> 
> We needn't worry about the race between dio write and truncate, because
> truncate needs to wait until all the dio writes end.
> 
> We also needn't worry about the race between dio write and punch hole,
> because the extent lock protects our operation.
> 
> I ran fio to test the performance of this feature.
> 
> == Hardware ==
> CPU: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
> Mem: 2GB
> SSD: Intel X25-M 120GB (Test Partition: 60GB)
> 
> == config file ==
> [global]
> ioengine=psync
> direct=1
> bs=4k
> size=32G
> runtime=60
> directory=/mnt/btrfs/
> filename=testfile
> group_reporting
> thread
> 
> [file1]
> numjobs=1 # 2 4
> rw=randwrite
> 
> == result (KBps) ==
> write	1	2	4
> lock	24936	24738	24726
> nolock	24962	30866	32101
> 
> == result (iops) ==
> write	1	2	4
> lock	6234	6184	6181
> nolock	6240	7716	8025

So the one thing I worry about is interactions with fsync.  I've been depending
on the mutex to keep us from getting screwed by writers coming in while I'm
trying to write out the changed extents.  Could you test this with fsync and
make sure it doesn't break anything?  Thanks,

Josef


* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
  2013-01-31  9:39 [RFC][PATCH 2/2] Btrfs: implement unlocked dio write Miao Xie
  2013-01-31 16:42 ` Josef Bacik
@ 2013-02-01  2:53 ` Liu Bo
  2013-02-01  4:08   ` Miao Xie
  1 sibling, 1 reply; 5+ messages in thread
From: Liu Bo @ 2013-02-01  2:53 UTC
  To: Miao Xie; +Cc: Linux Btrfs, Josef Bacik

On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
> This idea is from ext4. By this patch, we can make the dio write parallel,
> and improve the performance.

Interesting. AFAIK, ext4 can only do nolock dio write under some
conditions (it must be an overwrite, the file size must remain unchanged,
and no unaligned/buffered io may be in flight). Is btrfs OK without any
conditions?

thanks,
liubo


* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
  2013-02-01  2:53 ` Liu Bo
@ 2013-02-01  4:08   ` Miao Xie
  2013-02-01  7:39     ` Miao Xie
  0 siblings, 1 reply; 5+ messages in thread
From: Miao Xie @ 2013-02-01  4:08 UTC
  To: bo.li.liu; +Cc: Linux Btrfs, Josef Bacik

On Fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
>> This idea is from ext4. By this patch, we can make the dio write parallel,
>> and improve the performance.
> 
> Interesting. AFAIK, ext4 can only do nolock dio write under some
> conditions (it must be an overwrite, the file size must remain unchanged,
> and no unaligned/buffered io may be in flight). Is btrfs OK without any
> conditions?

ext4 doesn't have an extent lock, so it cannot prevent two AIO threads from
working on the same unwritten block. That is why it cannot use unlocked dio
write for unaligned dio/aio. btrfs has an extent lock, so it avoids this problem.

Also, ext4 needs to take the write lock of ->i_data_sem when it allocates free
space, but in order to avoid truncation and hole punch during dio, it needs to
take the read lock of ->i_data_sem before it releases ->i_mutex. So if the write
isn't an overwrite, a deadlock will happen, which is why the unlocked dio of ext4
must be an overwrite. btrfs doesn't have such a limitation.
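
The deadlock described above (holding the read side of ->i_data_sem and then needing its write side for allocation) can be illustrated with a toy model. This is a sketch under assumptions, not ext4 code: `struct model_rwsem` and the `model_*` helpers are hypothetical, the model is single-threaded, and trylock variants are used so that the example terminates where a real blocking lock would wait on itself forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of a reader/writer semaphore. */
struct model_rwsem {
	int readers;	/* number of read holders */
	bool writer;	/* true while a writer holds it */
};

static bool model_down_read_trylock(struct model_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static bool model_down_write_trylock(struct model_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void model_up_read(struct model_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/*
 * Take the read side (to hold off truncate and hole punch), then try to
 * allocate, which needs the write side. Returns true if the allocation
 * could take the write lock.
 */
static bool model_alloc_under_read_lock(struct model_rwsem *sem)
{
	bool got_write;

	if (!model_down_read_trylock(sem))
		return false;
	/* Fails: our own read hold blocks the write side. */
	got_write = model_down_write_trylock(sem);
	if (got_write)
		sem->writer = false;
	model_up_read(sem);
	return got_write;
}
```

In this model the allocation never gets the write side while the read side is held, which matches the reason given above for restricting ext4's unlocked dio to overwrites.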

Thanks
Miao


* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
  2013-02-01  4:08   ` Miao Xie
@ 2013-02-01  7:39     ` Miao Xie
  0 siblings, 0 replies; 5+ messages in thread
From: Miao Xie @ 2013-02-01  7:39 UTC
  To: bo.li.liu, Josef Bacik; +Cc: Linux Btrfs

On Fri, 01 Feb 2013 12:08:25 +0800, Miao Xie wrote:
> On Fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
>> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
>>> This idea is from ext4. By this patch, we can make the dio write parallel,
>>> and improve the performance.
>>
>> Interesting. AFAIK, ext4 can only do nolock dio write under some
>> conditions (it must be an overwrite, the file size must remain unchanged,
>> and no unaligned/buffered io may be in flight). Is btrfs OK without any
>> conditions?
> 
> ext4 doesn't have an extent lock, so it cannot prevent two AIO threads from
> working on the same unwritten block. That is why it cannot use unlocked dio
> write for unaligned dio/aio. btrfs has an extent lock, so it avoids this problem.

Besides that, btrfs doesn't allow unaligned dio/aio.

I read the code again and found a race: several tasks may update i_size at
the same time. There are two ways to fix this problem:
1. just like ext4, don't do unlocked dio write if it goes beyond the end of the file
2. use a spin lock to protect the i_size update

I want to choose the 2nd one.
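
A rough user-space sketch of the 2nd method follows. It is not the eventual btrfs change: `struct size_model` and its helpers are hypothetical names, and a C11 `atomic_flag` loop stands in for a kernel spin lock. The point is only that the compare-and-grow of the size happens under one lock, so concurrent extending writes cannot overwrite a larger size with a smaller one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model; the names are not btrfs identifiers. */
struct size_model {
	atomic_flag lock;	/* minimal spin lock guarding i_size */
	uint64_t i_size;
};

static void size_model_init(struct size_model *m, uint64_t size)
{
	atomic_flag_clear(&m->lock);
	m->i_size = size;
}

static void size_model_lock(struct size_model *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void size_model_unlock(struct size_model *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/*
 * Grow i_size to `end` if the write extended the file; never shrink it.
 * Returns the size after the update.
 */
static uint64_t size_model_extend(struct size_model *m, uint64_t end)
{
	uint64_t cur;

	size_model_lock(m);
	if (end > m->i_size)
		m->i_size = end;
	assert(m->i_size >= end);
	cur = m->i_size;
	size_model_unlock(m);
	return cur;
}
```

A write ending at 8192 on a 4096-byte file grows the size to 8192; a later update with end 4096 leaves it at 8192.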

Thanks
Miao

