* [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
@ 2013-01-31 9:39 Miao Xie
2013-01-31 16:42 ` Josef Bacik
2013-02-01 2:53 ` Liu Bo
0 siblings, 2 replies; 5+ messages in thread
From: Miao Xie @ 2013-01-31 9:39 UTC (permalink / raw)
To: Linux Btrfs; +Cc: Josef Bacik
This idea comes from ext4. With this patch, dio writes can run in parallel,
which improves performance.
We needn't worry about the race between dio write and truncate, because
truncate has to wait until all the dio writes have ended.
We also needn't worry about the race between dio write and hole punching,
because the extent lock protects our operation.
I ran fio to test the performance of this feature.
== Hardware ==
CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
Mem: 2GB
SSD: Intel X25-M 120GB (Test Partition: 60GB)
== config file ==
[global]
ioengine=psync
direct=1
bs=4k
size=32G
runtime=60
directory=/mnt/btrfs/
filename=testfile
group_reporting
thread
[file1]
numjobs=1 # 2 4
rw=randwrite
== result (KBps) ==
write (numjobs)      1        2        4
lock             24936    24738    24726
nolock           24962    30866    32101
== result (iops) ==
write (numjobs)      1        2        4
lock              6234     6184     6181
nolock            6240     7716     8025
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
fs/btrfs/inode.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d17a04b..091593a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 	int flags = 0;
-	bool wakeup = false;
+	bool wakeup = true;
 	int ret;

 	if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov,
 			    offset, nr_segs))
 		return 0;

-	if (rw == READ) {
-		atomic_inc(&inode->i_dio_count);
-		smp_mb__after_atomic_inc();
-		if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
-				      &BTRFS_I(inode)->runtime_flags))) {
-			inode_dio_done(inode);
-			flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		} else {
-			wakeup = true;
-		}
+	atomic_inc(&inode->i_dio_count);
+	smp_mb__after_atomic_inc();
+	if (rw == WRITE) {
+		mutex_unlock(&inode->i_mutex);
+	} else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+				     &BTRFS_I(inode)->runtime_flags))) {
+		inode_dio_done(inode);
+		flags = DIO_LOCKING | DIO_SKIP_HOLES;
+		wakeup = false;
 	}

 	ret = __blockdev_direct_IO(rw, iocb, inode,
 		   BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
 		   iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
 		   btrfs_submit_direct, flags);
+
 	if (wakeup)
 		inode_dio_done(inode);
+	if (rw == WRITE)
+		mutex_lock(&inode->i_mutex);
 	return ret;
 }

--
1.7.11.7
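
The locking pattern in the patch can be tried out in user space. What follows is only a rough sketch under stated assumptions, not kernel code: `struct fake_inode`, `dio_write()`, `truncate_wait()` and `run_demo()` are hypothetical names, a pthread mutex stands in for i_mutex, and a C11 atomic counter stands in for i_dio_count. The writer bumps the counter before dropping the mutex, and the truncate side takes the mutex and then waits for the counter to drain to zero.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_WRITERS 16

/* Hypothetical stand-in for the fields of struct inode used by the patch. */
struct fake_inode {
	pthread_mutex_t i_mutex;    /* stands in for inode->i_mutex */
	atomic_int i_dio_count;     /* stands in for inode->i_dio_count */
	atomic_long bytes_written;  /* stands in for the real I/O */
	long i_size;                /* only touched under i_mutex here */
	int iters;                  /* writes issued by each writer thread */
};

/* Called with i_mutex held, like the direct I/O write path in the patch. */
static void dio_write(struct fake_inode *inode, long len)
{
	atomic_fetch_add(&inode->i_dio_count, 1);     /* mark dio in flight */
	pthread_mutex_unlock(&inode->i_mutex);        /* let other writers in */

	atomic_fetch_add(&inode->bytes_written, len); /* the "I/O" itself */

	atomic_fetch_sub(&inode->i_dio_count, 1);     /* dio finished */
	pthread_mutex_lock(&inode->i_mutex);          /* caller expects it held */
}

/* Truncate side: take i_mutex, then wait for in-flight dio to drain. */
static void truncate_wait(struct fake_inode *inode, long new_size)
{
	pthread_mutex_lock(&inode->i_mutex);
	while (atomic_load(&inode->i_dio_count) != 0)
		sched_yield();
	inode->i_size = new_size;
	pthread_mutex_unlock(&inode->i_mutex);
}

static void *writer(void *arg)
{
	struct fake_inode *inode = arg;

	for (int i = 0; i < inode->iters; i++) {
		pthread_mutex_lock(&inode->i_mutex);
		dio_write(inode, 4096);
		pthread_mutex_unlock(&inode->i_mutex);
	}
	return NULL;
}

/* Run nthreads writers against one truncate; return total bytes "written". */
long run_demo(int nthreads, int iters)
{
	struct fake_inode inode;
	pthread_t tid[MAX_WRITERS];

	assert(nthreads >= 1 && nthreads <= MAX_WRITERS);
	pthread_mutex_init(&inode.i_mutex, NULL);
	atomic_init(&inode.i_dio_count, 0);
	atomic_init(&inode.bytes_written, 0);
	inode.i_size = 0;
	inode.iters = iters;

	for (int i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, writer, &inode);
	truncate_wait(&inode, 0);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	assert(atomic_load(&inode.i_dio_count) == 0);
	pthread_mutex_destroy(&inode.i_mutex);
	return atomic_load(&inode.bytes_written);
}
```

Calling `run_demo(4, 1000)` from a `main()` and linking with `-pthread` should return 4 * 1000 * 4096 bytes. The order in `dio_write()` mirrors the patch: the counter is dropped before the mutex is retaken, so a truncate that holds the mutex and waits on the counter cannot deadlock against a finishing writer.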
* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
From: Josef Bacik @ 2013-01-31 16:42 UTC (permalink / raw)
To: Miao Xie; +Cc: Linux Btrfs, Josef Bacik
On Thu, Jan 31, 2013 at 02:39:03AM -0700, Miao Xie wrote:
> This idea comes from ext4. With this patch, dio writes can run in parallel,
> which improves performance.
>
> We needn't worry about the race between dio write and truncate, because
> truncate has to wait until all the dio writes have ended.
>
> We also needn't worry about the race between dio write and hole punching,
> because the extent lock protects our operation.
>
> I ran fio to test the performance of this feature.
>
> == Hardware ==
> CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
> Mem: 2GB
> SSD: Intel X25-M 120GB (Test Partition: 60GB)
>
> == config file ==
> [global]
> ioengine=psync
> direct=1
> bs=4k
> size=32G
> runtime=60
> directory=/mnt/btrfs/
> filename=testfile
> group_reporting
> thread
>
> [file1]
> numjobs=1 # 2 4
> rw=randwrite
>
> == result (KBps) ==
> write (numjobs)      1        2        4
> lock             24936    24738    24726
> nolock           24962    30866    32101
>
> == result (iops) ==
> write (numjobs)      1        2        4
> lock              6234     6184     6181
> nolock            6240     7716     8025
So the one thing I worry about is interactions with fsync. I've been depending
on the mutex to keep us from getting screwed by writers coming in while I'm
trying to write out the changed extents. Could you test this with fsync and
make sure it doesn't break anything? Thanks,
Josef
* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
From: Liu Bo @ 2013-02-01 2:53 UTC (permalink / raw)
To: Miao Xie; +Cc: Linux Btrfs, Josef Bacik
On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
> This idea comes from ext4. With this patch, dio writes can run in parallel,
> which improves performance.
Interesting. AFAIK, ext4 can only do a nolock dio write under certain
conditions (it must be an overwrite, the file size must remain unchanged,
and no unaligned/buffered io may be in flight). Is btrfs ok without any
conditions?
thanks,
liubo
* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
From: Miao Xie @ 2013-02-01 4:08 UTC (permalink / raw)
To: bo.li.liu; +Cc: Linux Btrfs, Josef Bacik
On Fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
>> This idea comes from ext4. With this patch, dio writes can run in parallel,
>> which improves performance.
>
> Interesting. AFAIK, ext4 can only do a nolock dio write under certain
> conditions (it must be an overwrite, the file size must remain unchanged,
> and no unaligned/buffered io may be in flight). Is btrfs ok without any
> conditions?
ext4 doesn't have an extent lock, so it cannot prevent two AIO threads from
working on the same unwritten block. That is why it cannot use unlocked dio
write for unaligned dio/aio. btrfs has the extent lock, so it avoids this problem.
ext4 also needs to take the write lock of ->i_data_sem when it allocates free
space. But to avoid truncation and hole punching during dio, it needs to take
the read lock of ->i_data_sem before it releases ->i_mutex. So if the write is
not an overwrite, a deadlock will happen, which is why unlocked dio in ext4
must be an overwrite. btrfs doesn't have this limitation.
Thanks
Miao
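
To illustrate the point about the extent lock, here is a minimal user-space sketch of a byte-range lock: two writers touching overlapping ranges are serialized, while writers on disjoint ranges proceed in parallel. This is an assumption-laden toy and not how btrfs implements its extent lock; `struct range_lock`, `try_lock_range()`, `lock_range()` and `unlock_range()` are hypothetical names, and held ranges live in a small fixed table scanned linearly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_RANGES 64

/* One held byte range, inclusive on both ends. */
struct range {
	unsigned long long start, end;
	bool used;
};

/* Hypothetical range lock: a table of held ranges guarded by a mutex. */
struct range_lock {
	pthread_mutex_t mu;
	pthread_cond_t released;
	struct range held[MAX_RANGES];
};

void range_lock_init(struct range_lock *rl)
{
	pthread_mutex_init(&rl->mu, NULL);
	pthread_cond_init(&rl->released, NULL);
	for (int i = 0; i < MAX_RANGES; i++)
		rl->held[i].used = false;
}

/* Caller holds rl->mu. Record [start, end] if nothing overlaps it. */
static bool claim_locked(struct range_lock *rl,
			 unsigned long long start, unsigned long long end)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_RANGES; i++) {
		if (!rl->held[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		/* Two inclusive ranges overlap if each starts before the other ends. */
		if (start <= rl->held[i].end && rl->held[i].start <= end)
			return false;
	}
	if (free_slot < 0)
		return false; /* table full: treat as busy */
	rl->held[free_slot] = (struct range){ start, end, true };
	return true;
}

/* Non-blocking variant: returns false if an overlapping range is held. */
bool try_lock_range(struct range_lock *rl,
		    unsigned long long start, unsigned long long end)
{
	pthread_mutex_lock(&rl->mu);
	bool ok = claim_locked(rl, start, end);
	pthread_mutex_unlock(&rl->mu);
	return ok;
}

/* Blocking variant: sleeps until no overlapping range is held. */
void lock_range(struct range_lock *rl,
		unsigned long long start, unsigned long long end)
{
	pthread_mutex_lock(&rl->mu);
	while (!claim_locked(rl, start, end))
		pthread_cond_wait(&rl->released, &rl->mu);
	pthread_mutex_unlock(&rl->mu);
}

/* Release a range previously taken with exactly the same bounds. */
void unlock_range(struct range_lock *rl,
		  unsigned long long start, unsigned long long end)
{
	bool found = false;

	pthread_mutex_lock(&rl->mu);
	for (int i = 0; i < MAX_RANGES && !found; i++) {
		if (rl->held[i].used && rl->held[i].start == start &&
		    rl->held[i].end == end) {
			rl->held[i].used = false;
			found = true;
		}
	}
	assert(found);
	pthread_cond_broadcast(&rl->released); /* wake blocked lockers */
	pthread_mutex_unlock(&rl->mu);
}
```

With such a lock, a writer would take `lock_range(&rl, offset, offset + len - 1)` around its I/O instead of holding a per-file mutex for the whole write, so only writers that really touch the same blocks wait for each other.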
* Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
From: Miao Xie @ 2013-02-01 7:39 UTC (permalink / raw)
To: bo.li.liu, Josef Bacik; +Cc: Linux Btrfs
On Fri, 01 Feb 2013 12:08:25 +0800, Miao Xie wrote:
> On Fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
>> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
>>> This idea comes from ext4. With this patch, dio writes can run in parallel,
>>> which improves performance.
>>
>> Interesting. AFAIK, ext4 can only do a nolock dio write under certain
>> conditions (it must be an overwrite, the file size must remain unchanged,
>> and no unaligned/buffered io may be in flight). Is btrfs ok without any
>> conditions?
>
> ext4 doesn't have an extent lock, so it cannot prevent two AIO threads from
> working on the same unwritten block. That is why it cannot use unlocked dio
> write for unaligned dio/aio. btrfs has the extent lock, so it avoids this problem.
Besides that, btrfs doesn't allow unaligned dio/aio.
I read the code again and found a race: several tasks may update i_size at
the same time. There are two ways to fix this problem:
1. like ext4, don't do an unlocked dio write if it goes beyond the end of the file
2. use a spin lock to protect the i_size update
I want to choose the 2nd one.
Thanks
Miao
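
The second option can be sketched in user space as follows. This is only an illustration of the idea, assuming a POSIX spin lock in place of a kernel spinlock; the thread does not show the eventual kernel change, and `struct fake_inode`, `update_isize()` and `run_appenders()` are hypothetical names. Each writer publishes the end offset of its write, and i_size only ever grows, so concurrent extending writes cannot move it backwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_WRITERS 16

/* Hypothetical stand-in for the inode fields involved in the race. */
struct fake_inode {
	pthread_spinlock_t size_lock; /* protects i_size */
	long long i_size;
};

/* Grow i_size to new_end if the write extended the file; never shrink it. */
void update_isize(struct fake_inode *inode, long long new_end)
{
	pthread_spin_lock(&inode->size_lock);
	if (new_end > inode->i_size)
		inode->i_size = new_end;
	pthread_spin_unlock(&inode->size_lock);
}

struct job {
	struct fake_inode *inode;
	long long first_end; /* end offset of this writer's first block */
	long long stride;    /* distance between this writer's blocks */
	int count;           /* number of blocks this writer appends */
};

static void *writer(void *arg)
{
	struct job *j = arg;

	for (int i = 0; i < j->count; i++)
		update_isize(j->inode, j->first_end + (long long)i * j->stride);
	return NULL;
}

/* nthreads writers append interleaved 4 KiB blocks; returns the final i_size. */
long long run_appenders(int nthreads, int blocks_per_thread)
{
	struct fake_inode inode = { .i_size = 0 };
	pthread_t tid[MAX_WRITERS];
	struct job jobs[MAX_WRITERS];

	assert(nthreads >= 1 && nthreads <= MAX_WRITERS);
	pthread_spin_init(&inode.size_lock, PTHREAD_PROCESS_PRIVATE);

	for (int t = 0; t < nthreads; t++) {
		/* Writer t owns blocks t, t + nthreads, t + 2 * nthreads, ... */
		jobs[t].inode = &inode;
		jobs[t].first_end = (long long)(t + 1) * 4096;
		jobs[t].stride = (long long)nthreads * 4096;
		jobs[t].count = blocks_per_thread;
		pthread_create(&tid[t], NULL, writer, &jobs[t]);
	}
	for (int t = 0; t < nthreads; t++)
		pthread_join(tid[t], NULL);

	pthread_spin_destroy(&inode.size_lock);
	return inode.i_size;
}
```

Without the lock, the compare and the store in `update_isize()` could interleave between two tasks, and a smaller end offset could overwrite a larger one. With it, `run_appenders(4, 1000)` should return 4 * 1000 * 4096 regardless of scheduling.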