From: Omar Sandoval <osandov@osandov.com>
To: robbieko <robbieko@synology.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: implement unlocked buffered write
Date: Tue, 22 May 2018 10:28:43 -0700 [thread overview]
Message-ID: <20180522172843.GA9536@vader> (raw)
In-Reply-To: <1526442757-7167-1-git-send-email-robbieko@synology.com>
On Wed, May 16, 2018 at 11:52:37AM +0800, robbieko wrote:
> From: Robbie Ko <robbieko@synology.com>
>
> This idea is from direct io. By this patch, we can make the buffered
> write parallel, and improve the performance and latency. But because we
> can not update isize without i_mutex, the unlocked buffered write just
> can be done in front of the EOF.
>
> We needn't worry about the race between buffered write and truncate,
> because the truncate need wait until all the buffered write end.
I'm not convinced that this race isn't an issue. Consider this:
__btrfs_buffered_write() btrfs_setsize()
inode_dio_begin()
inode_unlock()
inode_lock()
truncate_setsize() /* Updates i_size */
truncate_pagecache() /* Locks and unlocks pages */
inode_dio_wait()
prepare_pages() /* Locks pages */
btrfs_dirty_pages() /* Updates i_size without i_rwsem! */
I think moving inode_dio_wait() to before truncate_setsize() in
btrfs_setsize() would close this race, but I haven't thought about it
long enough to determine whether that still works for the original
reason the inode_dio_wait() was added.
> And we also needn't worry about the race between dio write and punch hole,
> because we have extent lock to protect our operation.
>
> I ran fio to test the performance of this feature.
>
> == Hardware ==
> CPU: Intel® Xeon® D-1531
> SSD: Intel S3710 200G
> Volume : RAID 5 , SSD * 6
>
> == config file ==
> [global]
> group_reporting
> time_based
> thread=1
> norandommap
> ioengine=libaio
> bs=4k
> iodepth=32
> size=16G
> runtime=180
> numjobs=8
> rw=randwrite
>
> [file1]
> filename=/mnt/btrfs/nocow/testfile
>
> == result (iops) ==
> lock = 68470
> unlocked = 94242
>
> == result (clat) ==
> lock
> lat (usec): min=184, max=1209.9K, avg=3738.35, stdev=20869.49
> clat percentiles (usec):
> | 1.00th=[ 322], 5.00th=[ 330], 10.00th=[ 334], 20.00th=[ 346],
> | 30.00th=[ 370], 40.00th=[ 386], 50.00th=[ 406], 60.00th=[ 446],
> | 70.00th=[ 516], 80.00th=[ 612], 90.00th=[ 828], 95.00th=[10432],
> | 99.00th=[84480], 99.50th=[117248], 99.90th=[226304], 99.95th=[333824],
> | 99.99th=[692224]
>
> unlocked
> lat (usec): min=10, max=218208, avg=2691.44, stdev=5380.82
> clat percentiles (usec):
> | 1.00th=[ 302], 5.00th=[ 390], 10.00th=[ 442], 20.00th=[ 502],
> | 30.00th=[ 548], 40.00th=[ 596], 50.00th=[ 652], 60.00th=[ 724],
> | 70.00th=[ 916], 80.00th=[ 5024], 90.00th=[ 5664], 95.00th=[10048],
> | 99.00th=[29568], 99.50th=[39168], 99.90th=[54016], 99.95th=[59648],
> | 99.99th=[78336]
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
> fs/btrfs/file.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 41ab907..8eac540 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1600,6 +1600,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> int ret = 0;
> bool only_release_metadata = false;
> bool force_page_uptodate = false;
> + bool relock = false;
>
> nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
> PAGE_SIZE / (sizeof(struct page *)));
> @@ -1609,6 +1610,18 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> if (!pages)
> return -ENOMEM;
>
> + inode_dio_begin(inode);
This would need a comment as to why we're using inode_dio_begin() for
buffered I/O.
> + /*
> + * If the write is beyond the EOF, we need update
> + * the isize, but it is protected by i_mutex. So we can
> + * not unlock the i_mutex at this case.
> + */
> + if (pos + iov_iter_count(i) <= i_size_read(inode)) {
> + inode_unlock(inode);
> + relock = true;
> + }
> +
> while (iov_iter_count(i) > 0) {
> size_t offset = pos & (PAGE_SIZE - 1);
> size_t sector_offset;
> @@ -1808,6 +1821,10 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> }
> }
>
> + inode_dio_end(inode);
> + if (relock)
> + inode_lock(inode);
> +
> extent_changeset_free(data_reserved);
> return num_written ? num_written : ret;
> }
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-05-22 17:28 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-16 3:52 [PATCH] Btrfs: implement unlocked buffered write robbieko
2018-05-22 17:11 ` David Sterba
2018-05-22 17:28 ` Omar Sandoval [this message]
2018-05-23 7:07 ` robbieko
2018-05-22 18:08 ` Christoph Hellwig
2018-05-22 18:31 ` Chris Mason
2018-05-23 6:37 ` Christoph Hellwig
2018-05-23 7:58 ` Nikolay Borisov
2018-05-23 18:01 ` Chris Mason
2018-05-23 7:26 ` robbieko
2018-05-23 15:56 ` Chris Mason
2018-05-24 8:46 ` robbieko
2018-05-24 15:05 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180522172843.GA9536@vader \
--to=osandov@osandov.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=robbieko@synology.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).