From: Wang Yugui <wangyugui@e16-tech.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs write-bandwidth performance regression of 6.5-rc4/rc3
Date: Tue, 01 Aug 2023 21:04:05 +0800
Message-ID: <20230801210400.F0DE.409509F4@e16-tech.com>
In-Reply-To: <20230801100006.GA30042@lst.de>
Hi,
> On Tue, Aug 01, 2023 at 05:32:13PM +0800, Wang Yugui wrote:
> > dmesg output:
> > [ 250.596544] raid6: skipped pq benchmark and selected sse2x4
> > [ 250.602836] raid6: using ssse3x2 recovery algorithm
> > [ 250.612812] xor: automatically using best checksumming function avx
> > [ 250.895573] Btrfs loaded, assert=on, zoned=yes, fsverity=no
> > [ 250.905249] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 3 transid 6 /dev/nvme3n1 scanned by systemd-udevd (1726)
> > [ 250.922155] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 4 transid 6 /dev/nvme0n1 scanned by systemd-udevd (1729)
> > [ 250.935965] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 1 transid 6 /dev/nvme1n1 scanned by systemd-udevd (1724)
> > [ 250.968268] BTRFS: device fsid f5ebfdd6-6bf6-4c2b-b47b-79517bc00c8f devid 2 transid 6 /dev/nvme2n1 scanned by systemd-udevd (1723)
> > [ 251.070139] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm
>
> So this is using the normal accelerated crc32c algorithm that sets
> BTRFS_FS_CSUM_IMPL_FAST. Which means the commit doesn't change
> behavior in should_async_write, which is the only place that checks
> the sync_writers flag. Can you retry the bisection or apply the patch
> below for a revert on top of latest mainline?
Bad performance:
6.5.0-rc4 with e917ff56c8e7b117b590632fa40a08e36577d31f reverted

So I redid the bisection.

Bad performance (same as in my previous report):
6.4.0 + patches up to e917ff56c8e7b117b590632fa40a08e36577d31f

Bad performance (good performance in my previous report):
6.4.0 + patches before e917ff56c8e7b117b590632fa40a08e36577d31f

Good performance:
6.4.0 + patches up to 'btrfs: use SECTOR_SHIFT to convert LBA to physical offset',
also dropping 'btrfs: submit IO synchronously for fast checksum implementations'
But I have tested 6.1.y with a patch almost identical to
'btrfs: submit IO synchronously for fast checksum implementations'
more than 20 times, and found no performance regression.
 static bool should_async_write(struct btrfs_fs_info *fs_info,
 			       struct btrfs_inode *bi)
 {
+	// should_async_write() is only called by btrfs_submit_metadata_bio(), i.e. for REQ_META bios
+	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
+		return false;
 	if (btrfs_is_zoned(fs_info))
 		return false;
 	if (atomic_read(&bi->sync_writers))
 		return false;
-	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
-		return false;
 	return true;
 }
BTW, I checked the memory ECC status; no errors were reported.
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/01
> ---
> From 9bdae7bbe4144b9bb49a28a4ee1de5c0f81f9b81 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Tue, 1 Aug 2023 10:27:25 +0200
> Subject: Revert "btrfs: determine synchronous writers from bio or writeback
> control"
>
> This reverts commit e917ff56c8e7b117b590632fa40a08e36577d31f.
> ---
> fs/btrfs/bio.c | 7 ++++---
> fs/btrfs/btrfs_inode.h | 3 +++
> fs/btrfs/file.c | 8 ++++++++
> fs/btrfs/inode.c | 1 +
> fs/btrfs/transaction.c | 2 ++
> 5 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 12b12443efaabb..8fecf4e84da2bf 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -602,10 +602,11 @@ static bool should_async_write(struct btrfs_bio *bbio)
>  		return false;
>
>  	/*
> -	 * Try to defer the submission to a workqueue to parallelize the
> -	 * checksum calculation unless the I/O is issued synchronously.
> +	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
> +	 * then try to defer the submission to a workqueue to parallelize the
> +	 * checksum calculation.
>  	 */
> -	if (op_is_sync(bbio->bio.bi_opf))
> +	if (atomic_read(&bbio->inode->sync_writers))
>  		return false;
>
>  	/* Zoned devices require I/O to be submitted in order. */
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index d47a927b3504d6..4efe895359dcf8 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -116,6 +116,9 @@ struct btrfs_inode {
>
>  	unsigned long runtime_flags;
>
> +	/* Keep track of who's O_SYNC/fsyncing currently */
> +	atomic_t sync_writers;
> +
>  	/* full 64 bit generation number, struct vfs_inode doesn't have a big
>  	 * enough field for this.
>  	 */
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index fd03e689a6bedc..3e37a62a6b5db7 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1648,6 +1648,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  	struct file *file = iocb->ki_filp;
>  	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
>  	ssize_t num_written, num_sync;
> +	const bool sync = iocb_is_dsync(iocb);
>
>  	/*
>  	 * If the fs flips readonly due to some impossible error, although we
> @@ -1660,6 +1661,9 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  	if (encoded && (iocb->ki_flags & IOCB_NOWAIT))
>  		return -EOPNOTSUPP;
>
> +	if (sync)
> +		atomic_inc(&inode->sync_writers);
> +
>  	if (encoded) {
>  		num_written = btrfs_encoded_write(iocb, from, encoded);
>  		num_sync = encoded->len;
> @@ -1679,6 +1683,8 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  		num_written = num_sync;
>  	}
>
> +	if (sync)
> +		atomic_dec(&inode->sync_writers);
>  	return num_written;
>  }
>
> @@ -1722,7 +1728,9 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
>  	 * several segments of stripe length (currently 64K).
>  	 */
>  	blk_start_plug(&plug);
> +	atomic_inc(&BTRFS_I(inode)->sync_writers);
>  	ret = btrfs_fdatawrite_range(inode, start, end);
> +	atomic_dec(&BTRFS_I(inode)->sync_writers);
>  	blk_finish_plug(&plug);
>
>  	return ret;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 49cef61f6a39f5..b9bad13ab75d19 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8618,6 +8618,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>  	ei->io_tree.inode = ei;
>  	extent_io_tree_init(fs_info, &ei->file_extent_tree,
>  			    IO_TREE_INODE_FILE_EXTENT);
> +	atomic_set(&ei->sync_writers, 0);
>  	mutex_init(&ei->log_mutex);
>  	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
>  	INIT_LIST_HEAD(&ei->delalloc_inodes);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 91b6c2fdc420e7..cda2b86de18814 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1060,6 +1060,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
>  	u64 start = 0;
>  	u64 end;
>
> +	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
>  	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
>  				      mark, &cached_state)) {
>  		bool wait_writeback = false;
> @@ -1095,6 +1096,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
>  		cond_resched();
>  		start = end + 1;
>  	}
> +	atomic_dec(&BTRFS_I(fs_info->btree_inode)->sync_writers);
>  	return werr;
>  }
>
> --
> 2.39.2
Thread overview: 29+ messages
2023-07-31 7:22 btrfs write-bandwidth performance regression of 6.5-rc4/rc3 Wang Yugui
2023-08-01 2:22 ` Wang Yugui
2023-08-01 8:35 ` Christoph Hellwig
2023-08-01 8:56 ` Wang Yugui
2023-08-01 9:03 ` Christoph Hellwig
2023-08-01 9:32 ` Wang Yugui
2023-08-01 10:00 ` Christoph Hellwig
2023-08-01 13:04 ` Wang Yugui [this message]
2023-08-01 14:59 ` Christoph Hellwig
2023-08-01 15:51 ` Wang Yugui
2023-08-01 15:56 ` Christoph Hellwig
2023-08-01 15:57 ` Christoph Hellwig
2023-08-02 0:04 ` Wang Yugui
2023-08-02 9:26 ` Christoph Hellwig
2023-08-11 8:58 ` Linux regression tracking (Thorsten Leemhuis)
2023-08-11 10:31 ` Christoph Hellwig
2023-08-11 14:23 ` Wang Yugui
2023-08-11 14:52 ` Chris Mason
2023-08-13 9:50 ` Wang Yugui
2023-08-29 9:45 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 7:02 ` Thorsten Leemhuis
2023-09-11 23:20 ` Wang Yugui
2023-09-12 7:58 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-26 10:55 ` Thorsten Leemhuis
2023-09-26 17:18 ` Chris Mason
2023-09-27 11:30 ` Linux regression tracking (Thorsten Leemhuis)
2023-12-06 14:22 ` Linux regression tracking (Thorsten Leemhuis)
2023-12-13 15:57 ` Naohiro Aota
2023-08-02 8:45 ` Linux regression tracking #adding (Thorsten Leemhuis)