From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, djwong@kernel.org,
hch@infradead.org, brauner@kernel.org, david@fromorbit.com,
chandanbabu@kernel.org, jack@suse.cz, yi.zhang@huawei.com,
yi.zhang@huaweicloud.com, chengzhihao1@huawei.com,
yukuai3@huawei.com
Subject: [PATCH -next v5 7/8] xfs: speed up truncating down a big realtime inode
Date: Thu, 13 Jun 2024 17:00:32 +0800
Message-ID: <20240613090033.2246907-8-yi.zhang@huaweicloud.com>
In-Reply-To: <20240613090033.2246907-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>
When truncating down a big realtime inode, zeroing out the entire
aligned EOF extent gets slower as the rtextsize increases. Fortunately,
__xfs_bunmapi() aligns the unmapped range to rtextsize, splits the
extent and converts the blocks beyond EOF to unwritten. So speed this up
by adjusting the unitsize to the filesystem blocksize when truncating
down a large realtime inode and letting __xfs_bunmapi() convert the tail
blocks to unwritten. In the test below, this cuts the truncate time from
20.141s to 10.526s with a 1024k rtextsize.

 # mkfs.xfs -f -rrtdev=/dev/pmem1s -f -m reflink=0,rmapbt=0, \
            -d rtinherit=1 -r extsize=$rtextsize /dev/pmem2s
 # mount -ortdev=/dev/pmem1s /dev/pmem2s /mnt/scratch
 # for i in {1..1000}; \
   do dd if=/dev/zero of=/mnt/scratch/$i bs=$rtextsize count=1024; done
 # sync
 # time for i in {1..1000}; \
   do xfs_io -c "truncate 4k" /mnt/scratch/$i; done

 rtextsize       8k      16k      32k      64k     256k    1024k
 before:     9.601s  10.229s  11.153s  12.086s  12.259s  20.141s
 after:      9.710s   9.642s   9.958s   9.441s  10.021s  10.526s

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/xfs/xfs_inode.c | 10 ++++++++--
fs/xfs/xfs_iops.c | 9 +++++++++
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 92daa2279053..5e837ed093b0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1487,6 +1487,7 @@ xfs_itruncate_extents_flags(
struct xfs_trans *tp = *tpp;
xfs_fileoff_t first_unmap_block;
int error = 0;
+ unsigned int unitsize = xfs_inode_alloc_unitsize(ip);
xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
if (atomic_read(&VFS_I(ip)->i_count))
@@ -1510,9 +1511,14 @@ xfs_itruncate_extents_flags(
*
* We have to free all the blocks to the bmbt maximum offset, even if
* the page cache can't scale that far.
+ *
+ * For a big realtime inode, don't align to the allocation unitsize;
+ * the unmap will split the extent and convert the tail to unwritten.
*/
- first_unmap_block = XFS_B_TO_FSB(mp,
- roundup_64(new_size, xfs_inode_alloc_unitsize(ip)));
+ if (xfs_inode_has_bigrtalloc(ip))
+ unitsize = i_blocksize(VFS_I(ip));
+ first_unmap_block = XFS_B_TO_FSB(mp, roundup_64(new_size, unitsize));
+
if (!xfs_verify_fileoff(mp, first_unmap_block)) {
WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
return 0;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8af13fd37f1b..1903c06d39bc 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -862,6 +862,15 @@ xfs_setattr_truncate_data(
/* Truncate down */
blocksize = xfs_inode_alloc_unitsize(ip);
+ /*
+ * For a big realtime inode, zeroing out the entire EOF extent gets
+ * slower as the rtextsize increases. Speed it up by adjusting the
+ * blocksize to the filesystem blocksize and letting __xfs_bunmapi()
+ * split the extent and convert the tail blocks to unwritten.
+ */
+ if (xfs_inode_has_bigrtalloc(ip))
+ blocksize = i_blocksize(inode);
+
/*
* iomap won't detect a dirty page over an unwritten block (or a cow
* block over a hole) and subsequently skips zeroing the newly post-EOF
--
2.39.2
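
For readers following the arithmetic, here is a minimal userspace sketch
of the rounding this patch changes. It is not kernel code: the helper
names roundup_u64(), tail_bytes_to_zero() and first_unmap_block() are
hypothetical stand-ins for roundup_64() and XFS_B_TO_FSB(), and the 4k
filesystem blocksize used in the note below is an assumption that the
patch does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of unit (unit need not be a power of two). */
static uint64_t roundup_u64(uint64_t x, uint64_t unit)
{
	assert(unit > 0);
	return ((x + unit - 1) / unit) * unit;
}

/*
 * Bytes between the new EOF and the end of the unit that contains it,
 * i.e. the range that has to be zeroed when truncating with this unit.
 */
static uint64_t tail_bytes_to_zero(uint64_t new_size, uint64_t unit)
{
	return roundup_u64(new_size, unit) - new_size;
}

/*
 * First file offset, in filesystem blocks, handed to the unmap step.
 * Hypothetical analogue of XFS_B_TO_FSB(mp, roundup_64(new_size, unit)).
 */
static uint64_t first_unmap_block(uint64_t new_size, uint64_t unit,
				  uint64_t fs_blocksize)
{
	uint64_t end = roundup_u64(new_size, unit);

	/* Convert bytes to blocks, rounding up. */
	return roundup_u64(end, fs_blocksize) / fs_blocksize;
}
```

With an assumed 4k blocksize and a 1024k rtextsize, truncating to 4k
gives tail_bytes_to_zero(4096, 1048576) == 1044480 and
first_unmap_block(4096, 1048576, 4096) == 256 when rounding to the
allocation unit. Rounding to the blocksize instead gives 0 bytes to zero
and a first unmap block of 1, so the rest of the rt extent is left for
the unmap path to split and convert to unwritten.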