From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
djwong@kernel.org, hch@infradead.org, brauner@kernel.org,
chandanbabu@kernel.org, jack@suse.cz, yi.zhang@huawei.com,
chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH v3 3/3] xfs: correct the zeroing truncate range
Date: Thu, 23 May 2024 10:00:02 +0800
Message-ID: <92cec08a-7fe7-1cc7-7a39-7f3d5fbc087b@huaweicloud.com>
In-Reply-To: <Zk6XqIcO+7+VPn35@dread.disaster.area>

On 2024/5/23 9:11, Dave Chinner wrote:
> On Wed, May 22, 2024 at 09:57:13AM +0800, Zhang Yi wrote:
>> On 2024/5/21 10:38, Dave Chinner wrote:
>>> We can do all this with a single writeback operation if we are a
>>> little bit smarter about the order of operations we perform and we
>>> are a little bit smarter in iomap about zeroing dirty pages in the
>>> page cache:
>>>
>>> 1. change iomap_zero_range() to do the right thing with
>>> dirty unwritten and cow extents (the patch I've been working
>>> on).
>>>
>>> 2. pass the range to be zeroed into iomap_truncate_page()
>>> (the fundamental change being made here).
>>>
>>> 3. zero the required range *through the page cache*
>>> (iomap_zero_range() already does this).
>>>
>>> 4. write back the XFS inode from ip->i_disk_size to the end
>>> of the range zeroed by iomap_truncate_page()
>>> (xfs_setattr_size() already does this).
>>>
>>> 5. i_size_write(newsize);
>>>
>>> 6. invalidate_inode_pages2_range(newsize, -1) to trash all
>>> the page cache beyond the new EOF without doing any zeroing
>>> as we've already done all the zeroing needed to the page
>>> cache through iomap_truncate_page().
>>>
>>>
>>> The patch I'm working on for step 1 is below. It still needs to be
>>> extended to handle the cow case, but I'm unclear on how to exercise
>>> that case so I haven't written the code to do it. The rest of it is
>>> just rearranging the code that we already use just to get the order
>>> of operations right. The only notable change in behaviour is using
>>> invalidate_inode_pages2_range() instead of truncate_pagecache(),
>>> because we don't want the EOF page to be dirtied again once we've
>>> already written zeroes to disk....
>>>
>>
>> Indeed, this sounds like the best solution. Since Darrick recommended
>> fixing the stale data exposure issue on realtime inodes by converting
>> the tail extent to unwritten, I suppose we could do that after fixing
>> the problem.
>
> We also need to fix the truncate issue for the upcoming forced
> alignment feature (for atomic writes), and in that case we are
> required to write zeroes to the entire tail extent. i.e. forced
> alignment does not allow partial unwritten extent conversion of
> the EOF extent.
>
Yes, right. I noticed that this feature needs the same fix.

> Hence I think we want to fix the problem by zeroing the entire EOF
> extent first, then optimise the large rtextsize case to use
> unwritten extents if that tail zeroing proves to be a performance
> issue.
>
> I say "if" because the large rtextsize case will still need to write
> zeroes for the fsb that spans EOF. Adding conversion of the rest of
> the extent to unwritten may well be more expensive (in terms of both
> CPU and IO requirements for the transactional metadata updates) than
> just submitting a slightly larger IO containing real zeroes and
> leaving it as a written extent....
>
Yeah, if the rtextsize is not large (as in most cases), I'm pretty sure
that writing zeros would be better. If the rtextsize is large enough, I
think it deserves a performance test.

Thanks,
Yi.
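
To put rough numbers on the trade-off discussed above (zeroing only the
fsb that spans EOF versus zeroing the whole tail extent), here is a
minimal userspace sketch of the range arithmetic. It is not code from
the patch or from the kernel: the helper names round_up_to() and
tail_zero_len() are hypothetical, and the sketch assumes the range to
zero simply runs from the new EOF to the end of the enclosing allocation
unit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of unit. A modulo is used instead of
 * a bit mask so that unit does not have to be a power of two.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t round_up_to(uint64_t x, uint64_t unit)
{
	uint64_t rem;

	assert(unit != 0);
	rem = x % unit;
	return rem ? x + (unit - rem) : x;
}

/*
 * Number of bytes between newsize and the end of the allocation unit
 * that contains it. Returns 0 when newsize is already unit-aligned,
 * i.e. there is no partial tail to zero.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t tail_zero_len(uint64_t newsize, uint64_t unit)
{
	return round_up_to(newsize, unit) - newsize;
}
```

With an assumed 4096-byte fsb and an assumed 64 KiB rt extent,
truncating to 10000 bytes gives tail_zero_len(10000, 4096) == 2288 for
the block spanning EOF and tail_zero_len(10000, 65536) == 55536 for the
whole tail extent. The difference between the two is the extra zeroing
IO that would be weighed against the cost of an unwritten conversion.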