From: Long Li <leo.lilong@huawei.com>
To: <brauner@kernel.org>, <djwong@kernel.org>, <cem@kernel.org>
Cc: <linux-xfs@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
<yi.zhang@huawei.com>, <houtao1@huawei.com>,
<yangerkun@huawei.com>
Subject: Re: [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes
Date: Sat, 30 Nov 2024 21:39:29 +0800 [thread overview]
Message-ID: <Z0sVkSXzxUDReow7@localhost.localdomain> (raw)
In-Reply-To: <20241127063503.2200005-1-leo.lilong@huawei.com>
On Wed, Nov 27, 2024 at 02:35:02PM +0800, Long Li wrote:
> During concurrent append writes to XFS filesystem, zero padding data
> may appear in the file after power failure. This happens due to imprecise
> disk size updates when handling write completion.
>
> Consider this scenario with concurrent append writes same file:
>
> Thread 1: Thread 2:
> ------------ -----------
> write [A, A+B]
> update inode size to A+B
> submit I/O [A, A+BS]
> write [A+B, A+B+C]
> update inode size to A+B+C
> <I/O completes, updates disk size to min(A+B+C, A+BS)>
> <power failure>
>
> After reboot:
> 1) with A+B+C < A+BS, the file has zero padding in range [A+B, A+B+C]
>
> |< Block Size (BS) >|
> |DDDDDDDDDDDDDDDD0000000000000000|
> ^ ^ ^
> A A+B A+B+C
> (EOF)
>
> 2) with A+B+C > A+BS, the file has zero padding in range [A+B, A+BS]
>
> |< Block Size (BS) >|< Block Size (BS) >|
> |DDDDDDDDDDDDDDDD0000000000000000|00000000000000000000000000000000|
> ^ ^ ^ ^
> A A+B A+BS A+B+C
> (EOF)
>
> D = Valid Data
> 0 = Zero Padding
>
> The issue stems from disk size being set to min(io_offset + io_size,
> inode->i_size) at I/O completion. Since io_offset+io_size is block
> size granularity, it may exceed the actual valid file data size. In
> the case of concurrent append writes, inode->i_size may be larger
> than the actual range of valid file data written to disk, leading to
> inaccurate disk size updates.
>
> This patch modifies the meaning of io_size to represent the size of
> valid data within EOF in an ioend. If the ioend spans beyond i_size,
> io_size will be trimmed to provide the file with more accurate size
> information. This is particularly useful for on-disk size updates
> at completion time.
>
> After this change, ioends that span i_size will not grow or merge with
> other ioends in concurrent scenarios. However, these cases that need
> growth/merging rarely occur and it seems no noticeable performance impact.
> Although rounding up io_size could enable ioend growth/merging in these
> scenarios, we decided to keep the code simple after discussion [1].
>
> Another benefit is that it makes the xfs_ioend_is_append() check more
> accurate, which can reduce unnecessary end bio callbacks of xfs_end_bio()
> in certain scenarios, such as repeated writes at the file tail without
> extending the file size.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Link[1]: https://patchwork.kernel.org/project/xfs/patch/20241113091907.56937-1-leo.lilong@huawei.com
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> ---
> v4->v5: remove iomap_ioend_size_aligned() and don't round up io_size for
> ioend growth/merging to keep the code simple.
> fs/iomap/buffered-io.c | 10 ++++++++++
> include/linux/iomap.h | 2 +-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d42f01e0fc1c..dc360c8e5641 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1774,6 +1774,7 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> {
> struct iomap_folio_state *ifs = folio->private;
> size_t poff = offset_in_folio(folio, pos);
> + loff_t isize = i_size_read(inode);
> int error;
>
> if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
> @@ -1789,7 +1790,16 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>
> if (ifs)
> atomic_add(len, &ifs->write_bytes_pending);
> +
> + /*
> + * If the ioend spans i_size, trim io_size to the former to provide
> + * the fs with more accurate size information. This is useful for
> + * completion time on-disk size updates.
> + */
> wpc->ioend->io_size += len;
> + if (wpc->ioend->io_offset + wpc->ioend->io_size > isize)
> + wpc->ioend->io_size = isize - wpc->ioend->io_offset;
> +
When performing fsstress test with this patch set, there is a very low probability of
encountering an issue where isize is less than ioend->io_offset in iomap_add_to_ioend.
After investigation, this was found to be caused by concurrent with truncate operations.
Consider a scenario with 4K block size and a file size of 12K.
//write back [8K, 12K] //truncate file to 4K
---------------------- ----------------------
iomap_writepage_map xfs_setattr_size
iomap_writepage_handle_eof
truncate_setsize
i_size_write(inode, newsize) //update inode size to 4K
iomap_writepage_map_blocks
iomap_add_to_ioend
< iszie < ioend->io_offset>
<iszie = 4K, ioend->io_offset=8K>
It appears that in extreme cases, folios beyond EOF might be written back,
resulting in situations where isize is less than pos. In such cases,
maybe we should not trim the io_size further.
Long Li
next prev parent reply other threads:[~2024-11-30 13:41 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-27 6:35 [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes Long Li
2024-11-27 6:35 ` [PATCH v5 2/2] xfs: clean up xfs_end_ioend() to reuse local variables Long Li
2024-11-27 16:07 ` Darrick J. Wong
2024-11-27 16:28 ` [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes Darrick J. Wong
2024-11-28 6:38 ` Long Li
2024-11-28 3:33 ` Christoph Hellwig
2024-11-30 13:39 ` Long Li [this message]
2024-12-02 15:26 ` Brian Foster
2024-12-03 2:08 ` Dave Chinner
2024-12-03 14:54 ` Brian Foster
2024-12-03 21:12 ` Dave Chinner
2024-12-04 12:05 ` Brian Foster
2024-12-04 9:06 ` Long Li
2024-12-04 12:05 ` Brian Foster
2024-12-06 3:36 ` Long Li
2024-12-04 9:00 ` Long Li
2024-12-04 12:17 ` Brian Foster
2024-12-05 12:47 ` Long Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z0sVkSXzxUDReow7@localhost.localdomain \
--to=leo.lilong@huawei.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=djwong@kernel.org \
--cc=houtao1@huawei.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox