* [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes
@ 2024-12-09 11:42 Long Li
2024-12-09 11:42 ` [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend Long Li
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Long Li @ 2024-12-09 11:42 UTC (permalink / raw)
To: brauner, djwong, cem
Cc: linux-xfs, linux-fsdevel, yi.zhang, houtao1, leo.lilong,
yangerkun
Hi all,
This patch series fixes zero padding data issues in concurrent append write
scenarios. A detailed problem description and solution can be found in patch 2.
Patch 1 is introduced as preparation for the fix in patch 2, eliminating the
need to resample inode size for io_size trimming and avoiding issues caused
by inode size changes during concurrent writeback and truncate operations.
Patch 3 is a minor cleanup.
v5: https://lore.kernel.org/linux-xfs/20241127063503.2200005-1-leo.lilong@huawei.com/
v4: https://lore.kernel.org/linux-xfs/20241125023341.2816630-1-leo.lilong@huawei.com/
v3: https://lore.kernel.org/linux-xfs/20241121063430.3304895-1-leo.lilong@huawei.com/
v2: https://lore.kernel.org/linux-xfs/20241113091907.56937-1-leo.lilong@huawei.com/
v1: https://lore.kernel.org/linux-xfs/20241108122738.2617669-1-leo.lilong@huawei.com/
v5->v6:
1. Introduce patch 1.
2. The io_size is trimmed based on the end_pos in patch 2.
3. Update the fixes tag in patch 2 to a more accurate one.
4. Collect reviewed-by tags.
Long Li (3):
iomap: pass byte granular end position to iomap_add_to_ioend
iomap: fix zero padding data issue in concurrent append writes
xfs: clean up xfs_end_ioend() to reuse local variables
fs/iomap/buffered-io.c | 66 ++++++++++++++++++++++++++++++++++++------
fs/xfs/xfs_aops.c | 2 +-
include/linux/iomap.h | 2 +-
3 files changed, 59 insertions(+), 11 deletions(-)
--
2.39.2
* [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend
2024-12-09 11:42 [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Long Li
@ 2024-12-09 11:42 ` Long Li
2024-12-09 14:06 ` Brian Foster
2024-12-09 11:42 ` [PATCH v6 2/3] iomap: fix zero padding data issue in concurrent append writes Long Li
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Long Li @ 2024-12-09 11:42 UTC (permalink / raw)
To: brauner, djwong, cem
Cc: linux-xfs, linux-fsdevel, yi.zhang, houtao1, leo.lilong,
yangerkun
This is a preparatory patch for fixing zero padding issues in concurrent
append write scenarios. In the following patches, we need to obtain
byte-granular writeback end position for io_size trimming after EOF
handling.
Due to concurrent writeback and truncate operations, inode size may
shrink. Resampling inode size would force writeback code to handle the
newly appeared post-EOF blocks, which is undesirable. As Dave
explained in [1]:
"Really, the issue is that writeback mappings have to be able to
handle the range being mapped suddenly appear to be beyond EOF.
This behaviour is a longstanding writeback constraint, and is what
iomap_writepage_handle_eof() is attempting to handle.
We handle this by only sampling i_size_read() whilst we have the
folio locked and can determine the action we should take with that
folio (i.e. nothing, partial zeroing, or skip altogether). Once
we've made the decision that the folio is within EOF and taken
action on it (i.e. moved the folio to writeback state), we cannot
then resample the inode size because a truncate may have started
and changed the inode size."
To avoid resampling inode size after EOF handling, we convert end_pos
to byte-granular writeback position and return it from EOF handling
function.
Since iomap_set_range_dirty() can handle unaligned lengths, this
conversion has no impact on it. However, iomap_find_dirty_range()
requires aligned start and end range to find dirty blocks within the
given range, so the end position needs to be rounded up when passed
to it.
LINK [1]: https://lore.kernel.org/linux-xfs/Z1Gg0pAa54MoeYME@localhost.localdomain/
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
fs/iomap/buffered-io.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 955f19e27e47..bcc7831d03af 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1774,7 +1774,8 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos)
*/
static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, loff_t pos, unsigned len)
+ struct inode *inode, loff_t pos, loff_t end_pos,
+ unsigned len)
{
struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
@@ -1800,8 +1801,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, u64 pos, unsigned dirty_len,
- unsigned *count)
+ struct inode *inode, u64 pos, u64 end_pos,
+ unsigned dirty_len, unsigned *count)
{
int error;
@@ -1826,7 +1827,7 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
break;
default:
error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
- map_len);
+ end_pos, map_len);
if (!error)
(*count)++;
break;
@@ -1897,11 +1898,11 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
* remaining memory is zeroed when mapped, and writes to that
* region are not written out to the file.
*
- * Also adjust the writeback range to skip all blocks entirely
- * beyond i_size.
+ * Also adjust the end_pos to the end of file and skip writeback
+ * for all blocks entirely beyond i_size.
*/
folio_zero_segment(folio, poff, folio_size(folio));
- *end_pos = round_up(isize, i_blocksize(inode));
+ *end_pos = isize;
}
return true;
@@ -1914,6 +1915,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
struct inode *inode = folio->mapping->host;
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
+ u64 end_aligned = 0;
unsigned count = 0;
int error = 0;
u32 rlen;
@@ -1955,9 +1957,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
/*
* Walk through the folio to find dirty areas to write back.
*/
- while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
+ end_aligned = round_up(end_pos, i_blocksize(inode));
+ while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
- pos, rlen, &count);
+ pos, end_pos, rlen, &count);
if (error)
break;
pos += rlen;
--
2.39.2
* [PATCH v6 2/3] iomap: fix zero padding data issue in concurrent append writes
2024-12-09 11:42 [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Long Li
2024-12-09 11:42 ` [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend Long Li
@ 2024-12-09 11:42 ` Long Li
2024-12-09 11:42 ` [PATCH v6 3/3] xfs: clean up xfs_end_ioend() to reuse local variables Long Li
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Long Li @ 2024-12-09 11:42 UTC (permalink / raw)
To: brauner, djwong, cem
Cc: linux-xfs, linux-fsdevel, yi.zhang, houtao1, leo.lilong,
yangerkun
During concurrent append writes to an XFS filesystem, zero padding data
may appear in the file after power failure. This happens due to imprecise
disk size updates when handling write completion.

Consider this scenario with concurrent append writes to the same file:

  Thread 1:                  Thread 2:
  ------------               -----------
  write [A, A+B]
  update inode size to A+B
  submit I/O [A, A+BS]
                             write [A+B, A+B+C]
                             update inode size to A+B+C
  <I/O completes, updates disk size to min(A+B+C, A+BS)>
  <power failure>

After reboot:

1) with A+B+C < A+BS, the file has zero padding in range [A+B, A+B+C]

   |<       Block Size (BS)        >|
   |DDDDDDDDDDDDDDDD0000000000000000|
   ^                ^       ^
   A               A+B    A+B+C
                          (EOF)

2) with A+B+C > A+BS, the file has zero padding in range [A+B, A+BS]

   |<       Block Size (BS)        >|<       Block Size (BS)        >|
   |DDDDDDDDDDDDDDDD0000000000000000|00000000000000000000000000000000|
   ^                ^               ^                ^
   A               A+B            A+BS           A+B+C
                                                 (EOF)

D = Valid Data
0 = Zero Padding

The issue stems from disk size being set to min(io_offset + io_size,
inode->i_size) at I/O completion. Since io_offset + io_size has block
size granularity, it may exceed the actual valid file data size. In
the case of concurrent append writes, inode->i_size may be larger
than the actual range of valid file data written to disk, leading to
inaccurate disk size updates.
This patch modifies the meaning of io_size to represent the size of
valid data within EOF in an ioend. If the ioend spans beyond i_size,
io_size will be trimmed to provide the file with more accurate size
information. This is particularly useful for on-disk size updates
at completion time.
After this change, ioends that span i_size will not grow or merge with
other ioends in concurrent scenarios. However, the cases that need
growth/merging rarely occur, so there should be no noticeable performance
impact. Although rounding up io_size could enable ioend growth/merging in
these scenarios, we decided to keep the code simple after discussion [1].
Another benefit is that it makes the xfs_ioend_is_append() check more
accurate, which can reduce unnecessary end bio callbacks of xfs_end_bio()
in certain scenarios, such as repeated writes at the file tail without
extending the file size.
Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure") # goes further back than this
Link [1]: https://patchwork.kernel.org/project/xfs/patch/20241113091907.56937-1-leo.lilong@huawei.com
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 45 ++++++++++++++++++++++++++++++++++++++++++
include/linux/iomap.h | 2 +-
2 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcc7831d03af..54dc27d92781 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1794,7 +1794,52 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
if (ifs)
atomic_add(len, &ifs->write_bytes_pending);
+
+ /*
+ * Clamp io_offset and io_size to the incore EOF so that ondisk
+ * file size updates in the ioend completion are byte-accurate.
+ * This avoids recovering files with zeroed tail regions when
+ * writeback races with appending writes:
+ *
+ * Thread 1:                  Thread 2:
+ * ------------               -----------
+ * write [A, A+B]
+ * update inode size to A+B
+ * submit I/O [A, A+BS]
+ *                            write [A+B, A+B+C]
+ *                            update inode size to A+B+C
+ * <I/O completes, updates disk size to min(A+B+C, A+BS)>
+ * <power failure>
+ *
+ * After reboot:
+ * 1) with A+B+C < A+BS, the file has zero padding in range
+ *    [A+B, A+B+C]
+ *
+ *    |<    Block Size (BS)    >|
+ *    |DDDDDDDDDDDD0000000000000|
+ *    ^            ^     ^
+ *    A           A+B  A+B+C
+ *                     (EOF)
+ *
+ * 2) with A+B+C > A+BS, the file has zero padding in range
+ *    [A+B, A+BS]
+ *
+ *    |<    Block Size (BS)    >|<    Block Size (BS)     >|
+ *    |DDDDDDDDDDDD0000000000000|00000000000000000000000000|
+ *    ^            ^            ^             ^
+ *    A           A+B      A+BS          A+B+C
+ *                                       (EOF)
+ *
+ * D = Valid Data
+ * 0 = Zero Padding
+ *
+ * Note that this defeats the ability to chain the ioends of
+ * appending writes.
+ */
wpc->ioend->io_size += len;
+ if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
+ wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
+
wbc_account_cgroup_owner(wbc, folio, len);
return 0;
}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5675af6b740c..75bf54e76f3b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -335,7 +335,7 @@ struct iomap_ioend {
u16 io_type;
u16 io_flags; /* IOMAP_F_* */
struct inode *io_inode; /* file being written to */
- size_t io_size; /* size of the extent */
+ size_t io_size; /* size of data within eof */
loff_t io_offset; /* offset in the file */
sector_t io_sector; /* start sector of ioend */
struct bio io_bio; /* MUST BE LAST! */
--
2.39.2
* [PATCH v6 3/3] xfs: clean up xfs_end_ioend() to reuse local variables
2024-12-09 11:42 [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Long Li
2024-12-09 11:42 ` [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend Long Li
2024-12-09 11:42 ` [PATCH v6 2/3] iomap: fix zero padding data issue in concurrent append writes Long Li
@ 2024-12-09 11:42 ` Long Li
2025-01-14 10:30 ` Carlos Maiolino
2024-12-10 10:15 ` [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Christian Brauner
2024-12-11 10:09 ` (subset) " Christian Brauner
4 siblings, 1 reply; 12+ messages in thread
From: Long Li @ 2024-12-09 11:42 UTC (permalink / raw)
To: brauner, djwong, cem
Cc: linux-xfs, linux-fsdevel, yi.zhang, houtao1, leo.lilong,
yangerkun
Use already initialized local variables 'offset' and 'size' instead
of accessing ioend members directly in xfs_setfilesize() call.
This is just a code cleanup with no functional changes.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_aops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 559a3a577097..67877c36ed11 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -131,7 +131,7 @@ xfs_end_ioend(
error = xfs_iomap_write_unwritten(ip, offset, size, false);
if (!error && xfs_ioend_is_append(ioend))
- error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
+ error = xfs_setfilesize(ip, offset, size);
done:
iomap_finish_ioends(ioend, error);
memalloc_nofs_restore(nofs_flag);
--
2.39.2
* Re: [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend
2024-12-09 11:42 ` [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend Long Li
@ 2024-12-09 14:06 ` Brian Foster
2024-12-10 8:09 ` Long Li
0 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2024-12-09 14:06 UTC (permalink / raw)
To: Long Li
Cc: brauner, djwong, cem, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
On Mon, Dec 09, 2024 at 07:42:39PM +0800, Long Li wrote:
> This is a preparatory patch for fixing zero padding issues in concurrent
> append write scenarios. In the following patches, we need to obtain
> byte-granular writeback end position for io_size trimming after EOF
> handling.
>
> Due to concurrent writeback and truncate operations, inode size may
> shrink. Resampling inode size would force writeback code to handle the
> newly appeared post-EOF blocks, which is undesirable. As Dave
> explained in [1]:
>
> "Really, the issue is that writeback mappings have to be able to
> handle the range being mapped suddenly appear to be beyond EOF.
> This behaviour is a longstanding writeback constraint, and is what
> iomap_writepage_handle_eof() is attempting to handle.
>
> We handle this by only sampling i_size_read() whilst we have the
> folio locked and can determine the action we should take with that
> folio (i.e. nothing, partial zeroing, or skip altogether). Once
> we've made the decision that the folio is within EOF and taken
> action on it (i.e. moved the folio to writeback state), we cannot
> then resample the inode size because a truncate may have started
> and changed the inode size."
>
> To avoid resampling inode size after EOF handling, we convert end_pos
> to byte-granular writeback position and return it from EOF handling
> function.
>
> Since iomap_set_range_dirty() can handle unaligned lengths, this
> conversion has no impact on it. However, iomap_find_dirty_range()
> requires aligned start and end range to find dirty blocks within the
> given range, so the end position needs to be rounded up when passed
> to it.
>
> LINK [1]: https://lore.kernel.org/linux-xfs/Z1Gg0pAa54MoeYME@localhost.localdomain/
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
> fs/iomap/buffered-io.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 955f19e27e47..bcc7831d03af 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
...
> @@ -1914,6 +1915,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> struct inode *inode = folio->mapping->host;
> u64 pos = folio_pos(folio);
> u64 end_pos = pos + folio_size(folio);
> + u64 end_aligned = 0;
> unsigned count = 0;
> int error = 0;
> u32 rlen;
> @@ -1955,9 +1957,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> /*
> * Walk through the folio to find dirty areas to write back.
> */
> - while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
> + end_aligned = round_up(end_pos, i_blocksize(inode));
So do I follow correctly that the set_range_dirty() path doesn't need
the alignment because it uses inclusive first_blk/last_blk logic,
whereas this find_dirty_range() path does the opposite and thus does
require the round_up? If so, presumably that means if we fixed up the
find path we wouldn't need end_aligned at all anymore?
If I follow the reasoning correctly, then this looks Ok to me:
Reviewed-by: Brian Foster <bfoster@redhat.com>
... but as a followup exercise it might be nice to clean up the
iomap_find_dirty_range() path to either do the rounding itself or be
more consistent with set_range_dirty().
Brian
> + while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
> error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
> - pos, rlen, &count);
> + pos, end_pos, rlen, &count);
> if (error)
> break;
> pos += rlen;
> --
> 2.39.2
>
>
* Re: [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend
2024-12-09 14:06 ` Brian Foster
@ 2024-12-10 8:09 ` Long Li
2024-12-10 11:50 ` Brian Foster
0 siblings, 1 reply; 12+ messages in thread
From: Long Li @ 2024-12-10 8:09 UTC (permalink / raw)
To: Brian Foster
Cc: brauner, djwong, cem, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
On Mon, Dec 09, 2024 at 09:06:14AM -0500, Brian Foster wrote:
> On Mon, Dec 09, 2024 at 07:42:39PM +0800, Long Li wrote:
> > This is a preparatory patch for fixing zero padding issues in concurrent
> > append write scenarios. In the following patches, we need to obtain
> > byte-granular writeback end position for io_size trimming after EOF
> > handling.
> >
> > Due to concurrent writeback and truncate operations, inode size may
> > shrink. Resampling inode size would force writeback code to handle the
> > newly appeared post-EOF blocks, which is undesirable. As Dave
> > explained in [1]:
> >
> > "Really, the issue is that writeback mappings have to be able to
> > handle the range being mapped suddenly appear to be beyond EOF.
> > This behaviour is a longstanding writeback constraint, and is what
> > iomap_writepage_handle_eof() is attempting to handle.
> >
> > We handle this by only sampling i_size_read() whilst we have the
> > folio locked and can determine the action we should take with that
> > folio (i.e. nothing, partial zeroing, or skip altogether). Once
> > we've made the decision that the folio is within EOF and taken
> > action on it (i.e. moved the folio to writeback state), we cannot
> > then resample the inode size because a truncate may have started
> > and changed the inode size."
> >
> > To avoid resampling inode size after EOF handling, we convert end_pos
> > to byte-granular writeback position and return it from EOF handling
> > function.
> >
> > Since iomap_set_range_dirty() can handle unaligned lengths, this
> > conversion has no impact on it. However, iomap_find_dirty_range()
> > requires aligned start and end range to find dirty blocks within the
> > given range, so the end position needs to be rounded up when passed
> > to it.
> >
> > LINK [1]: https://lore.kernel.org/linux-xfs/Z1Gg0pAa54MoeYME@localhost.localdomain/
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> > fs/iomap/buffered-io.c | 21 ++++++++++++---------
> > 1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 955f19e27e47..bcc7831d03af 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> ...
> > @@ -1914,6 +1915,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > struct inode *inode = folio->mapping->host;
> > u64 pos = folio_pos(folio);
> > u64 end_pos = pos + folio_size(folio);
> > + u64 end_aligned = 0;
> > unsigned count = 0;
> > int error = 0;
> > u32 rlen;
> > @@ -1955,9 +1957,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > /*
> > * Walk through the folio to find dirty areas to write back.
> > */
> > - while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
> > + end_aligned = round_up(end_pos, i_blocksize(inode));
>
> So do I follow correctly that the set_range_dirty() path doesn't need
> the alignment because it uses inclusive first_blk/last_blk logic,
> whereas this find_dirty_range() path does the opposite and thus does
> require the round_up? If so, presumably that means if we fixed up the
> find path we wouldn't need end_aligned at all anymore?
>
Agreed with you.
> If I follow the reasoning correctly, then this looks Ok to me:
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> ... but as a followup exercise it might be nice to clean up the
> iomap_find_dirty_range() path to either do the rounding itself or be
> more consistent with set_range_dirty().
>
> Brian
Yes, I think we can handle the cleanup through a separate patch later?
Thanks,
Long Li
* Re: [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes
2024-12-09 11:42 [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Long Li
` (2 preceding siblings ...)
2024-12-09 11:42 ` [PATCH v6 3/3] xfs: clean up xfs_end_ioend() to reuse local variables Long Li
@ 2024-12-10 10:15 ` Christian Brauner
2024-12-10 11:38 ` Christoph Hellwig
2024-12-11 10:09 ` (subset) " Christian Brauner
4 siblings, 1 reply; 12+ messages in thread
From: Christian Brauner @ 2024-12-10 10:15 UTC (permalink / raw)
To: djwong, cem, Long Li
Cc: Christian Brauner, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
On Mon, 09 Dec 2024 19:42:38 +0800, Long Li wrote:
> This patch series fixes zero padding data issues in concurrent append write
> scenarios. A detailed problem description and solution can be found in patch 2.
> Patch 1 is introduced as preparation for the fix in patch 2, eliminating the
> need to resample inode size for io_size trimming and avoiding issues caused
> by inode size changes during concurrent writeback and truncate operations.
> Patch 3 is a minor cleanup.
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/3] iomap: pass byte granular end position to iomap_add_to_ioend
https://git.kernel.org/vfs/vfs/c/f307c58239b5
[2/3] iomap: fix zero padding data issue in concurrent append writes
https://git.kernel.org/vfs/vfs/c/33e72d56fb3a
[3/3] xfs: clean up xfs_end_ioend() to reuse local variables
https://git.kernel.org/vfs/vfs/c/30e611890d89
* Re: [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes
2024-12-10 10:15 ` [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Christian Brauner
@ 2024-12-10 11:38 ` Christoph Hellwig
2024-12-11 10:34 ` Christian Brauner
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-12-10 11:38 UTC (permalink / raw)
To: Christian Brauner
Cc: djwong, cem, Long Li, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
Can you please drop the third patch? We'll probably have something in
XFS that will conflict with it if not rebased, so it's probably better
to merge it through the xfs tree so that everything can be properly
merged.
* Re: [PATCH v6 1/3] iomap: pass byte granular end position to iomap_add_to_ioend
2024-12-10 8:09 ` Long Li
@ 2024-12-10 11:50 ` Brian Foster
0 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2024-12-10 11:50 UTC (permalink / raw)
To: Long Li
Cc: brauner, djwong, cem, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
On Tue, Dec 10, 2024 at 04:09:26PM +0800, Long Li wrote:
> On Mon, Dec 09, 2024 at 09:06:14AM -0500, Brian Foster wrote:
> > On Mon, Dec 09, 2024 at 07:42:39PM +0800, Long Li wrote:
> > > This is a preparatory patch for fixing zero padding issues in concurrent
> > > append write scenarios. In the following patches, we need to obtain
> > > byte-granular writeback end position for io_size trimming after EOF
> > > handling.
> > >
> > > Due to concurrent writeback and truncate operations, inode size may
> > > shrink. Resampling inode size would force writeback code to handle the
> > > newly appeared post-EOF blocks, which is undesirable. As Dave
> > > explained in [1]:
> > >
> > > "Really, the issue is that writeback mappings have to be able to
> > > handle the range being mapped suddenly appear to be beyond EOF.
> > > This behaviour is a longstanding writeback constraint, and is what
> > > iomap_writepage_handle_eof() is attempting to handle.
> > >
> > > We handle this by only sampling i_size_read() whilst we have the
> > > folio locked and can determine the action we should take with that
> > > folio (i.e. nothing, partial zeroing, or skip altogether). Once
> > > we've made the decision that the folio is within EOF and taken
> > > action on it (i.e. moved the folio to writeback state), we cannot
> > > then resample the inode size because a truncate may have started
> > > and changed the inode size."
> > >
> > > To avoid resampling inode size after EOF handling, we convert end_pos
> > > to byte-granular writeback position and return it from EOF handling
> > > function.
> > >
> > > Since iomap_set_range_dirty() can handle unaligned lengths, this
> > > conversion has no impact on it. However, iomap_find_dirty_range()
> > > requires aligned start and end range to find dirty blocks within the
> > > given range, so the end position needs to be rounded up when passed
> > > to it.
> > >
> > > LINK [1]: https://lore.kernel.org/linux-xfs/Z1Gg0pAa54MoeYME@localhost.localdomain/
> > > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > > ---
> > > fs/iomap/buffered-io.c | 21 ++++++++++++---------
> > > 1 file changed, 12 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index 955f19e27e47..bcc7831d03af 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > ...
> > > @@ -1914,6 +1915,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > > struct inode *inode = folio->mapping->host;
> > > u64 pos = folio_pos(folio);
> > > u64 end_pos = pos + folio_size(folio);
> > > + u64 end_aligned = 0;
> > > unsigned count = 0;
> > > int error = 0;
> > > u32 rlen;
> > > @@ -1955,9 +1957,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> > > /*
> > > * Walk through the folio to find dirty areas to write back.
> > > */
> > > - while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
> > > + end_aligned = round_up(end_pos, i_blocksize(inode));
> >
> > So do I follow correctly that the set_range_dirty() path doesn't need
> > the alignment because it uses inclusive first_blk/last_blk logic,
> > whereas this find_dirty_range() path does the opposite and thus does
> > require the round_up? If so, presumably that means if we fixed up the
> > find path we wouldn't need end_aligned at all anymore?
> >
>
> Agreed with you.
>
> > If I follow the reasoning correctly, then this looks Ok to me:
> >
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> >
> > ... but as a followup exercise it might be nice to clean up the
> > iomap_find_dirty_range() path to either do the rounding itself or be
> > more consistent with set_range_dirty().
> >
> > Brian
>
> Yes, I think we can handle the cleanup through a separate patch later?
>
Yep, thanks.
Brian
> Thanks,
> Long Li
>
* Re: (subset) [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes
2024-12-09 11:42 [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Long Li
` (3 preceding siblings ...)
2024-12-10 10:15 ` [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes Christian Brauner
@ 2024-12-11 10:09 ` Christian Brauner
4 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2024-12-11 10:09 UTC (permalink / raw)
To: djwong, cem, Long Li, Christoph Hellwig
Cc: Christian Brauner, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
On Mon, 09 Dec 2024 19:42:38 +0800, Long Li wrote:
> This patch series fixes zero padding data issues in concurrent append write
> scenarios. A detailed problem description and solution can be found in patch 2.
> Patch 1 is introduced as preparation for the fix in patch 2, eliminating the
> need to resample inode size for io_size trimming and avoiding issues caused
> by inode size changes during concurrent writeback and truncate operations.
> Patch 3 is a minor cleanup.
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/3] iomap: pass byte granular end position to iomap_add_to_ioend
https://git.kernel.org/vfs/vfs/c/b44679c63e4d
[2/3] iomap: fix zero padding data issue in concurrent append writes
https://git.kernel.org/vfs/vfs/c/51d20d1dacbe
* Re: [PATCH v6 0/3] iomap: fix zero padding data issue in concurrent append writes
2024-12-10 11:38 ` Christoph Hellwig
@ 2024-12-11 10:34 ` Christian Brauner
0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2024-12-11 10:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: djwong, cem, Long Li, linux-xfs, linux-fsdevel, yi.zhang, houtao1,
yangerkun
On Tue, Dec 10, 2024 at 03:38:48AM -0800, Christoph Hellwig wrote:
> Can you please drop the third patch? We'll probably have something in
> XFS that will conflict with it if not reabsed, so it's probably better
> to merge it through the xfs tree so that everything can be properly
> merged.
Done.
* Re: [PATCH v6 3/3] xfs: clean up xfs_end_ioend() to reuse local variables
2024-12-09 11:42 ` [PATCH v6 3/3] xfs: clean up xfs_end_ioend() to reuse local variables Long Li
@ 2025-01-14 10:30 ` Carlos Maiolino
0 siblings, 0 replies; 12+ messages in thread
From: Carlos Maiolino @ 2025-01-14 10:30 UTC (permalink / raw)
To: brauner, djwong, Long Li
Cc: linux-xfs, linux-fsdevel, yi.zhang, houtao1, yangerkun
On Mon, 09 Dec 2024 19:42:41 +0800, Long Li wrote:
> Use already initialized local variables 'offset' and 'size' instead
> of accessing ioend members directly in xfs_setfilesize() call.
>
> This is just a code cleanup with no functional changes.
>
>
Applied to for-next, thanks!
[3/3] xfs: clean up xfs_end_ioend() to reuse local variables
commit: 99fc33d16b2405cbc753fd30f93cd413d7d1b5fd
Best regards,
--
Carlos Maiolino <cem@kernel.org>