* [PATCH] xfs: ensure st_blocks never goes to zero during COW writes
@ 2024-08-20 16:29 Christoph Hellwig
2024-08-21 4:44 ` Darrick J. Wong
0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2024-08-20 16:29 UTC (permalink / raw)
To: chandan.babu; +Cc: djwong, linux-xfs
COW writes remove the amount overwritten either directly for delalloc
reservations, or in earlier deferred transactions than adding the new
amount back in the bmap map transaction. This means st_blocks on an
inode where all data is overwritten using the COW path can temporarily
show a 0 st_blocks. This can easily be reproduced with the pending
zoned device support where all writes use this path and trips the
check in generic/615, but could also happen on a reflink file without
that.
Fix this by temporarily add the pending blocks to be mapped to
i_delayed_blks while the item is queued.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/libxfs/xfs_bmap.c | 1 +
fs/xfs/xfs_bmap_item.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7df74c35d9f900..a63be14a9873e8 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4847,6 +4847,7 @@ xfs_bmapi_remap(
}
ip->i_nblocks += len;
+ ip->i_delayed_blks -= len;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
if (ifp->if_format == XFS_DINODE_FMT_BTREE)
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index e224b49b7cff6d..fc5da2dc7c1c66 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -346,6 +346,18 @@ xfs_bmap_defer_add(
trace_xfs_bmap_defer(bi);
xfs_bmap_update_get_group(tp->t_mountp, bi);
+
+ /*
+ * Ensure the deferred mapping is pre-recorded in i_delayed_blks.
+ *
+ * Otherwise stat can report zero blocks for an inode that actually has
+ * data when the entire mapping is in the process of being overwritten
+ * using the out of place write path. This is undone in after
+ * xfs_bmapi_remap has incremented di_nblocks for a successful
+ * operation.
+ */
+ if (bi->bi_type == XFS_BMAP_MAP)
+ bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
}
@@ -367,6 +379,9 @@ xfs_bmap_update_cancel_item(
{
struct xfs_bmap_intent *bi = bi_entry(item);
+ if (bi->bi_type == XFS_BMAP_MAP)
+ bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount;
+
xfs_bmap_update_put_group(bi);
kmem_cache_free(xfs_bmap_intent_cache, bi);
}
@@ -464,6 +479,9 @@ xfs_bui_recover_work(
bi->bi_owner = *ipp;
xfs_bmap_update_get_group(mp, bi);
+ /* see __xfs_bmap_add for details */
+ if (bi->bi_type == XFS_BMAP_MAP)
+ bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
xfs_defer_add_item(dfp, &bi->bi_list);
return bi;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: ensure st_blocks never goes to zero during COW writes
2024-08-20 16:29 [PATCH] xfs: ensure st_blocks never goes to zero during COW writes Christoph Hellwig
@ 2024-08-21 4:44 ` Darrick J. Wong
2024-08-21 4:46 ` Christoph Hellwig
0 siblings, 1 reply; 3+ messages in thread
From: Darrick J. Wong @ 2024-08-21 4:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: chandan.babu, linux-xfs
On Tue, Aug 20, 2024 at 06:29:59PM +0200, Christoph Hellwig wrote:
> COW writes remove the amount overwritten either directly for delalloc
> reservations, or in earlier deferred transactions than adding the new
> amount back in the bmap map transaction. This means st_blocks on an
> inode where all data is overwritten using the COW path can temporarily
> show a 0 st_blocks. This can easily be reproduced with the pending
> zoned device support where all writes use this path and trips the
> check in generic/615, but could also happen on a reflink file without
> that.
>
> Fix this by temporarily add the pending blocks to be mapped to
> i_delayed_blks while the item is queued.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
How hard is it to hit this race?a I guess all you have to do is statx
in a loop while doing a bunch of cow writeback?
> ---
> fs/xfs/libxfs/xfs_bmap.c | 1 +
> fs/xfs/xfs_bmap_item.c | 18 ++++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7df74c35d9f900..a63be14a9873e8 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4847,6 +4847,7 @@ xfs_bmapi_remap(
> }
>
> ip->i_nblocks += len;
> + ip->i_delayed_blks -= len;
This proabably ought to have a comment to reference xfs_bmap_defer_add.
> xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>
> if (ifp->if_format == XFS_DINODE_FMT_BTREE)
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index e224b49b7cff6d..fc5da2dc7c1c66 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -346,6 +346,18 @@ xfs_bmap_defer_add(
> trace_xfs_bmap_defer(bi);
>
> xfs_bmap_update_get_group(tp->t_mountp, bi);
> +
> + /*
> + * Ensure the deferred mapping is pre-recorded in i_delayed_blks.
> + *
> + * Otherwise stat can report zero blocks for an inode that actually has
> + * data when the entire mapping is in the process of being overwritten
> + * using the out of place write path. This is undone in after
> + * xfs_bmapi_remap has incremented di_nblocks for a successful
> + * operation.
> + */
> + if (bi->bi_type == XFS_BMAP_MAP)
> + bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
> xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
> }
>
> @@ -367,6 +379,9 @@ xfs_bmap_update_cancel_item(
> {
> struct xfs_bmap_intent *bi = bi_entry(item);
>
> + if (bi->bi_type == XFS_BMAP_MAP)
> + bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount;
> +
> xfs_bmap_update_put_group(bi);
> kmem_cache_free(xfs_bmap_intent_cache, bi);
> }
> @@ -464,6 +479,9 @@ xfs_bui_recover_work(
> bi->bi_owner = *ipp;
> xfs_bmap_update_get_group(mp, bi);
>
> + /* see __xfs_bmap_add for details */
xfs_bmap_defer_add?
--D
> + if (bi->bi_type == XFS_BMAP_MAP)
> + bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
> xfs_defer_add_item(dfp, &bi->bi_list);
> return bi;
> }
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: ensure st_blocks never goes to zero during COW writes
2024-08-21 4:44 ` Darrick J. Wong
@ 2024-08-21 4:46 ` Christoph Hellwig
0 siblings, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2024-08-21 4:46 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, chandan.babu, linux-xfs
On Tue, Aug 20, 2024 at 09:44:27PM -0700, Darrick J. Wong wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> How hard is it to hit this race?a I guess all you have to do is statx
> in a loop while doing a bunch of cow writeback?
Yes. That's essential what generic/615 is doing. Now without always_cow
you'd need to do this on freshly reflinked inodes, which makes the
reproducer a bit harder to create.
> > ip->i_nblocks += len;
> > + ip->i_delayed_blks -= len;
>
> This proabably ought to have a comment to reference xfs_bmap_defer_add.
Ok.
> > bi->bi_owner = *ipp;
> > xfs_bmap_update_get_group(mp, bi);
> >
> > + /* see __xfs_bmap_add for details */
>
> xfs_bmap_defer_add?
Yes.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-08-21 4:46 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-20 16:29 [PATCH] xfs: ensure st_blocks never goes to zero during COW writes Christoph Hellwig
2024-08-21 4:44 ` Darrick J. Wong
2024-08-21 4:46 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox