* [PATCH 01/10] iomap: factor out a iomap_last_written_block helper
2024-10-08 8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
11 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
Split out a piece of logic from iomap_file_buffered_write_punch_delalloc
that is useful for all iomap_end implementations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 13 ++-----------
include/linux/iomap.h | 14 ++++++++++++++
2 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 78ebd265f42594..b944d77a78c668 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1280,7 +1280,6 @@ void iomap_file_buffered_write_punch_delalloc(struct inode *inode,
{
loff_t start_byte;
loff_t end_byte;
- unsigned int blocksize = i_blocksize(inode);
if (iomap->type != IOMAP_DELALLOC)
return;
@@ -1289,16 +1288,8 @@ void iomap_file_buffered_write_punch_delalloc(struct inode *inode,
if (!(iomap->flags & IOMAP_F_NEW))
return;
- /*
- * start_byte refers to the first unused block after a short write. If
- * nothing was written, round offset down to point at the first block in
- * the range.
- */
- if (unlikely(!written))
- start_byte = round_down(pos, blocksize);
- else
- start_byte = round_up(pos + written, blocksize);
- end_byte = round_up(pos + length, blocksize);
+ start_byte = iomap_last_written_block(inode, pos, written);
+ end_byte = round_up(pos + length, i_blocksize(inode));
/* Nothing to do if we've written the entire delalloc extent */
if (start_byte >= end_byte)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4ad12a3c8bae22..62253739dedcbe 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -256,6 +256,20 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
return &i->iomap;
}
+/*
+ * Return the file offset for the first unchanged block after a short write.
+ *
+ * If nothing was written, round @pos down to point at the first block in
+ * the range, else round up to include the partially written block.
+ */
+static inline loff_t iomap_last_written_block(struct inode *inode, loff_t pos,
+ ssize_t written)
+{
+ if (unlikely(!written))
+ return round_down(pos, i_blocksize(inode));
+ return round_up(pos + written, i_blocksize(inode));
+}
+
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops, void *private);
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
--
2.45.2
* [PATCH 02/10] iomap: remove iomap_file_buffered_write_punch_delalloc
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
Currently iomap_file_buffered_write_punch_delalloc can be called from
XFS either with the invalidate lock held or not. To fix this while
keeping the locking in the file system and not the iomap library
code, we'll need to lift the locking up into the file system.
To prepare for that, open code iomap_file_buffered_write_punch_delalloc
in the only caller, and instead export iomap_write_delalloc_release.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
.../filesystems/iomap/operations.rst | 2 +-
fs/iomap/buffered-io.c | 85 ++++++-------------
fs/xfs/xfs_iomap.c | 16 +++-
include/linux/iomap.h | 6 +-
4 files changed, 46 insertions(+), 63 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 8e6c721d233010..b93115ab8748ae 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -208,7 +208,7 @@ The filesystem must arrange to `cancel
such `reservations
<https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/>`_
because writeback will not consume the reservation.
-The ``iomap_file_buffered_write_punch_delalloc`` can be called from a
+The ``iomap_write_delalloc_release`` can be called from a
``->iomap_end`` function to find all the clean areas of the folios
caching a fresh (``IOMAP_F_NEW``) delalloc mapping.
It takes the ``invalidate_lock``.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b944d77a78c668..b4f742f3104120 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1145,10 +1145,36 @@ static void iomap_write_delalloc_scan(struct inode *inode,
}
/*
+ * When a short write occurs, the filesystem might need to use ->iomap_end
+ * to remove space reservations created in ->iomap_begin.
+ *
+ * For filesystems that use delayed allocation, there can be dirty pages over
+ * the delalloc extent outside the range of a short write but still within the
+ * delalloc extent allocated for this iomap if the write raced with page
+ * faults.
+ *
* Punch out all the delalloc blocks in the range given except for those that
* have dirty data still pending in the page cache - those are going to be
* written and so must still retain the delalloc backing for writeback.
*
+ * The punch() callback *must* only punch delalloc extents in the range passed
+ * to it. It must skip over all other types of extents in the range and leave
+ * them completely unchanged. It must do this punch atomically with respect to
+ * other extent modifications.
+ *
+ * The punch() callback may be called with a folio locked to prevent writeback
+ * extent allocation racing at the edge of the range we are currently punching.
+ * The locked folio may or may not cover the range being punched, so it is not
+ * safe for the punch() callback to lock folios itself.
+ *
+ * Lock order is:
+ *
+ * inode->i_rwsem (shared or exclusive)
+ * inode->i_mapping->invalidate_lock (exclusive)
+ * folio_lock()
+ * ->punch
+ * internal filesystem allocation lock
+ *
* As we are scanning the page cache for data, we don't need to reimplement the
* wheel - mapping_seek_hole_data() does exactly what we need to identify the
* start and end of data ranges correctly even for sub-folio block sizes. This
@@ -1177,7 +1203,7 @@ static void iomap_write_delalloc_scan(struct inode *inode,
* require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
* the code to subtle off-by-one bugs....
*/
-static void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
+void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
loff_t end_byte, unsigned flags, struct iomap *iomap,
iomap_punch_t punch)
{
@@ -1243,62 +1269,7 @@ static void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
out_unlock:
filemap_invalidate_unlock(inode->i_mapping);
}
-
-/*
- * When a short write occurs, the filesystem may need to remove reserved space
- * that was allocated in ->iomap_begin from it's ->iomap_end method. For
- * filesystems that use delayed allocation, we need to punch out delalloc
- * extents from the range that are not dirty in the page cache. As the write can
- * race with page faults, there can be dirty pages over the delalloc extent
- * outside the range of a short write but still within the delalloc extent
- * allocated for this iomap.
- *
- * This function uses [start_byte, end_byte) intervals (i.e. open ended) to
- * simplify range iterations.
- *
- * The punch() callback *must* only punch delalloc extents in the range passed
- * to it. It must skip over all other types of extents in the range and leave
- * them completely unchanged. It must do this punch atomically with respect to
- * other extent modifications.
- *
- * The punch() callback may be called with a folio locked to prevent writeback
- * extent allocation racing at the edge of the range we are currently punching.
- * The locked folio may or may not cover the range being punched, so it is not
- * safe for the punch() callback to lock folios itself.
- *
- * Lock order is:
- *
- * inode->i_rwsem (shared or exclusive)
- * inode->i_mapping->invalidate_lock (exclusive)
- * folio_lock()
- * ->punch
- * internal filesystem allocation lock
- */
-void iomap_file_buffered_write_punch_delalloc(struct inode *inode,
- loff_t pos, loff_t length, ssize_t written, unsigned flags,
- struct iomap *iomap, iomap_punch_t punch)
-{
- loff_t start_byte;
- loff_t end_byte;
-
- if (iomap->type != IOMAP_DELALLOC)
- return;
-
- /* If we didn't reserve the blocks, we're not allowed to punch them. */
- if (!(iomap->flags & IOMAP_F_NEW))
- return;
-
- start_byte = iomap_last_written_block(inode, pos, written);
- end_byte = round_up(pos + length, i_blocksize(inode));
-
- /* Nothing to do if we've written the entire delalloc extent */
- if (start_byte >= end_byte)
- return;
-
- iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
- punch);
-}
-EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
+EXPORT_SYMBOL_GPL(iomap_write_delalloc_release);
static loff_t iomap_unshare_iter(struct iomap_iter *iter)
{
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 1e11f48814c0d0..30f2530b6d5461 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1227,8 +1227,20 @@ xfs_buffered_write_iomap_end(
unsigned flags,
struct iomap *iomap)
{
- iomap_file_buffered_write_punch_delalloc(inode, offset, length, written,
- flags, iomap, &xfs_buffered_write_delalloc_punch);
+ loff_t start_byte, end_byte;
+
+ /* If we didn't reserve the blocks, we're not allowed to punch them. */
+ if (iomap->type != IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW))
+ return 0;
+
+ /* Nothing to do if we've written the entire delalloc extent */
+ start_byte = iomap_last_written_block(inode, offset, written);
+ end_byte = round_up(offset + length, i_blocksize(inode));
+ if (start_byte >= end_byte)
+ return 0;
+
+ iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
+ xfs_buffered_write_delalloc_punch);
return 0;
}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 62253739dedcbe..d0420e962ffdc2 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -290,9 +290,9 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
typedef void (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
struct iomap *iomap);
-void iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
- loff_t length, ssize_t written, unsigned flag,
- struct iomap *iomap, iomap_punch_t punch);
+void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
+ loff_t end_byte, unsigned flags, struct iomap *iomap,
+ iomap_punch_t punch);
int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len, const struct iomap_ops *ops);
--
2.45.2
* [PATCH 03/10] iomap: move locking out of iomap_write_delalloc_release
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
XFS (which currently is the only user of iomap_write_delalloc_release)
already holds invalidate_lock for most zeroing operations. To be able
to avoid a deadlock it needs to stop taking the lock, but doing so
in iomap would leak XFS locking details into iomap.
To avoid this require the caller to hold invalidate_lock when calling
iomap_write_delalloc_release instead of taking it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 17 ++++++++---------
fs/xfs/xfs_iomap.c | 2 ++
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b4f742f3104120..aa587b2142e214 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1211,12 +1211,13 @@ void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
loff_t scan_end_byte = min(i_size_read(inode), end_byte);
/*
- * Lock the mapping to avoid races with page faults re-instantiating
- * folios and dirtying them via ->page_mkwrite whilst we walk the
- * cache and perform delalloc extent removal. Failing to do this can
- * leave dirty pages with no space reservation in the cache.
+ * The caller must hold invalidate_lock to avoid races with page faults
+ * re-instantiating folios and dirtying them via ->page_mkwrite whilst
+ * we walk the cache and perform delalloc extent removal. Failing to do
+ * this can leave dirty pages with no space reservation in the cache.
*/
- filemap_invalidate_lock(inode->i_mapping);
+ lockdep_assert_held_write(&inode->i_mapping->invalidate_lock);
+
while (start_byte < scan_end_byte) {
loff_t data_end;
@@ -1233,7 +1234,7 @@ void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
if (start_byte == -ENXIO || start_byte == scan_end_byte)
break;
if (WARN_ON_ONCE(start_byte < 0))
- goto out_unlock;
+ return;
WARN_ON_ONCE(start_byte < punch_start_byte);
WARN_ON_ONCE(start_byte > scan_end_byte);
@@ -1244,7 +1245,7 @@ void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
scan_end_byte, SEEK_HOLE);
if (WARN_ON_ONCE(data_end < 0))
- goto out_unlock;
+ return;
/*
* If we race with post-direct I/O invalidation of the page cache,
@@ -1266,8 +1267,6 @@ void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
if (punch_start_byte < end_byte)
punch(inode, punch_start_byte, end_byte - punch_start_byte,
iomap);
-out_unlock:
- filemap_invalidate_unlock(inode->i_mapping);
}
EXPORT_SYMBOL_GPL(iomap_write_delalloc_release);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 30f2530b6d5461..01324da63fcfc7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1239,8 +1239,10 @@ xfs_buffered_write_iomap_end(
if (start_byte >= end_byte)
return 0;
+ filemap_invalidate_lock(inode->i_mapping);
iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
xfs_buffered_write_delalloc_punch);
+ filemap_invalidate_unlock(inode->i_mapping);
return 0;
}
--
2.45.2
* [PATCH 04/10] xfs: factor out a xfs_file_write_zero_eof helper
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
Split a helper from xfs_file_write_checks that just deals with the
post-EOF zeroing to keep the code readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 140 +++++++++++++++++++++++++++-------------------
1 file changed, 82 insertions(+), 58 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 412b1d71b52b7d..3efb0da2a910d6 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -347,10 +347,77 @@ xfs_file_splice_read(
return ret;
}
+/*
+ * Take care of zeroing post-EOF blocks when they might exist.
+ *
+ * Returns 0 if successfully, a negative error for a failure, or 1 if this
+ * function dropped the iolock and reacquired it exclusively and the caller
+ * needs to restart the write sanity checks.
+ */
+static ssize_t
+xfs_file_write_zero_eof(
+ struct kiocb *iocb,
+ struct iov_iter *from,
+ unsigned int *iolock,
+ size_t count,
+ bool *drained_dio)
+{
+ struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
+ loff_t isize;
+
+ /*
+ * We need to serialise against EOF updates that occur in IO completions
+ * here. We want to make sure that nobody is changing the size while
+ * we do this check until we have placed an IO barrier (i.e. hold
+ * XFS_IOLOCK_EXCL) that prevents new IO from being dispatched. The
+ * spinlock effectively forms a memory barrier once we have
+ * XFS_IOLOCK_EXCL so we are guaranteed to see the latest EOF value and
+ * hence be able to correctly determine if we need to run zeroing.
+ */
+ spin_lock(&ip->i_flags_lock);
+ isize = i_size_read(VFS_I(ip));
+ if (iocb->ki_pos <= isize) {
+ spin_unlock(&ip->i_flags_lock);
+ return 0;
+ }
+ spin_unlock(&ip->i_flags_lock);
+
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+
+ if (!*drained_dio) {
+ /*
+ * If zeroing is needed and we are currently holding the iolock
+ * shared, we need to update it to exclusive which implies
+ * having to redo all checks before.
+ */
+ if (*iolock == XFS_IOLOCK_SHARED) {
+ xfs_iunlock(ip, *iolock);
+ *iolock = XFS_IOLOCK_EXCL;
+ xfs_ilock(ip, *iolock);
+ iov_iter_reexpand(from, count);
+ }
+
+ /*
+ * We now have an IO submission barrier in place, but AIO can do
+ * EOF updates during IO completion and hence we now need to
+ * wait for all of them to drain. Non-AIO DIO will have drained
+ * before we are given the XFS_IOLOCK_EXCL, and so for most
+ * cases this wait is a no-op.
+ */
+ inode_dio_wait(VFS_I(ip));
+ *drained_dio = true;
+ return 1;
+ }
+
+ trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
+ return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+}
+
/*
* Common pre-write limit and setup checks.
*
- * Called with the iolocked held either shared and exclusive according to
+ * Called with the iolock held either shared and exclusive according to
* @iolock, and returns with it held. Might upgrade the iolock to exclusive
* if called for a direct write beyond i_size.
*/
@@ -360,13 +427,10 @@ xfs_file_write_checks(
struct iov_iter *from,
unsigned int *iolock)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
- struct xfs_inode *ip = XFS_I(inode);
- ssize_t error = 0;
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
size_t count = iov_iter_count(from);
bool drained_dio = false;
- loff_t isize;
+ ssize_t error;
restart:
error = generic_write_checks(iocb, from);
@@ -389,7 +453,7 @@ xfs_file_write_checks(
* exclusively.
*/
if (*iolock == XFS_IOLOCK_SHARED && !IS_NOSEC(inode)) {
- xfs_iunlock(ip, *iolock);
+ xfs_iunlock(XFS_I(inode), *iolock);
*iolock = XFS_IOLOCK_EXCL;
error = xfs_ilock_iocb(iocb, *iolock);
if (error) {
@@ -400,64 +464,24 @@ xfs_file_write_checks(
}
/*
- * If the offset is beyond the size of the file, we need to zero any
+ * If the offset is beyond the size of the file, we need to zero all
* blocks that fall between the existing EOF and the start of this
- * write. If zeroing is needed and we are currently holding the iolock
- * shared, we need to update it to exclusive which implies having to
- * redo all checks before.
- *
- * We need to serialise against EOF updates that occur in IO completions
- * here. We want to make sure that nobody is changing the size while we
- * do this check until we have placed an IO barrier (i.e. hold the
- * XFS_IOLOCK_EXCL) that prevents new IO from being dispatched. The
- * spinlock effectively forms a memory barrier once we have the
- * XFS_IOLOCK_EXCL so we are guaranteed to see the latest EOF value and
- * hence be able to correctly determine if we need to run zeroing.
+ * write.
*
- * We can do an unlocked check here safely as IO completion can only
- * extend EOF. Truncate is locked out at this point, so the EOF can
- * not move backwards, only forwards. Hence we only need to take the
- * slow path and spin locks when we are at or beyond the current EOF.
+ * We can do an unlocked check for i_size here safely as I/O completion
+ * can only extend EOF. Truncate is locked out at this point, so the
+ * EOF can not move backwards, only forwards. Hence we only need to take
+ * the slow path when we are at or beyond the current EOF.
*/
- if (iocb->ki_pos <= i_size_read(inode))
- goto out;
-
- spin_lock(&ip->i_flags_lock);
- isize = i_size_read(inode);
- if (iocb->ki_pos > isize) {
- spin_unlock(&ip->i_flags_lock);
-
- if (iocb->ki_flags & IOCB_NOWAIT)
- return -EAGAIN;
-
- if (!drained_dio) {
- if (*iolock == XFS_IOLOCK_SHARED) {
- xfs_iunlock(ip, *iolock);
- *iolock = XFS_IOLOCK_EXCL;
- xfs_ilock(ip, *iolock);
- iov_iter_reexpand(from, count);
- }
- /*
- * We now have an IO submission barrier in place, but
- * AIO can do EOF updates during IO completion and hence
- * we now need to wait for all of them to drain. Non-AIO
- * DIO will have drained before we are given the
- * XFS_IOLOCK_EXCL, and so for most cases this wait is a
- * no-op.
- */
- inode_dio_wait(inode);
- drained_dio = true;
+ if (iocb->ki_pos > i_size_read(inode)) {
+ error = xfs_file_write_zero_eof(iocb, from, iolock, count,
+ &drained_dio);
+ if (error == 1)
goto restart;
- }
-
- trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
- error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
if (error)
return error;
- } else
- spin_unlock(&ip->i_flags_lock);
+ }
-out:
return kiocb_modified(iocb);
}
--
2.45.2
* [PATCH 05/10] xfs: take XFS_MMAPLOCK_EXCL in xfs_file_write_zero_eof
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
not take XFS_MMAPLOCK_EXCL (aka the invalidate lock). Currently that
is actually the right thing, as an error in the iomap zeroing code will
also take the invalidate_lock to clean up, but to fix that deadlock we
need a consistent locking pattern first.
The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
pagefaults, which isn't really needed here, but also not actively
harmful.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 8 +++++++-
fs/xfs/xfs_iomap.c | 2 ++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 3efb0da2a910d6..b19916b11fd563 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -364,6 +364,7 @@ xfs_file_write_zero_eof(
{
struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
loff_t isize;
+ int error;
/*
* We need to serialise against EOF updates that occur in IO completions
@@ -411,7 +412,12 @@ xfs_file_write_zero_eof(
}
trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
- return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+
+ xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+ error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
+ xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+
+ return error;
}
/*
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 01324da63fcfc7..4fa4d66dc37761 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1449,6 +1449,8 @@ xfs_zero_range(
{
struct inode *inode = VFS_I(ip);
+ xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
if (IS_DAX(inode))
return dax_zero_range(inode, pos, len, did_zero,
&xfs_dax_write_iomap_ops);
--
2.45.2
* [PATCH 06/10] xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
All XFS callers of iomap_zero_range and iomap_file_unshare already hold
invalidate_lock, so we can't take it again in
iomap_file_buffered_write_punch_delalloc.
Use the passed in flags argument to detect if we're called from a zero
or unshare operation and don't take the lock again in this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_iomap.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4fa4d66dc37761..17170d9b9ff78a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1239,10 +1239,18 @@ xfs_buffered_write_iomap_end(
if (start_byte >= end_byte)
return 0;
- filemap_invalidate_lock(inode->i_mapping);
- iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
- xfs_buffered_write_delalloc_punch);
- filemap_invalidate_unlock(inode->i_mapping);
+ /* For zeroing operations the callers already hold invalidate_lock. */
+ if (flags & (IOMAP_UNSHARE | IOMAP_ZERO)) {
+ rwsem_assert_held_write(&inode->i_mapping->invalidate_lock);
+ iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
+ iomap, xfs_buffered_write_delalloc_punch);
+ } else {
+ filemap_invalidate_lock(inode->i_mapping);
+ iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
+ iomap, xfs_buffered_write_delalloc_punch);
+ filemap_invalidate_unlock(inode->i_mapping);
+ }
+
return 0;
}
--
2.45.2
* Re: [PATCH 06/10] xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
From: Darrick J. Wong @ 2024-10-08 17:39 UTC
To: Christoph Hellwig
Cc: Carlos Maiolino, Christian Brauner, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2024 at 10:59:17AM +0200, Christoph Hellwig wrote:
> All XFS callers of iomap_zero_range and iomap_file_unshare already hold
> invalidate_lock, so we can't take it again in
> iomap_file_buffered_write_punch_delalloc.
>
> Use the passed in flags argument to detect if we're called from a zero
> or unshare operation and don't take the lock again in this case.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Thanks for fixing the bug I reported,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_iomap.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 4fa4d66dc37761..17170d9b9ff78a 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1239,10 +1239,18 @@ xfs_buffered_write_iomap_end(
> if (start_byte >= end_byte)
> return 0;
>
> - filemap_invalidate_lock(inode->i_mapping);
> - iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
> - xfs_buffered_write_delalloc_punch);
> - filemap_invalidate_unlock(inode->i_mapping);
> + /* For zeroing operations the callers already hold invalidate_lock. */
> + if (flags & (IOMAP_UNSHARE | IOMAP_ZERO)) {
> + rwsem_assert_held_write(&inode->i_mapping->invalidate_lock);
> + iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
> + iomap, xfs_buffered_write_delalloc_punch);
> + } else {
> + filemap_invalidate_lock(inode->i_mapping);
> + iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
> + iomap, xfs_buffered_write_delalloc_punch);
> + filemap_invalidate_unlock(inode->i_mapping);
> + }
> +
> return 0;
> }
>
> --
> 2.45.2
>
>
* [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range
From: Christoph Hellwig @ 2024-10-08 8:59 UTC
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
xfs_buffered_write_iomap_begin can also create delalloc reservations
that need cleaning up. Prepare for that by adding support for the COW
fork in xfs_bmap_punch_delalloc_range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_aops.c | 4 ++--
fs/xfs/xfs_bmap_util.c | 10 +++++++---
fs/xfs/xfs_bmap_util.h | 2 +-
fs/xfs/xfs_iomap.c | 3 ++-
4 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6dead20338e24c..559a3a57709748 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -116,7 +116,7 @@ xfs_end_ioend(
if (unlikely(error)) {
if (ioend->io_flags & IOMAP_F_SHARED) {
xfs_reflink_cancel_cow_range(ip, offset, size, true);
- xfs_bmap_punch_delalloc_range(ip, offset,
+ xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, offset,
offset + size);
}
goto done;
@@ -456,7 +456,7 @@ xfs_discard_folio(
* byte of the next folio. Hence the end offset is only dependent on the
* folio itself and not the start offset that is passed in.
*/
- xfs_bmap_punch_delalloc_range(ip, pos,
+ xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
folio_pos(folio) + folio_size(folio));
}
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 053d567c910840..4719ec90029cb7 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -442,11 +442,12 @@ xfs_getbmap(
void
xfs_bmap_punch_delalloc_range(
struct xfs_inode *ip,
+ int whichfork,
xfs_off_t start_byte,
xfs_off_t end_byte)
{
struct xfs_mount *mp = ip->i_mount;
- struct xfs_ifork *ifp = &ip->i_df;
+ struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, start_byte);
xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, end_byte);
struct xfs_bmbt_irec got, del;
@@ -474,11 +475,14 @@ xfs_bmap_punch_delalloc_range(
continue;
}
- xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur, &got, &del);
+ xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
if (!xfs_iext_get_extent(ifp, &icur, &got))
break;
}
+ if (whichfork == XFS_COW_FORK && !ifp->if_bytes)
+ xfs_inode_clear_cowblocks_tag(ip);
+
out_unlock:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
}
@@ -580,7 +584,7 @@ xfs_free_eofblocks(
*/
if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
if (ip->i_delayed_blks) {
- xfs_bmap_punch_delalloc_range(ip,
+ xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK,
round_up(XFS_ISIZE(ip), mp->m_sb.sb_blocksize),
LLONG_MAX);
}
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index eb0895bfb9dae4..b29760d36e1ab1 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -30,7 +30,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
}
#endif /* CONFIG_XFS_RT */
-void xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
+void xfs_bmap_punch_delalloc_range(struct xfs_inode *ip, int whichfork,
xfs_off_t start_byte, xfs_off_t end_byte);
struct kgetbmap {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 17170d9b9ff78a..03eb57a721ced0 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1215,7 +1215,8 @@ xfs_buffered_write_delalloc_punch(
loff_t length,
struct iomap *iomap)
{
- xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
+ xfs_bmap_punch_delalloc_range(XFS_I(inode), XFS_DATA_FORK, offset,
+ offset + length);
}
static int
--
2.45.2
* [PATCH 08/10] xfs: share more code in xfs_buffered_write_iomap_begin
2024-10-08 8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
` (6 preceding siblings ...)
2024-10-08 8:59 ` [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2024-10-08 8:59 ` Christoph Hellwig
2024-10-08 8:59 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
` (3 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-08 8:59 UTC (permalink / raw)
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
Introduce a local iomap_flags variable so that the code allocating new
delalloc blocks in the data fork can fall through to the found_imap
label and reuse the code to unlock and fill the iomap.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_iomap.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 03eb57a721ced0..ebd0c90c1b3d8e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -975,6 +975,7 @@ xfs_buffered_write_iomap_begin(
int allocfork = XFS_DATA_FORK;
int error = 0;
unsigned int lockmode = XFS_ILOCK_EXCL;
+ unsigned int iomap_flags = 0;
u64 seq;
if (xfs_is_shutdown(mp))
@@ -1145,6 +1146,11 @@ xfs_buffered_write_iomap_begin(
}
}
+ /*
+ * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+ * them out if the write happens to fail.
+ */
+ iomap_flags |= IOMAP_F_NEW;
if (allocfork == XFS_COW_FORK) {
error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
end_fsb - offset_fsb, prealloc_blocks, &cmap,
@@ -1162,19 +1168,11 @@ xfs_buffered_write_iomap_begin(
if (error)
goto out_unlock;
- /*
- * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
- * them out if the write happens to fail.
- */
- seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
- xfs_iunlock(ip, lockmode);
trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);
-
found_imap:
- seq = xfs_iomap_inode_sequence(ip, 0);
+ seq = xfs_iomap_inode_sequence(ip, iomap_flags);
xfs_iunlock(ip, lockmode);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
convert_delay:
xfs_iunlock(ip, lockmode);
--
2.45.2
* [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
2024-10-08 8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
` (7 preceding siblings ...)
2024-10-08 8:59 ` [PATCH 08/10] xfs: share more code in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2024-10-08 8:59 ` Christoph Hellwig
2024-10-08 8:59 ` [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes Christoph Hellwig
` (2 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-08 8:59 UTC (permalink / raw)
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
Change xfs_buffered_write_iomap_begin to always set IOMAP_F_SHARED for
COW fork allocations, even if they don't overlap existing data fork
extents. This allows the iomap_end callback to detect that it has to
punch stale delalloc blocks from the COW fork instead of the data fork.
It also means we sample the sequence counter for both the data and the
COW fork when writing to the COW fork, which ensures we properly
revalidate when only COW fork changes happen.
This is essentially a revert of commit 72a048c1056a ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.
Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_iomap.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ebd0c90c1b3d8e..0317bbfeeb38f3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1186,20 +1186,20 @@ xfs_buffered_write_iomap_begin(
return 0;
found_cow:
- seq = xfs_iomap_inode_sequence(ip, 0);
if (imap.br_startoff <= offset_fsb) {
- error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
+ error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0,
+ xfs_iomap_inode_sequence(ip, 0));
if (error)
goto out_unlock;
- seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
- xfs_iunlock(ip, lockmode);
- return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
- IOMAP_F_SHARED, seq);
+ } else {
+ xfs_trim_extent(&cmap, offset_fsb,
+ imap.br_startoff - offset_fsb);
}
- xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+ iomap_flags = IOMAP_F_SHARED;
+ seq = xfs_iomap_inode_sequence(ip, iomap_flags);
xfs_iunlock(ip, lockmode);
- return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
+ return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
out_unlock:
xfs_iunlock(ip, lockmode);
--
2.45.2
* [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes
2024-10-08 8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
` (8 preceding siblings ...)
2024-10-08 8:59 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
@ 2024-10-08 8:59 ` Christoph Hellwig
2024-10-11 10:59 ` fix stale delalloc punching for COW I/O v5 Carlos Maiolino
2024-10-15 11:09 ` Carlos Maiolino
11 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-08 8:59 UTC (permalink / raw)
To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
Cc: linux-xfs, linux-fsdevel
When ->iomap_end is called on a short write to the COW fork it needs to
punch stale delalloc data from the COW fork and not the data fork.
Ensure that IOMAP_F_NEW is set for new COW fork allocations in
xfs_buffered_write_iomap_begin, and then use the IOMAP_F_SHARED flag
in xfs_buffered_write_delalloc_punch to decide which fork to punch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_iomap.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0317bbfeeb38f3..916531d9f83c2f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1196,7 +1196,7 @@ xfs_buffered_write_iomap_begin(
imap.br_startoff - offset_fsb);
}
- iomap_flags = IOMAP_F_SHARED;
+ iomap_flags |= IOMAP_F_SHARED;
seq = xfs_iomap_inode_sequence(ip, iomap_flags);
xfs_iunlock(ip, lockmode);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
@@ -1213,8 +1213,10 @@ xfs_buffered_write_delalloc_punch(
loff_t length,
struct iomap *iomap)
{
- xfs_bmap_punch_delalloc_range(XFS_I(inode), XFS_DATA_FORK, offset,
- offset + length);
+ xfs_bmap_punch_delalloc_range(XFS_I(inode),
+ (iomap->flags & IOMAP_F_SHARED) ?
+ XFS_COW_FORK : XFS_DATA_FORK,
+ offset, offset + length);
}
static int
--
2.45.2
* Re: fix stale delalloc punching for COW I/O v5
2024-10-08 8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
` (9 preceding siblings ...)
2024-10-08 8:59 ` [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes Christoph Hellwig
@ 2024-10-11 10:59 ` Carlos Maiolino
2024-10-15 11:09 ` Carlos Maiolino
11 siblings, 0 replies; 14+ messages in thread
From: Carlos Maiolino @ 2024-10-11 10:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2024 at 10:59:11AM GMT, Christoph Hellwig wrote:
> Hi all,
>
Just as a heads-up to Christian: I'm adding this to my queue now, as most of
these are xfs-related.
Carlos
> this is another fallout from the zoned XFS work, which stresses the XFS
> COW I/O path very heavily. It affects normal I/O to reflinked files as
> well, but is very hard to hit there.
>
> The main problem here is that we only punch out delalloc reservations
> from the data fork, but COW I/O places delalloc extents into the COW
> fork, which means that it won't get punched out for short writes.
>
>
> Changes since v4:
> - unshare also already holds the invalidate_lock as found out by recent
> fsstress enhancements
>
> Changes since v3:
> - improve two comments
>
> Changes since v2:
> - drop the patches already merged and rebased to latest Linus' tree
> - moved taking invalidate_lock from iomap to the caller to avoid a
> too complicated locking protocol
> - better document the xfs_file_write_zero_eof return value
> - fix a commit log typo
>
> Changes since v1:
> - move the already reviewed iomap prep changes to the beginning in case
> Christian wants to take them ASAP
> - take the invalidate_lock for post-EOF zeroing so that we have a
> consistent locking pattern for zeroing.
>
> Diffstat:
> Documentation/filesystems/iomap/operations.rst | 2
> fs/iomap/buffered-io.c | 111 ++++++-------------
> fs/xfs/xfs_aops.c | 4
> fs/xfs/xfs_bmap_util.c | 10 +
> fs/xfs/xfs_bmap_util.h | 2
> fs/xfs/xfs_file.c | 146 +++++++++++++++----------
> fs/xfs/xfs_iomap.c | 67 +++++++----
> include/linux/iomap.h | 20 ++-
> 8 files changed, 198 insertions(+), 164 deletions(-)
>
* Re: fix stale delalloc punching for COW I/O v5
2024-10-08 8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
` (10 preceding siblings ...)
2024-10-11 10:59 ` fix stale delalloc punching for COW I/O v5 Carlos Maiolino
@ 2024-10-15 11:09 ` Carlos Maiolino
11 siblings, 0 replies; 14+ messages in thread
From: Carlos Maiolino @ 2024-10-15 11:09 UTC (permalink / raw)
To: Christian Brauner, Darrick J. Wong, Christoph Hellwig
Cc: linux-xfs, linux-fsdevel
On Tue, 08 Oct 2024 10:59:11 +0200, Christoph Hellwig wrote:
> this is another fallout from the zoned XFS work, which stresses the XFS
> COW I/O path very heavily. It affects normal I/O to reflinked files as
> well, but is very hard to hit there.
>
> The main problem here is that we only punch out delalloc reservations
> from the data fork, but COW I/O places delalloc extents into the COW
> fork, which means that it won't get punched out for short writes.
>
> [...]
Applied to for-next, thanks!
[01/10] iomap: factor out a iomap_last_written_block helper
commit: c0adf8c3a9bf33f1dd1bf950601380f46a3fcec3
[02/10] iomap: remove iomap_file_buffered_write_punch_delalloc
commit: caf0ea451d97c33c5bbaa0074dad33b0b2a4e649
[03/10] iomap: move locking out of iomap_write_delalloc_release
commit: b78495166264fee1ed7ac44627e1dd080bbdf283
[04/10] xfs: factor out a xfs_file_write_zero_eof helper
commit: 3c399374af28b158854701da324a7bff576f5a97
[05/10] xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
commit: acfbac776496f2093e9facf7876b4015ef8c3d1d
[06/10] xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
commit: abd7d651ad2cd2ab1b8cd4dd31e80a8255196db3
[07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range
commit: 8fe3b21efa075f29d64a34000e84f89cfaa6cd80
[08/10] xfs: share more code in xfs_buffered_write_iomap_begin
commit: c29440ff66d6f24be5e9e313c1c0eca7212faf9e
[09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
commit: 7d6fe5c586e6a866f9e69a5bdd72a72b977bab8e
[10/10] xfs: punch delalloc extents from the COW fork for COW writes
commit: f6f91d290c8b9da6e671bd15f306ad2d0e635a04
Best regards,
--
Carlos Maiolino <cem@kernel.org>