* [PATCH v8 0/7] block atomic writes for xfs
@ 2024-10-15 9:01 John Garry
2024-10-15 9:01 ` [PATCH v8 1/7] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
This series expands atomic write support to filesystems, specifically
XFS.
Initially we will only support writing exactly 1x FS block atomically.
Since we can now have FS block size > PAGE_SIZE for XFS, we can write
atomically write 4K+ blocks on x86.
No special per-inode flag is required for enabling writing 1x F block.
In future, to support writing more than one FS block atomically, a new FS
XFLAG flag may then introduced - like FS_XFLAG_BIG_ATOMICWRITES. This
would depend on a feature like forcealign.
So if we format the FS for 16K FS block size:
mkfs.xfs -b size=16384 /dev/sda
The statx reports atomic write unit min/max = FS block size:
$xfs_io -c statx filename
...
stat.stx_atomic_write_unit_min = 16384
stat.stx_atomic_write_unit_max = 16384
stat.stx_atomic_write_segments_max = 1
...
Baseline is 77bfe1b11ea0 (tag: xfs-6.12-fixes-3, xfs/xfs-6.12-fixesC,
xfs/for-next) xfs: fix a typo
Patches for this series can be found at:
https://github.com/johnpgarry/linux/tree/atomic-writes-v6.12-fs-v8
Changes since v7:
- Drop FS_XFLAG_ATOMICWRITES
- Reorder block/fs patches and add fixes tags (Christoph)
- Add RB tag from Christoph (Thanks!)
- Rebase
Changes since v6:
- Add iomap documentation update (Darrick)
- Drop reflink restriction (Darrick, Christoph)
- Catch XFS buffered IO fallback (Darrick)
- Check IOCB_DIRECT in generic_atomic_write_valid()
- Tweaks to coding style (Darrick)
- Add RB tags from Darrick and Christoph (thanks!)
John Garry (7):
block/fs: Pass an iocb to generic_atomic_write_valid()
fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
fs: Export generic_atomic_write_valid()
fs: iomap: Atomic write support
xfs: Support atomic write for statx
xfs: Validate atomic writes
xfs: Support setting FMODE_CAN_ATOMIC_WRITE
.../filesystems/iomap/operations.rst | 11 ++++++
block/fops.c | 22 ++++++-----
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
fs/read_write.c | 16 +++++---
fs/xfs/xfs_buf.c | 7 ++++
fs/xfs/xfs_buf.h | 3 ++
fs/xfs/xfs_file.c | 10 +++++
fs/xfs/xfs_inode.h | 15 ++++++++
fs/xfs/xfs_iops.c | 25 ++++++++++++
include/linux/fs.h | 2 +-
include/linux/iomap.h | 1 +
12 files changed, 131 insertions(+), 22 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v8 1/7] block/fs: Pass an iocb to generic_atomic_write_valid()
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 9:01 ` [PATCH v8 2/7] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Darrick and Hannes both thought it better that generic_atomic_write_valid()
should be passed a struct iocb, and not just the member of that struct
which is referenced; see [0] and [1].
I think that makes a more generic and clean API, so make that change.
[0] https://lore.kernel.org/linux-block/680ce641-729b-4150-b875-531a98657682@suse.de/
[1] https://lore.kernel.org/linux-xfs/20240620212401.GA3058325@frogsfrogsfrogs/
Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 8 ++++----
fs/read_write.c | 4 ++--
include/linux/fs.h | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index e696ae53bf1e..968b47b615c4 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -35,13 +35,13 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
return opf;
}
-static bool blkdev_dio_invalid(struct block_device *bdev, loff_t pos,
+static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
struct iov_iter *iter, bool is_atomic)
{
- if (is_atomic && !generic_atomic_write_valid(iter, pos))
+ if (is_atomic && !generic_atomic_write_valid(iocb, iter))
return true;
- return pos & (bdev_logical_block_size(bdev) - 1) ||
+ return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -374,7 +374,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
diff --git a/fs/read_write.c b/fs/read_write.c
index 64dc24afdb3a..2c3263530828 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,7 +1830,7 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
@@ -1840,7 +1840,7 @@ bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
if (!is_power_of_2(len))
return false;
- if (!IS_ALIGNED(pos, len))
+ if (!IS_ALIGNED(iocb->ki_pos, len))
return false;
return true;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e3c603d01337..fbfa032d1d90 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos);
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 2/7] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
2024-10-15 9:01 ` [PATCH v8 1/7] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 9:01 ` [PATCH v8 3/7] fs: Export generic_atomic_write_valid() John Garry
` (4 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
the file is open for direct IO. This does not work if the file is not
opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
Change to check for direct IO on a per-IO basis in
generic_atomic_write_valid(). Since we want to report -EOPNOTSUPP for
non-direct IO for an atomic write, change to return an error code.
Relocate the block fops atomic write checks to the common write path, as to
catch non-direct IO.
Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 18 ++++++++++--------
fs/read_write.c | 13 ++++++++-----
include/linux/fs.h | 2 +-
3 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 968b47b615c4..2d01c9007681 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -36,11 +36,8 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
}
static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
- struct iov_iter *iter, bool is_atomic)
+ struct iov_iter *iter)
{
- if (is_atomic && !generic_atomic_write_valid(iocb, iter))
- return true;
-
return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -368,13 +365,12 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
- bool is_atomic = iocb->ki_flags & IOCB_ATOMIC;
unsigned int nr_pages;
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
@@ -383,7 +379,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return __blkdev_direct_IO_simple(iocb, iter, bdev,
nr_pages);
return __blkdev_direct_IO_async(iocb, iter, bdev, nr_pages);
- } else if (is_atomic) {
+ } else if (iocb->ki_flags & IOCB_ATOMIC) {
return -EINVAL;
}
return __blkdev_direct_IO(iocb, iter, bdev, bio_max_segs(nr_pages));
@@ -625,7 +621,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
if (!bdev)
return -ENXIO;
- if (bdev_can_atomic_write(bdev) && filp->f_flags & O_DIRECT)
+ if (bdev_can_atomic_write(bdev))
filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
@@ -700,6 +696,12 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
return -EOPNOTSUPP;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
size -= iocb->ki_pos;
if (iov_iter_count(from) > size) {
shorted = iov_iter_count(from) - size;
diff --git a/fs/read_write.c b/fs/read_write.c
index 2c3263530828..befec0b5c537 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,18 +1830,21 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
if (!iter_is_ubuf(iter))
- return false;
+ return -EINVAL;
if (!is_power_of_2(len))
- return false;
+ return -EINVAL;
if (!IS_ALIGNED(iocb->ki_pos, len))
- return false;
+ return -EINVAL;
- return true;
+ if (!(iocb->ki_flags & IOCB_DIRECT))
+ return -EOPNOTSUPP;
+
+ return 0;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fbfa032d1d90..ba47fb283730 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 3/7] fs: Export generic_atomic_write_valid()
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
2024-10-15 9:01 ` [PATCH v8 1/7] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-15 9:01 ` [PATCH v8 2/7] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 9:01 ` [PATCH v8 4/7] fs: iomap: Atomic write support John Garry
` (3 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
The XFS code will need this.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/read_write.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index befec0b5c537..3e5dad12a5b4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1848,3 +1848,4 @@ int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
return 0;
}
+EXPORT_SYMBOL_GPL(generic_atomic_write_valid);
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 4/7] fs: iomap: Atomic write support
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
` (2 preceding siblings ...)
2024-10-15 9:01 ` [PATCH v8 3/7] fs: Export generic_atomic_write_valid() John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 12:12 ` Christoph Hellwig
2024-10-15 9:01 ` [PATCH v8 5/7] xfs: Support atomic write for statx John Garry
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
flag set.
Initially FSes (XFS) should only support writing a single FS block
atomically.
As with any atomic write, we should produce a single bio which covers the
complete write length.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
.../filesystems/iomap/operations.rst | 11 ++++++
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
include/linux/iomap.h | 1 +
4 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 8e6c721d2330..fb95e99ca1a0 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
if the mapping is unwritten and the filesystem cannot handle zeroing
the unaligned regions without exposing stale contents.
+ * ``IOMAP_ATOMIC``: This write is being issued with torn-write
+ protection. Only a single bio can be created for the write, and the
+ write must not be split into multiple I/O requests, i.e. flag
+ REQ_ATOMIC must be set.
+ The file range to write must be aligned to satisfy the requirements
+ of both the filesystem and the underlying block device's atomic
+ commit capabilities.
+ If filesystem metadata updates are required (e.g. unwritten extent
+ conversion or copy on write), all updates for the entire file range
+ must be committed atomically as well.
+
Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
calling this function.
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f637aa0706a3..c968a0e2a60b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
* clearing the WRITE_THROUGH flag in the dio request.
*/
static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
- const struct iomap *iomap, bool use_fua)
+ const struct iomap *iomap, bool use_fua, bool atomic)
{
blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
@@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
opflags |= REQ_FUA;
else
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+ if (atomic)
+ opflags |= REQ_ATOMIC;
return opflags;
}
@@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int fs_block_size = i_blocksize(inode), pad;
- loff_t length = iomap_length(iter);
+ const loff_t length = iomap_length(iter);
+ bool atomic = iter->flags & IOMAP_ATOMIC;
loff_t pos = iter->pos;
blk_opf_t bio_opf;
struct bio *bio;
@@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
size_t copied = 0;
size_t orig_count;
+ if (atomic && (length != fs_block_size))
+ return -EINVAL;
+
if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
!bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
return -EINVAL;
@@ -382,7 +388,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
* can set up the page vector appropriately for a ZONE_APPEND
* operation.
*/
- bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+ bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
do {
@@ -415,6 +421,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (atomic && n != length) {
+ /*
+ * This bio should have covered the complete length,
+ * which it doesn't, so error. We may need to zero out
+ * the tail (complete FS block), similar to when
+ * bio_iov_iter_get_pages() returns an error, above.
+ */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto zero_tail;
+ }
if (dio->flags & IOMAP_DIO_WRITE) {
task_io_account_write(n);
} else {
@@ -598,6 +615,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iocb->ki_flags & IOCB_NOWAIT)
iomi.flags |= IOMAP_NOWAIT;
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ iomi.flags |= IOMAP_ATOMIC;
+
if (iov_iter_rw(iter) == READ) {
/* reads can always complete inline */
dio->flags |= IOMAP_DIO_INLINE_COMP;
@@ -659,7 +679,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret != -EAGAIN) {
trace_iomap_dio_invalidate_fail(inode, iomi.pos,
iomi.len);
- ret = -ENOTBLK;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ /*
+ * folio invalidation failed, maybe
+ * this is transient, unlock and see if
+ * the caller tries again.
+ */
+ ret = -EAGAIN;
+ } else {
+ /* fall back to buffered write */
+ ret = -ENOTBLK;
+ }
}
goto out_free_dio;
}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 0a991c4ce87d..4118a42cdab0 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -98,7 +98,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
{ IOMAP_REPORT, "REPORT" }, \
{ IOMAP_FAULT, "FAULT" }, \
{ IOMAP_DIRECT, "DIRECT" }, \
- { IOMAP_NOWAIT, "NOWAIT" }
+ { IOMAP_NOWAIT, "NOWAIT" }, \
+ { IOMAP_ATOMIC, "ATOMIC" }
#define IOMAP_F_FLAGS_STRINGS \
{ IOMAP_F_NEW, "NEW" }, \
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4ad12a3c8bae..c7644bdcfca3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -178,6 +178,7 @@ struct iomap_folio_ops {
#else
#define IOMAP_DAX 0
#endif /* CONFIG_FS_DAX */
+#define IOMAP_ATOMIC (1 << 9)
struct iomap_ops {
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 5/7] xfs: Support atomic write for statx
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
` (3 preceding siblings ...)
2024-10-15 9:01 ` [PATCH v8 4/7] fs: iomap: Atomic write support John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 12:15 ` Christoph Hellwig
2024-10-15 9:01 ` [PATCH v8 6/7] xfs: Validate atomic writes John Garry
2024-10-15 9:01 ` [PATCH v8 7/7] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
6 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support providing info on atomic write unit min and max for an inode.
For simplicity, currently we limit the min at the FS block size. As for
max, we limit also at FS block size, as there is no current method to
guarantee extent alignment or granularity for regular files.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_buf.c | 7 +++++++
fs/xfs/xfs_buf.h | 3 +++
fs/xfs/xfs_inode.h | 15 +++++++++++++++
fs/xfs/xfs_iops.c | 25 +++++++++++++++++++++++++
4 files changed, 50 insertions(+)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index aa4dbda7b536..e279e5e139ff 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2115,6 +2115,13 @@ xfs_alloc_buftarg(
btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
mp, ops);
+ if (bdev_can_atomic_write(btp->bt_bdev)) {
+ struct request_queue *q = bdev_get_queue(btp->bt_bdev);
+
+ btp->bt_bdev_awu_min = queue_atomic_write_unit_min_bytes(q);
+ btp->bt_bdev_awu_max = queue_atomic_write_unit_max_bytes(q);
+ }
+
/*
* When allocating the buftargs we have not yet read the super block and
* thus don't know the file system sector size yet.
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 209a389f2abc..2be28bd01087 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -124,6 +124,9 @@ struct xfs_buftarg {
struct percpu_counter bt_io_count;
struct ratelimit_state bt_ioerror_rl;
+ /* Atomic write unit values */
+ unsigned int bt_bdev_awu_min, bt_bdev_awu_max;
+
/* built-in cache, if we're not using the perag one */
struct xfs_buf_cache bt_cache[];
};
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 97ed912306fd..73009a25a119 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -327,6 +327,21 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
(XFS_IS_REALTIME_INODE(ip) ? \
(ip)->i_mount->m_rtdev_targp : (ip)->i_mount->m_ddev_targp)
+static inline bool
+xfs_inode_can_atomicwrite(
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+
+ if (mp->m_sb.sb_blocksize < target->bt_bdev_awu_min)
+ return false;
+ if (mp->m_sb.sb_blocksize > target->bt_bdev_awu_max)
+ return false;
+
+ return true;
+}
+
/*
* In-core inode flags.
*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ee79cf161312..919fbcb4b72a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -570,6 +570,23 @@ xfs_stat_blksize(
return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
}
+static void
+xfs_get_atomic_write_attr(
+ struct xfs_inode *ip,
+ unsigned int *unit_min,
+ unsigned int *unit_max)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_sb *sbp = &mp->m_sb;
+
+ if (!xfs_inode_can_atomicwrite(ip)) {
+ *unit_min = *unit_max = 0;
+ return;
+ }
+
+ *unit_min = *unit_max = sbp->sb_blocksize;
+}
+
STATIC int
xfs_vn_getattr(
struct mnt_idmap *idmap,
@@ -643,6 +660,14 @@ xfs_vn_getattr(
stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
stat->dio_offset_align = bdev_logical_block_size(bdev);
}
+ if (request_mask & STATX_WRITE_ATOMIC) {
+ unsigned int unit_min, unit_max;
+
+ xfs_get_atomic_write_attr(ip, &unit_min,
+ &unit_max);
+ generic_fill_statx_atomic_writes(stat,
+ unit_min, unit_max);
+ }
fallthrough;
default:
stat->blksize = xfs_stat_blksize(ip);
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 6/7] xfs: Validate atomic writes
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
` (4 preceding siblings ...)
2024-10-15 9:01 ` [PATCH v8 5/7] xfs: Support atomic write for statx John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 12:16 ` Christoph Hellwig
2024-10-15 9:01 ` [PATCH v8 7/7] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
6 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Validate that an atomic write adheres to length/offset rules. Currently
we can only write a single FS block.
For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
ATOMICWRITES flags would also need to be set for the inode.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 412b1d71b52b..3a0a35e7826a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -822,6 +822,14 @@ xfs_file_write_iter(
if (IS_DAX(inode))
return xfs_file_dax_write(iocb, from);
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ if (ocount != ip->i_mount->m_sb.sb_blocksize)
+ return -EINVAL;
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
if (iocb->ki_flags & IOCB_DIRECT) {
/*
* Allow a directio write to fall back to a buffered
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v8 7/7] xfs: Support setting FMODE_CAN_ATOMIC_WRITE
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
` (5 preceding siblings ...)
2024-10-15 9:01 ` [PATCH v8 6/7] xfs: Validate atomic writes John Garry
@ 2024-10-15 9:01 ` John Garry
2024-10-15 12:16 ` Christoph Hellwig
6 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-15 9:01 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Set FMODE_CAN_ATOMIC_WRITE flag if we can atomic write for that inode.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 3a0a35e7826a..e4a3a9882b0a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1217,6 +1217,8 @@ xfs_file_open(
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
+ if (xfs_inode_can_atomicwrite(XFS_I(inode)))
+ file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
return generic_file_open(inode, file);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v8 4/7] fs: iomap: Atomic write support
2024-10-15 9:01 ` [PATCH v8 4/7] fs: iomap: Atomic write support John Garry
@ 2024-10-15 12:12 ` Christoph Hellwig
2024-10-15 12:24 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-15 12:12 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Tue, Oct 15, 2024 at 09:01:39AM +0000, John Garry wrote:
> Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
> flag set.
>
> Initially FSes (XFS) should only support writing a single FS block
> atomically.
>
> As with any atomic write, we should produce a single bio which covers the
> complete write length.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> .../filesystems/iomap/operations.rst | 11 ++++++
> fs/iomap/direct-io.c | 38 +++++++++++++++++--
> fs/iomap/trace.h | 3 +-
> include/linux/iomap.h | 1 +
> 4 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 8e6c721d2330..fb95e99ca1a0 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
> if the mapping is unwritten and the filesystem cannot handle zeroing
> the unaligned regions without exposing stale contents.
>
> + * ``IOMAP_ATOMIC``: This write is being issued with torn-write
> + protection. Only a single bio can be created for the write, and the
> + write must not be split into multiple I/O requests, i.e. flag
> + REQ_ATOMIC must be set.
> + The file range to write must be aligned to satisfy the requirements
> + of both the filesystem and the underlying block device's atomic
> + commit capabilities.
> + If filesystem metadata updates are required (e.g. unwritten extent
> + conversion or copy on write), all updates for the entire file range
> + must be committed atomically as well.
> +
> Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
> calling this function.
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f637aa0706a3..c968a0e2a60b 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
> * clearing the WRITE_THROUGH flag in the dio request.
> */
> static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> - const struct iomap *iomap, bool use_fua)
> + const struct iomap *iomap, bool use_fua, bool atomic)
> {
> blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
>
> @@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> opflags |= REQ_FUA;
> else
> dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
> + if (atomic)
> + opflags |= REQ_ATOMIC;
>
> return opflags;
> }
> @@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> const struct iomap *iomap = &iter->iomap;
> struct inode *inode = iter->inode;
> unsigned int fs_block_size = i_blocksize(inode), pad;
> - loff_t length = iomap_length(iter);
> + const loff_t length = iomap_length(iter);
> + bool atomic = iter->flags & IOMAP_ATOMIC;
> loff_t pos = iter->pos;
> blk_opf_t bio_opf;
> struct bio *bio;
> @@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> size_t copied = 0;
> size_t orig_count;
>
> + if (atomic && (length != fs_block_size))
Nit: no need for the inner braces here.
> + if (atomic && n != length) {
> + /*
> + * This bio should have covered the complete length,
> + * which it doesn't, so error. We may need to zero out
> + * the tail (complete FS block), similar to when
> + * bio_iov_iter_get_pages() returns an error, above.
> + */
> + ret = -EINVAL;
Do we want a WARN_ON_ONCE here because this is a condition that should be
impossible to hit?
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 5/7] xfs: Support atomic write for statx
2024-10-15 9:01 ` [PATCH v8 5/7] xfs: Support atomic write for statx John Garry
@ 2024-10-15 12:15 ` Christoph Hellwig
2024-10-15 12:22 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-15 12:15 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Tue, Oct 15, 2024 at 09:01:40AM +0000, John Garry wrote:
> Support providing info on atomic write unit min and max for an inode.
>
> For simplicity, currently we limit the min at the FS block size. As for
> max, we limit also at FS block size, as there is no current method to
> guarantee extent alignment or granularity for regular files.
>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> fs/xfs/xfs_buf.c | 7 +++++++
> fs/xfs/xfs_buf.h | 3 +++
> fs/xfs/xfs_inode.h | 15 +++++++++++++++
> fs/xfs/xfs_iops.c | 25 +++++++++++++++++++++++++
> 4 files changed, 50 insertions(+)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index aa4dbda7b536..e279e5e139ff 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -2115,6 +2115,13 @@ xfs_alloc_buftarg(
> btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
> mp, ops);
>
> + if (bdev_can_atomic_write(btp->bt_bdev)) {
> + struct request_queue *q = bdev_get_queue(btp->bt_bdev);
> +
> + btp->bt_bdev_awu_min = queue_atomic_write_unit_min_bytes(q);
> + btp->bt_bdev_awu_max = queue_atomic_write_unit_max_bytes(q);
Consumers of the block layer should never see request_queue. While there
is a few leftovers still I've cleaned most of this up. Please add
bdev_atomic_write_unit_min_bytes and bdev_atomic_write_unit_max_bytes
helpers for use by file systems and stacking drivers, similar to the
other queue limits.
> + /* Atomic write unit values */
> + unsigned int bt_bdev_awu_min, bt_bdev_awu_max;
Nit: While having two struct members declare on the same line using the
same type specification is perfectly valid C, it looks odd and we avoid
it in XFS (and most of the kernel). Please split this into two lines.
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_sb *sbp = &mp->m_sb;
> +
> + if (!xfs_inode_can_atomicwrite(ip)) {
> + *unit_min = *unit_max = 0;
> + return;
> + }
> +
> + *unit_min = *unit_max = sbp->sb_blocksize;
Nit: I'd do with the single use sbp local variable here.
> +}
> +
> STATIC int
> xfs_vn_getattr(
> struct mnt_idmap *idmap,
> @@ -643,6 +660,14 @@ xfs_vn_getattr(
> stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> stat->dio_offset_align = bdev_logical_block_size(bdev);
> }
> + if (request_mask & STATX_WRITE_ATOMIC) {
> + unsigned int unit_min, unit_max;
Nit: XFS (unlike the rest of the kernel) uses tab alignments for
variables.
Otherwise this looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 6/7] xfs: Validate atomic writes
2024-10-15 9:01 ` [PATCH v8 6/7] xfs: Validate atomic writes John Garry
@ 2024-10-15 12:16 ` Christoph Hellwig
2024-10-15 12:25 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-15 12:16 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Tue, Oct 15, 2024 at 09:01:41AM +0000, John Garry wrote:
> Validate that an atomic write adheres to length/offset rules. Currently
> we can only write a single FS block.
> + if (iocb->ki_flags & IOCB_ATOMIC) {
> + if (ocount != ip->i_mount->m_sb.sb_blocksize)
> + return -EINVAL;
Maybe throw in a comment here why we are currently limited to atomic
writes of exactly the file system block size and don't allow smaller
values.
Otherwise this looks good to me.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 7/7] xfs: Support setting FMODE_CAN_ATOMIC_WRITE
2024-10-15 9:01 ` [PATCH v8 7/7] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
@ 2024-10-15 12:16 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-15 12:16 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 5/7] xfs: Support atomic write for statx
2024-10-15 12:15 ` Christoph Hellwig
@ 2024-10-15 12:22 ` John Garry
2024-10-15 12:33 ` Christoph Hellwig
0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-15 12:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 15/10/2024 13:15, Christoph Hellwig wrote:
> On Tue, Oct 15, 2024 at 09:01:40AM +0000, John Garry wrote:
>> Support providing info on atomic write unit min and max for an inode.
>>
>> For simplicity, currently we limit the min at the FS block size. As for
>> max, we limit also at FS block size, as there is no current method to
>> guarantee extent alignment or granularity for regular files.
>>
>> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>> fs/xfs/xfs_buf.c | 7 +++++++
>> fs/xfs/xfs_buf.h | 3 +++
>> fs/xfs/xfs_inode.h | 15 +++++++++++++++
>> fs/xfs/xfs_iops.c | 25 +++++++++++++++++++++++++
>> 4 files changed, 50 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>> index aa4dbda7b536..e279e5e139ff 100644
>> --- a/fs/xfs/xfs_buf.c
>> +++ b/fs/xfs/xfs_buf.c
>> @@ -2115,6 +2115,13 @@ xfs_alloc_buftarg(
>> btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
>> mp, ops);
>>
>> + if (bdev_can_atomic_write(btp->bt_bdev)) {
>> + struct request_queue *q = bdev_get_queue(btp->bt_bdev);
>> +
>> + btp->bt_bdev_awu_min = queue_atomic_write_unit_min_bytes(q);
>> + btp->bt_bdev_awu_max = queue_atomic_write_unit_max_bytes(q);
>
> Consumers of the block layer should never see request_queue. While there
> is a few leftovers still I've cleaned most of this up. Please add
> bdev_atomic_write_unit_min_bytes and bdev_atomic_write_unit_max_bytes
> helpers for use by file systems and stacking drivers, similar to the
> other queue limits.
ok, fine
>
>> + /* Atomic write unit values */
>> + unsigned int bt_bdev_awu_min, bt_bdev_awu_max;
>
> Nit: While having two struct members declare on the same line using the
> same type specification is perfectly valid C, it looks odd and we avoid
> it in XFS (and most of the kernel). Please split this into two lines.
sure
>
>> + struct xfs_mount *mp = ip->i_mount;
>> + struct xfs_sb *sbp = &mp->m_sb;
>> +
>> + if (!xfs_inode_can_atomicwrite(ip)) {
>> + *unit_min = *unit_max = 0;
>> + return;
>> + }
>> +
>> + *unit_min = *unit_max = sbp->sb_blocksize;
>
> Nit: I'd do with the single use sbp local variable here.
I think that you mean do without.
>
>> +}
>> +
>> STATIC int
>> xfs_vn_getattr(
>> struct mnt_idmap *idmap,
>> @@ -643,6 +660,14 @@ xfs_vn_getattr(
>> stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
>> stat->dio_offset_align = bdev_logical_block_size(bdev);
>> }
>> + if (request_mask & STATX_WRITE_ATOMIC) {
>> + unsigned int unit_min, unit_max;
>
> Nit: XFS (unlike the rest of the kernel) uses tab alignments for
> variables.
ok
>
> Otherwise this looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
cheers
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 4/7] fs: iomap: Atomic write support
2024-10-15 12:12 ` Christoph Hellwig
@ 2024-10-15 12:24 ` John Garry
0 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-15 12:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 15/10/2024 13:12, Christoph Hellwig wrote:
> On Tue, Oct 15, 2024 at 09:01:39AM +0000, John Garry wrote:
>> Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
>> flag set.
>>
>> Initially FSes (XFS) should only support writing a single FS block
>> atomically.
>>
>> As with any atomic write, we should produce a single bio which covers the
>> complete write length.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>> .../filesystems/iomap/operations.rst | 11 ++++++
>> fs/iomap/direct-io.c | 38 +++++++++++++++++--
>> fs/iomap/trace.h | 3 +-
>> include/linux/iomap.h | 1 +
>> 4 files changed, 48 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
>> index 8e6c721d2330..fb95e99ca1a0 100644
>> --- a/Documentation/filesystems/iomap/operations.rst
>> +++ b/Documentation/filesystems/iomap/operations.rst
>> @@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
>> if the mapping is unwritten and the filesystem cannot handle zeroing
>> the unaligned regions without exposing stale contents.
>>
>> + * ``IOMAP_ATOMIC``: This write is being issued with torn-write
>> + protection. Only a single bio can be created for the write, and the
>> + write must not be split into multiple I/O requests, i.e. flag
>> + REQ_ATOMIC must be set.
>> + The file range to write must be aligned to satisfy the requirements
>> + of both the filesystem and the underlying block device's atomic
>> + commit capabilities.
>> + If filesystem metadata updates are required (e.g. unwritten extent
>> + conversion or copy on write), all updates for the entire file range
>> + must be committed atomically as well.
>> +
>> Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
>> calling this function.
>>
>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> index f637aa0706a3..c968a0e2a60b 100644
>> --- a/fs/iomap/direct-io.c
>> +++ b/fs/iomap/direct-io.c
>> @@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>> * clearing the WRITE_THROUGH flag in the dio request.
>> */
>> static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
>> - const struct iomap *iomap, bool use_fua)
>> + const struct iomap *iomap, bool use_fua, bool atomic)
>> {
>> blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
>>
>> @@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
>> opflags |= REQ_FUA;
>> else
>> dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
>> + if (atomic)
>> + opflags |= REQ_ATOMIC;
>>
>> return opflags;
>> }
>> @@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>> const struct iomap *iomap = &iter->iomap;
>> struct inode *inode = iter->inode;
>> unsigned int fs_block_size = i_blocksize(inode), pad;
>> - loff_t length = iomap_length(iter);
>> + const loff_t length = iomap_length(iter);
>> + bool atomic = iter->flags & IOMAP_ATOMIC;
>> loff_t pos = iter->pos;
>> blk_opf_t bio_opf;
>> struct bio *bio;
>> @@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>> size_t copied = 0;
>> size_t orig_count;
>>
>> + if (atomic && (length != fs_block_size))
>
> Nit: no need for the inner braces here.
ok
>
>> + if (atomic && n != length) {
>> + /*
>> + * This bio should have covered the complete length,
>> + * which it doesn't, so error. We may need to zero out
>> + * the tail (complete FS block), similar to when
>> + * bio_iov_iter_get_pages() returns an error, above.
>> + */
>> + ret = -EINVAL;
>
> Do we want a WARN_ON_ONCE here because this is a condition that should be
> impossible to hit?
I have no objection - as you said, it should not be hit. But it would be
nice to see why we are getting the -EINVAL if it were hit (so I can add it)
>
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
cheers
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 6/7] xfs: Validate atomic writes
2024-10-15 12:16 ` Christoph Hellwig
@ 2024-10-15 12:25 ` John Garry
0 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-15 12:25 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 15/10/2024 13:16, Christoph Hellwig wrote:
> On Tue, Oct 15, 2024 at 09:01:41AM +0000, John Garry wrote:
>> Validate that an atomic write adheres to length/offset rules. Currently
>> we can only write a single FS block.
>> + if (iocb->ki_flags & IOCB_ATOMIC) {
>> + if (ocount != ip->i_mount->m_sb.sb_blocksize)
>> + return -EINVAL;
> Maybe throw in a comment here why we are currently limited to atomic
> writes of exactly the file system block size and don't allow smaller
> values.
>
ok
> Otherwise this looks good to me.
cheers
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v8 5/7] xfs: Support atomic write for statx
2024-10-15 12:22 ` John Garry
@ 2024-10-15 12:33 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-15 12:33 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, axboe, brauner, djwong, viro, jack, dchinner,
cem, linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Tue, Oct 15, 2024 at 01:22:27PM +0100, John Garry wrote:
>> Nit: I'd do with the single use sbp local variable here.
>
> I think that you mean do without.
Yes, sorry.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-10-15 12:33 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-15 9:01 [PATCH v8 0/7] block atomic writes for xfs John Garry
2024-10-15 9:01 ` [PATCH v8 1/7] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-15 9:01 ` [PATCH v8 2/7] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
2024-10-15 9:01 ` [PATCH v8 3/7] fs: Export generic_atomic_write_valid() John Garry
2024-10-15 9:01 ` [PATCH v8 4/7] fs: iomap: Atomic write support John Garry
2024-10-15 12:12 ` Christoph Hellwig
2024-10-15 12:24 ` John Garry
2024-10-15 9:01 ` [PATCH v8 5/7] xfs: Support atomic write for statx John Garry
2024-10-15 12:15 ` Christoph Hellwig
2024-10-15 12:22 ` John Garry
2024-10-15 12:33 ` Christoph Hellwig
2024-10-15 9:01 ` [PATCH v8 6/7] xfs: Validate atomic writes John Garry
2024-10-15 12:16 ` Christoph Hellwig
2024-10-15 12:25 ` John Garry
2024-10-15 9:01 ` [PATCH v8 7/7] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
2024-10-15 12:16 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).