* [PATCH v7 0/8] block atomic writes for xfs
@ 2024-10-04 9:22 John Garry
2024-10-04 9:22 ` [PATCH v7 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
This series expands atomic write support to filesystems, specifically
XFS.
Flag FS_XFLAG_ATOMICWRITES is added as an enabling flag for atomic writes.
XFS can be formatted for atomic writes as follows:
mkfs.xfs -i atomicwrites=1 -b size=16384 /dev/sda
Support can be enabled through xfs_io command:
$xfs_io -c "chattr +W" filename
$xfs_io -c "lsattr -v" filename
[atomic-writes] filename
$xfs_io -c statx filename
...
stat.stx_atomic_write_unit_min = 16384
stat.stx_atomic_write_unit_max = 16384
stat.stx_atomic_write_segments_max = 1
...
Dependence on forcealign is gone for the moment. This means that we can
only write a single FS block atomically. Since we can now have FS block
size > PAGE_SIZE for XFS, we can write atomically write 4K+ blocks on
x86.
Using the large FS block size has downsides, so we still want the
forcealign feature.
Baseline is v6.12-rc1
Basic xfsprogs support at:
https://github.com/johnpgarry/xfsprogs-dev/tree/atomic-writes-v7
Patches for this series can be found at:
https://github.com/johnpgarry/linux/tree/atomic-writes-v6.12-fs-v7
Changes since v6:
- Add iomap documentation update (Darrick)
- Drop reflink restriction (Darrick, Christoph)
- Catch XFS buffered IO fallback (Darrick)
- Check IOCB_DIRECT in generic_atomic_write_valid()
- Tweaks to coding style (Darrick)
- Add RB tags from Darrick and Christoph (thanks!)
Changes since v5:
- Drop forcealign dependency
- Can support atomically writing a single FS block
- XFS_DIFLAG2_ATOMICWRITES is inherited from parent directories
- Add RB tags from Darrick (thanks!)
John Garry (8):
block/fs: Pass an iocb to generic_atomic_write_valid()
fs: Export generic_atomic_write_valid()
fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
fs: iomap: Atomic write support
xfs: Support FS_XFLAG_ATOMICWRITES
xfs: Support atomic write for statx
xfs: Validate atomic writes
xfs: Support setting FMODE_CAN_ATOMIC_WRITE
.../filesystems/iomap/operations.rst | 11 ++++++
block/fops.c | 22 ++++++-----
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
fs/read_write.c | 16 +++++---
fs/xfs/libxfs/xfs_format.h | 11 +++++-
fs/xfs/libxfs/xfs_inode_buf.c | 22 +++++++++++
fs/xfs/libxfs/xfs_inode_util.c | 6 +++
fs/xfs/libxfs/xfs_sb.c | 2 +
fs/xfs/xfs_buf.c | 7 ++++
fs/xfs/xfs_buf.h | 3 ++
fs/xfs/xfs_file.c | 10 +++++
fs/xfs/xfs_inode.h | 22 +++++++++++
fs/xfs/xfs_ioctl.c | 27 +++++++++++++
fs/xfs/xfs_iops.c | 25 ++++++++++++
fs/xfs/xfs_mount.h | 2 +
fs/xfs/xfs_super.c | 4 ++
include/linux/fs.h | 2 +-
include/linux/iomap.h | 1 +
include/uapi/linux/fs.h | 1 +
20 files changed, 211 insertions(+), 24 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v7 1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 9:22 ` [PATCH v7 2/8] fs: Export generic_atomic_write_valid() John Garry
` (6 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Darrick and Hannes both thought it better that generic_atomic_write_valid()
should be passed a struct iocb, and not just the member of that struct
which is referenced; see [0] and [1].
I think that makes a more generic and clean API, so make that change.
[0] https://lore.kernel.org/linux-block/680ce641-729b-4150-b875-531a98657682@suse.de/
[1] https://lore.kernel.org/linux-xfs/20240620212401.GA3058325@frogsfrogsfrogs/
Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 8 ++++----
fs/read_write.c | 4 ++--
include/linux/fs.h | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index e696ae53bf1e..968b47b615c4 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -35,13 +35,13 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
return opf;
}
-static bool blkdev_dio_invalid(struct block_device *bdev, loff_t pos,
+static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
struct iov_iter *iter, bool is_atomic)
{
- if (is_atomic && !generic_atomic_write_valid(iter, pos))
+ if (is_atomic && !generic_atomic_write_valid(iocb, iter))
return true;
- return pos & (bdev_logical_block_size(bdev) - 1) ||
+ return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -374,7 +374,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
diff --git a/fs/read_write.c b/fs/read_write.c
index 64dc24afdb3a..2c3263530828 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,7 +1830,7 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
@@ -1840,7 +1840,7 @@ bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
if (!is_power_of_2(len))
return false;
- if (!IS_ALIGNED(pos, len))
+ if (!IS_ALIGNED(iocb->ki_pos, len))
return false;
return true;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e3c603d01337..fbfa032d1d90 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos);
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 2/8] fs: Export generic_atomic_write_valid()
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
2024-10-04 9:22 ` [PATCH v7 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 9:22 ` [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
` (5 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
The XFS code will need this.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/read_write.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index 2c3263530828..32b476bf9be0 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1845,3 +1845,4 @@ bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
return true;
}
+EXPORT_SYMBOL_GPL(generic_atomic_write_valid);
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
2024-10-04 9:22 ` [PATCH v7 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-04 9:22 ` [PATCH v7 2/8] fs: Export generic_atomic_write_valid() John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 12:34 ` Christoph Hellwig
2024-10-04 9:22 ` [PATCH v7 4/8] fs: iomap: Atomic write support John Garry
` (4 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
the file is open for direct IO. This does not work if the file is not
opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
Change to check for direct IO on a per-IO basis in
generic_atomic_write_valid(). Since we want to report -EOPNOTSUPP for
non-direct IO for an atomic write, change to return an error code.
Relocate the block fops atomic write checks to the common write path, as to
catch non-direct IO.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 18 ++++++++++--------
fs/read_write.c | 13 ++++++++-----
include/linux/fs.h | 2 +-
3 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 968b47b615c4..2d01c9007681 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -36,11 +36,8 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
}
static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
- struct iov_iter *iter, bool is_atomic)
+ struct iov_iter *iter)
{
- if (is_atomic && !generic_atomic_write_valid(iocb, iter))
- return true;
-
return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -368,13 +365,12 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
- bool is_atomic = iocb->ki_flags & IOCB_ATOMIC;
unsigned int nr_pages;
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
@@ -383,7 +379,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return __blkdev_direct_IO_simple(iocb, iter, bdev,
nr_pages);
return __blkdev_direct_IO_async(iocb, iter, bdev, nr_pages);
- } else if (is_atomic) {
+ } else if (iocb->ki_flags & IOCB_ATOMIC) {
return -EINVAL;
}
return __blkdev_direct_IO(iocb, iter, bdev, bio_max_segs(nr_pages));
@@ -625,7 +621,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
if (!bdev)
return -ENXIO;
- if (bdev_can_atomic_write(bdev) && filp->f_flags & O_DIRECT)
+ if (bdev_can_atomic_write(bdev))
filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
@@ -700,6 +696,12 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
return -EOPNOTSUPP;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
size -= iocb->ki_pos;
if (iov_iter_count(from) > size) {
shorted = iov_iter_count(from) - size;
diff --git a/fs/read_write.c b/fs/read_write.c
index 32b476bf9be0..3e5dad12a5b4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,19 +1830,22 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
if (!iter_is_ubuf(iter))
- return false;
+ return -EINVAL;
if (!is_power_of_2(len))
- return false;
+ return -EINVAL;
if (!IS_ALIGNED(iocb->ki_pos, len))
- return false;
+ return -EINVAL;
- return true;
+ if (!(iocb->ki_flags & IOCB_DIRECT))
+ return -EOPNOTSUPP;
+
+ return 0;
}
EXPORT_SYMBOL_GPL(generic_atomic_write_valid);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fbfa032d1d90..ba47fb283730 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 4/8] fs: iomap: Atomic write support
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
` (2 preceding siblings ...)
2024-10-04 9:22 ` [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 9:22 ` [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES John Garry
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
flag set.
Initially FSes (XFS) should only support writing a single FS block
atomically.
As with any atomic write, we should produce a single bio which covers the
complete write length.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
.../filesystems/iomap/operations.rst | 11 ++++++
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
include/linux/iomap.h | 1 +
4 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 8e6c721d2330..fb95e99ca1a0 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
if the mapping is unwritten and the filesystem cannot handle zeroing
the unaligned regions without exposing stale contents.
+ * ``IOMAP_ATOMIC``: This write is being issued with torn-write
+ protection. Only a single bio can be created for the write, and the
+ write must not be split into multiple I/O requests, i.e. flag
+ REQ_ATOMIC must be set.
+ The file range to write must be aligned to satisfy the requirements
+ of both the filesystem and the underlying block device's atomic
+ commit capabilities.
+ If filesystem metadata updates are required (e.g. unwritten extent
+ conversion or copy on write), all updates for the entire file range
+ must be committed atomically as well.
+
Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
calling this function.
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f637aa0706a3..c968a0e2a60b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
* clearing the WRITE_THROUGH flag in the dio request.
*/
static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
- const struct iomap *iomap, bool use_fua)
+ const struct iomap *iomap, bool use_fua, bool atomic)
{
blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
@@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
opflags |= REQ_FUA;
else
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+ if (atomic)
+ opflags |= REQ_ATOMIC;
return opflags;
}
@@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int fs_block_size = i_blocksize(inode), pad;
- loff_t length = iomap_length(iter);
+ const loff_t length = iomap_length(iter);
+ bool atomic = iter->flags & IOMAP_ATOMIC;
loff_t pos = iter->pos;
blk_opf_t bio_opf;
struct bio *bio;
@@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
size_t copied = 0;
size_t orig_count;
+ if (atomic && (length != fs_block_size))
+ return -EINVAL;
+
if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
!bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
return -EINVAL;
@@ -382,7 +388,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
* can set up the page vector appropriately for a ZONE_APPEND
* operation.
*/
- bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+ bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
do {
@@ -415,6 +421,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (atomic && n != length) {
+ /*
+ * This bio should have covered the complete length,
+ * which it doesn't, so error. We may need to zero out
+ * the tail (complete FS block), similar to when
+ * bio_iov_iter_get_pages() returns an error, above.
+ */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto zero_tail;
+ }
if (dio->flags & IOMAP_DIO_WRITE) {
task_io_account_write(n);
} else {
@@ -598,6 +615,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iocb->ki_flags & IOCB_NOWAIT)
iomi.flags |= IOMAP_NOWAIT;
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ iomi.flags |= IOMAP_ATOMIC;
+
if (iov_iter_rw(iter) == READ) {
/* reads can always complete inline */
dio->flags |= IOMAP_DIO_INLINE_COMP;
@@ -659,7 +679,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret != -EAGAIN) {
trace_iomap_dio_invalidate_fail(inode, iomi.pos,
iomi.len);
- ret = -ENOTBLK;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ /*
+ * folio invalidation failed, maybe
+ * this is transient, unlock and see if
+ * the caller tries again.
+ */
+ ret = -EAGAIN;
+ } else {
+ /* fall back to buffered write */
+ ret = -ENOTBLK;
+ }
}
goto out_free_dio;
}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 0a991c4ce87d..4118a42cdab0 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -98,7 +98,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
{ IOMAP_REPORT, "REPORT" }, \
{ IOMAP_FAULT, "FAULT" }, \
{ IOMAP_DIRECT, "DIRECT" }, \
- { IOMAP_NOWAIT, "NOWAIT" }
+ { IOMAP_NOWAIT, "NOWAIT" }, \
+ { IOMAP_ATOMIC, "ATOMIC" }
#define IOMAP_F_FLAGS_STRINGS \
{ IOMAP_F_NEW, "NEW" }, \
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4ad12a3c8bae..c7644bdcfca3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -178,6 +178,7 @@ struct iomap_folio_ops {
#else
#define IOMAP_DAX 0
#endif /* CONFIG_FS_DAX */
+#define IOMAP_ATOMIC (1 << 9)
struct iomap_ops {
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
` (3 preceding siblings ...)
2024-10-04 9:22 ` [PATCH v7 4/8] fs: iomap: Atomic write support John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 12:35 ` Christoph Hellwig
2024-10-04 9:22 ` [PATCH v7 6/8] xfs: Support atomic write for statx John Garry
` (2 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Add initial support for new flag FS_XFLAG_ATOMICWRITES.
This flag is a file attribute that mirrors an ondisk inode flag. Actual
support for untorn file writes (for now) depends on both the iflag and the
underlying storage devices, which we can only really check at statx and
pwritev2() time. This is the same story as FS_XFLAG_DAX, which signals to
the fs that we should try to enable the fsdax IO path on the file (instead
of the regular page cache), but applications have to query STAT_ATTR_DAX to
find out if they really got that IO path.
Current kernel support for atomic writes is based on HW support (for atomic
writes). Since for regular files XFS has no way to specify extent alignment
or granularity, atomic write size is limited to the FS block size.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 11 +++++++++--
fs/xfs/libxfs/xfs_inode_buf.c | 22 ++++++++++++++++++++++
fs/xfs/libxfs/xfs_inode_util.c | 6 ++++++
fs/xfs/libxfs/xfs_sb.c | 2 ++
fs/xfs/xfs_buf.c | 7 +++++++
fs/xfs/xfs_buf.h | 3 +++
fs/xfs/xfs_inode.h | 5 +++++
fs/xfs/xfs_ioctl.c | 27 +++++++++++++++++++++++++++
fs/xfs/xfs_mount.h | 2 ++
fs/xfs/xfs_super.c | 4 ++++
include/uapi/linux/fs.h | 1 +
11 files changed, 88 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e1bfee0c3b1a..ed5e5442f0d4 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -352,11 +352,15 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3) /* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_ATOMICWRITES (1 << 31) /* atomicwrites enabled */
+
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
XFS_SB_FEAT_RO_COMPAT_REFLINK| \
- XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+ XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+ XFS_SB_FEAT_RO_COMPAT_ATOMICWRITES)
+
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
@@ -1093,16 +1097,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
#define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */
#define XFS_DIFLAG2_BIGTIME_BIT 3 /* big timestamps */
#define XFS_DIFLAG2_NREXT64_BIT 4 /* large extent counters */
+#define XFS_DIFLAG2_ATOMICWRITES_BIT 5 /* atomic writes permitted */
#define XFS_DIFLAG2_DAX (1 << XFS_DIFLAG2_DAX_BIT)
#define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT)
#define XFS_DIFLAG2_COWEXTSIZE (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
#define XFS_DIFLAG2_BIGTIME (1 << XFS_DIFLAG2_BIGTIME_BIT)
#define XFS_DIFLAG2_NREXT64 (1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_ATOMICWRITES (1 << XFS_DIFLAG2_ATOMICWRITES_BIT)
#define XFS_DIFLAG2_ANY \
(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
- XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+ XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | \
+ XFS_DIFLAG2_ATOMICWRITES)
static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
{
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 79babeac9d75..7fe94d038e83 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -483,6 +483,22 @@ xfs_dinode_verify_nrext64(
return NULL;
}
+static xfs_failaddr_t
+xfs_inode_validate_atomicwrites(
+ struct xfs_mount *mp,
+ uint16_t mode)
+{
+ /* superblock rocompat feature flag */
+ if (!xfs_has_atomicwrites(mp))
+ return __this_address;
+
+ /* Only regular files and directories */
+ if (!S_ISREG(mode) && !(S_ISDIR(mode)))
+ return __this_address;
+
+ return NULL;
+}
+
xfs_failaddr_t
xfs_dinode_verify(
struct xfs_mount *mp,
@@ -663,6 +679,12 @@ xfs_dinode_verify(
!xfs_has_bigtime(mp))
return __this_address;
+ if (flags2 & XFS_DIFLAG2_ATOMICWRITES) {
+ fa = xfs_inode_validate_atomicwrites(mp, mode);
+ if (fa)
+ return fa;
+ }
+
return NULL;
}
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index cc38e1c3c3e1..e59e98783bf7 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -80,6 +80,8 @@ xfs_flags2diflags2(
di_flags2 |= XFS_DIFLAG2_DAX;
if (xflags & FS_XFLAG_COWEXTSIZE)
di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
+ if (xflags & FS_XFLAG_ATOMICWRITES)
+ di_flags2 |= XFS_DIFLAG2_ATOMICWRITES;
return di_flags2;
}
@@ -126,6 +128,8 @@ xfs_ip2xflags(
flags |= FS_XFLAG_DAX;
if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
flags |= FS_XFLAG_COWEXTSIZE;
+ if (ip->i_diflags2 & XFS_DIFLAG2_ATOMICWRITES)
+ flags |= FS_XFLAG_ATOMICWRITES;
}
if (xfs_inode_has_attr_fork(ip))
@@ -224,6 +228,8 @@ xfs_inode_inherit_flags2(
}
if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+ if (pip->i_diflags2 & XFS_DIFLAG2_ATOMICWRITES)
+ ip->i_diflags2 |= XFS_DIFLAG2_ATOMICWRITES;
/* Don't let invalid cowextsize hints propagate. */
failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index d95409f3cba6..dd819561d0a5 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -164,6 +164,8 @@ xfs_sb_version_to_features(
features |= XFS_FEAT_REFLINK;
if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
features |= XFS_FEAT_INOBTCNT;
+ if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_ATOMICWRITES)
+ features |= XFS_FEAT_ATOMICWRITES;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
features |= XFS_FEAT_FTYPE;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index aa4dbda7b536..e279e5e139ff 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2115,6 +2115,13 @@ xfs_alloc_buftarg(
btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
mp, ops);
+ if (bdev_can_atomic_write(btp->bt_bdev)) {
+ struct request_queue *q = bdev_get_queue(btp->bt_bdev);
+
+ btp->bt_bdev_awu_min = queue_atomic_write_unit_min_bytes(q);
+ btp->bt_bdev_awu_max = queue_atomic_write_unit_max_bytes(q);
+ }
+
/*
* When allocating the buftargs we have not yet read the super block and
* thus don't know the file system sector size yet.
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 209a389f2abc..2be28bd01087 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -124,6 +124,9 @@ struct xfs_buftarg {
struct percpu_counter bt_io_count;
struct ratelimit_state bt_ioerror_rl;
+ /* Atomic write unit values */
+ unsigned int bt_bdev_awu_min, bt_bdev_awu_max;
+
/* built-in cache, if we're not using the perag one */
struct xfs_buf_cache bt_cache[];
};
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 97ed912306fd..1c62ee294a5a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -327,6 +327,11 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
(XFS_IS_REALTIME_INODE(ip) ? \
(ip)->i_mount->m_rtdev_targp : (ip)->i_mount->m_ddev_targp)
+static inline bool xfs_inode_has_atomicwrites(struct xfs_inode *ip)
+{
+ return ip->i_diflags2 & XFS_DIFLAG2_ATOMICWRITES;
+}
+
/*
* In-core inode flags.
*/
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index a20d426ef021..1c980c863ada 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -469,6 +469,26 @@ xfs_fileattr_get(
return 0;
}
+static int
+xfs_ioctl_setattr_atomicwrites(
+ struct xfs_inode *ip)
+{
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_sb *sbp = &mp->m_sb;
+
+ if (!xfs_has_atomicwrites(mp))
+ return -EINVAL;
+
+ if (target->bt_bdev_awu_min > sbp->sb_blocksize)
+ return -EINVAL;
+
+ if (target->bt_bdev_awu_max < sbp->sb_blocksize)
+ return -EINVAL;
+
+ return 0;
+}
+
static int
xfs_ioctl_setattr_xflags(
struct xfs_trans *tp,
@@ -478,6 +498,7 @@ xfs_ioctl_setattr_xflags(
struct xfs_mount *mp = ip->i_mount;
bool rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
uint64_t i_flags2;
+ int error;
if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
/* Can't change realtime flag if any extents are allocated. */
@@ -512,6 +533,12 @@ xfs_ioctl_setattr_xflags(
if (i_flags2 && !xfs_has_v3inodes(mp))
return -EINVAL;
+ if (fa->fsx_xflags & FS_XFLAG_ATOMICWRITES) {
+ error = xfs_ioctl_setattr_atomicwrites(ip);
+ if (error)
+ return error;
+ }
+
ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
ip->i_diflags2 = i_flags2;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 96496f39f551..6ac6518a2ef3 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -298,6 +298,7 @@ typedef struct xfs_mount {
#define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */
#define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */
#define XFS_FEAT_EXCHANGE_RANGE (1ULL << 27) /* exchange range */
+#define XFS_FEAT_ATOMICWRITES (1ULL << 28) /* atomic writes support */
/* Mount features */
#define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */
@@ -384,6 +385,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
__XFS_HAS_V4_FEAT(v3inodes, V3INODES)
__XFS_HAS_V4_FEAT(crc, CRC)
__XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
+__XFS_HAS_FEAT(atomicwrites, ATOMICWRITES)
/*
* Mount features
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index fbb3a1594c0d..97c1d9493cdb 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1733,6 +1733,10 @@ xfs_fs_fill_super(
mp->m_features &= ~XFS_FEAT_DISCARD;
}
+ if (xfs_has_atomicwrites(mp))
+ xfs_warn(mp,
+ "EXPERIMENTAL atomicwrites feature in use. Use at your own risk!");
+
if (xfs_has_reflink(mp)) {
if (mp->m_sb.sb_rblocks) {
xfs_alert(mp,
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..e813217e0fe4 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -158,6 +158,7 @@ struct fsxattr {
#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
#define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
#define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
+#define FS_XFLAG_ATOMICWRITES 0x00020000 /* atomic writes enabled */
#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/* the read-only stuff doesn't really belong here, but any other place is
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 6/8] xfs: Support atomic write for statx
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
` (4 preceding siblings ...)
2024-10-04 9:22 ` [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 9:22 ` [PATCH v7 7/8] xfs: Validate atomic writes John Garry
2024-10-04 9:22 ` [PATCH v7 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
7 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support providing info on atomic write unit min and max for an inode.
For simplicity, currently we limit the min at the FS block size. As for
max, we limit also at FS block size, as there is no current method to
guarantee extent alignment or granularity for regular files.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_inode.h | 17 +++++++++++++++++
fs/xfs/xfs_iops.c | 25 +++++++++++++++++++++++++
2 files changed, 42 insertions(+)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 1c62ee294a5a..1ea73402d592 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -332,6 +332,23 @@ static inline bool xfs_inode_has_atomicwrites(struct xfs_inode *ip)
return ip->i_diflags2 & XFS_DIFLAG2_ATOMICWRITES;
}
+static inline bool
+xfs_inode_can_atomicwrite(
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+
+ if (!xfs_inode_has_atomicwrites(ip))
+ return false;
+ if (mp->m_sb.sb_blocksize < target->bt_bdev_awu_min)
+ return false;
+ if (mp->m_sb.sb_blocksize > target->bt_bdev_awu_max)
+ return false;
+
+ return true;
+}
+
/*
* In-core inode flags.
*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ee79cf161312..919fbcb4b72a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -570,6 +570,23 @@ xfs_stat_blksize(
return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
}
+static void
+xfs_get_atomic_write_attr(
+ struct xfs_inode *ip,
+ unsigned int *unit_min,
+ unsigned int *unit_max)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_sb *sbp = &mp->m_sb;
+
+ if (!xfs_inode_can_atomicwrite(ip)) {
+ *unit_min = *unit_max = 0;
+ return;
+ }
+
+ *unit_min = *unit_max = sbp->sb_blocksize;
+}
+
STATIC int
xfs_vn_getattr(
struct mnt_idmap *idmap,
@@ -643,6 +660,14 @@ xfs_vn_getattr(
stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
stat->dio_offset_align = bdev_logical_block_size(bdev);
}
+ if (request_mask & STATX_WRITE_ATOMIC) {
+ unsigned int unit_min, unit_max;
+
+ xfs_get_atomic_write_attr(ip, &unit_min,
+ &unit_max);
+ generic_fill_statx_atomic_writes(stat,
+ unit_min, unit_max);
+ }
fallthrough;
default:
stat->blksize = xfs_stat_blksize(ip);
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 7/8] xfs: Validate atomic writes
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
` (5 preceding siblings ...)
2024-10-04 9:22 ` [PATCH v7 6/8] xfs: Support atomic write for statx John Garry
@ 2024-10-04 9:22 ` John Garry
2024-10-04 9:22 ` [PATCH v7 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
7 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Validate that an atomic write adheres to length/offset rules. Currently
we can only write a single FS block.
For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
ATOMICWRITES flags would also need to be set for the inode.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 412b1d71b52b..3a0a35e7826a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -822,6 +822,14 @@ xfs_file_write_iter(
if (IS_DAX(inode))
return xfs_file_dax_write(iocb, from);
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ if (ocount != ip->i_mount->m_sb.sb_blocksize)
+ return -EINVAL;
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
if (iocb->ki_flags & IOCB_DIRECT) {
/*
* Allow a directio write to fall back to a buffered
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v7 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
` (6 preceding siblings ...)
2024-10-04 9:22 ` [PATCH v7 7/8] xfs: Validate atomic writes John Garry
@ 2024-10-04 9:22 ` John Garry
7 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 9:22 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
For when an inode is enabled for atomic writes, set FMODE_CAN_ATOMIC_WRITE
flag.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 3a0a35e7826a..e4a3a9882b0a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1217,6 +1217,8 @@ xfs_file_open(
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
+ if (xfs_inode_can_atomicwrite(XFS_I(inode)))
+ file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
return generic_file_open(inode, file);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-04 9:22 ` [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
@ 2024-10-04 12:34 ` Christoph Hellwig
2024-10-04 12:45 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-04 12:34 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Fri, Oct 04, 2024 at 09:22:49AM +0000, John Garry wrote:
> Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
> the file is open for direct IO. This does not work if the file is not
> opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
This sound like a bug fix for 6.12-rc?
The fix looks good to me:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES
2024-10-04 9:22 ` [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES John Garry
@ 2024-10-04 12:35 ` Christoph Hellwig
2024-10-04 13:07 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-04 12:35 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Fri, Oct 04, 2024 at 09:22:51AM +0000, John Garry wrote:
> Add initial support for new flag FS_XFLAG_ATOMICWRITES.
>
> This flag is a file attribute that mirrors an ondisk inode flag. Actual
> support for untorn file writes (for now) depends on both the iflag and the
> underlying storage devices, which we can only really check at statx and
> pwritev2() time. This is the same story as FS_XFLAG_DAX, which signals to
> the fs that we should try to enable the fsdax IO path on the file (instead
> of the regular page cache), but applications have to query STAT_ATTR_DAX to
> find out if they really got that IO path.
>
> Current kernel support for atomic writes is based on HW support (for atomic
> writes). Since for regular files XFS has no way to specify extent alignment
> or granularity, atomic write size is limited to the FS block size.
I'm still confused why this flag is needed for the current version
of this patch set. We should always be able to support atomic writes
<= block size if support by the block device.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-04 12:34 ` Christoph Hellwig
@ 2024-10-04 12:45 ` John Garry
0 siblings, 0 replies; 16+ messages in thread
From: John Garry @ 2024-10-04 12:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 04/10/2024 13:34, Christoph Hellwig wrote:
> On Fri, Oct 04, 2024 at 09:22:49AM +0000, John Garry wrote:
>> Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
>> the file is open for direct IO. This does not work if the file is not
>> opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
>
> This sound like a bug fix for 6.12-rc?
The worse that can happen is that a RWF_ATOMIC would be rejected for
scenario described (with using fcntl(O_DIRECT)).
But you are right, it is a fix really. I would need to add patch 1/8 as
well, as it provides the iocb pointer needed.
I'll mark them both as fixes then.
>
> The fix looks good to me:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
cheers
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES
2024-10-04 12:35 ` Christoph Hellwig
@ 2024-10-04 13:07 ` John Garry
2024-10-07 5:42 ` Christoph Hellwig
0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-04 13:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 04/10/2024 13:35, Christoph Hellwig wrote:
> On Fri, Oct 04, 2024 at 09:22:51AM +0000, John Garry wrote:
>> Add initial support for new flag FS_XFLAG_ATOMICWRITES.
>>
>> This flag is a file attribute that mirrors an ondisk inode flag. Actual
>> support for untorn file writes (for now) depends on both the iflag and the
>> underlying storage devices, which we can only really check at statx and
>> pwritev2() time. This is the same story as FS_XFLAG_DAX, which signals to
>> the fs that we should try to enable the fsdax IO path on the file (instead
>> of the regular page cache), but applications have to query STAT_ATTR_DAX to
>> find out if they really got that IO path.
>>
>> Current kernel support for atomic writes is based on HW support (for atomic
>> writes). Since for regular files XFS has no way to specify extent alignment
>> or granularity, atomic write size is limited to the FS block size.
>
> I'm still confused why this flag is needed for the current version
> of this patch set.
> We should always be able to support atomic writes
> <= block size if support by the block device.
>
Sure, that is true (about being able to atomically write 1x FS block if
the bdev support it).
But if we are going to add forcealign or similar later, then it would
make sense (to me) to have FS_XFLAG_ATOMICWRITES (and its other flags)
from the beginning. I mean, for example, if FS_XFLAG_FORCEALIGN were
enabled and we want atomic writes, setting FS_XFLAG_ATOMICWRITES would
be rejected if AG count is not aligned with extsize, or extsize is not a
power-of-2, or extsize exceeds bdev limits. So FS_XFLAG_ATOMICWRITES
could have some value there.
As such, it makes sense to have a consistent user experience and require
FS_XFLAG_ATOMICWRITES from the beginning.
Cheers,
John
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES
2024-10-04 13:07 ` John Garry
@ 2024-10-07 5:42 ` Christoph Hellwig
2024-10-13 21:06 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2024-10-07 5:42 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, axboe, brauner, djwong, viro, jack, dchinner,
cem, linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Fri, Oct 04, 2024 at 02:07:05PM +0100, John Garry wrote:
> Sure, that is true (about being able to atomically write 1x FS block if the
> bdev support it).
>
> But if we are going to add forcealign or similar later, then it would make
> sense (to me) to have FS_XFLAG_ATOMICWRITES (and its other flags) from the
> beginning. I mean, for example, if FS_XFLAG_FORCEALIGN were enabled and we
> want atomic writes, setting FS_XFLAG_ATOMICWRITES would be rejected if AG
> count is not aligned with extsize, or extsize is not a power-of-2, or
> extsize exceeds bdev limits. So FS_XFLAG_ATOMICWRITES could have some value
> there.
>
> As such, it makes sense to have a consistent user experience and require
> FS_XFLAG_ATOMICWRITES from the beginning.
Well, even with forcealign we're not going to lose support for atomic
writes <= block size, are we?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES
2024-10-07 5:42 ` Christoph Hellwig
@ 2024-10-13 21:06 ` John Garry
2024-10-16 0:52 ` Darrick J. Wong
0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2024-10-13 21:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 07/10/2024 06:42, Christoph Hellwig wrote:
> On Fri, Oct 04, 2024 at 02:07:05PM +0100, John Garry wrote:
>> Sure, that is true (about being able to atomically write 1x FS block if the
>> bdev support it).
>>
>> But if we are going to add forcealign or similar later, then it would make
>> sense (to me) to have FS_XFLAG_ATOMICWRITES (and its other flags) from the
>> beginning. I mean, for example, if FS_XFLAG_FORCEALIGN were enabled and we
>> want atomic writes, setting FS_XFLAG_ATOMICWRITES would be rejected if AG
>> count is not aligned with extsize, or extsize is not a power-of-2, or
>> extsize exceeds bdev limits. So FS_XFLAG_ATOMICWRITES could have some value
>> there.
>>
>> As such, it makes sense to have a consistent user experience and require
>> FS_XFLAG_ATOMICWRITES from the beginning.
>
> Well, even with forcealign we're not going to lose support for atomic
> writes <= block size, are we?
>
forcealign would not be required for atomic writes <= FS block size.
How about this modified approach:
a. Drop FS_XFLAG_ATOMICWRITES support from this series, and so we can
always atomic write 1x FS block (if the bdev supports it)
b. If we agree to support forcealign afterwards, then we can introduce
2x new flags:
- FS_XFLAG_FORCEALIGN - as before
- FS_XFLAG_BIG_ATOMICWRITES - this depends on FS_XFLAG_FORCEALIGN
being enabled per inode, and allows us to atomically write > 1 FS block
c. Later support writing < 1 FS block
- this would not depend on forcealign
- would require a real user, and I don't know one yet
better?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES
2024-10-13 21:06 ` John Garry
@ 2024-10-16 0:52 ` Darrick J. Wong
0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2024-10-16 0:52 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, axboe, brauner, viro, jack, dchinner, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Sun, Oct 13, 2024 at 10:06:04PM +0100, John Garry wrote:
> On 07/10/2024 06:42, Christoph Hellwig wrote:
> > On Fri, Oct 04, 2024 at 02:07:05PM +0100, John Garry wrote:
> > > Sure, that is true (about being able to atomically write 1x FS block if the
> > > bdev support it).
> > >
> > > But if we are going to add forcealign or similar later, then it would make
> > > sense (to me) to have FS_XFLAG_ATOMICWRITES (and its other flags) from the
> > > beginning. I mean, for example, if FS_XFLAG_FORCEALIGN were enabled and we
> > > want atomic writes, setting FS_XFLAG_ATOMICWRITES would be rejected if AG
> > > count is not aligned with extsize, or extsize is not a power-of-2, or
> > > extsize exceeds bdev limits. So FS_XFLAG_ATOMICWRITES could have some value
> > > there.
> > >
> > > As such, it makes sense to have a consistent user experience and require
> > > FS_XFLAG_ATOMICWRITES from the beginning.
> >
> > Well, even with forcealign we're not going to lose support for atomic
> > writes <= block size, are we?
> >
>
> forcealign would not be required for atomic writes <= FS block size.
>
> How about this modified approach:
>
> a. Drop FS_XFLAG_ATOMICWRITES support from this series, and so we can always
> atomic write 1x FS block (if the bdev supports it)
>
> b. If we agree to support forcealign afterwards, then we can introduce 2x
> new flags:
> - FS_XFLAG_FORCEALIGN - as before
> - FS_XFLAG_BIG_ATOMICWRITES - this depends on FS_XFLAG_FORCEALIGN being
> enabled per inode, and allows us to atomically write > 1 FS block
>
> c. Later support writing < 1 FS block
> - this would not depend on forcealign
> - would require a real user, and I don't know one yet
>
> better?
Sounds fine to /me/, but that's just my opinion. :)
--D
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-10-16 0:52 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-04 9:22 [PATCH v7 0/8] block atomic writes for xfs John Garry
2024-10-04 9:22 ` [PATCH v7 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-04 9:22 ` [PATCH v7 2/8] fs: Export generic_atomic_write_valid() John Garry
2024-10-04 9:22 ` [PATCH v7 3/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
2024-10-04 12:34 ` Christoph Hellwig
2024-10-04 12:45 ` John Garry
2024-10-04 9:22 ` [PATCH v7 4/8] fs: iomap: Atomic write support John Garry
2024-10-04 9:22 ` [PATCH v7 5/8] xfs: Support FS_XFLAG_ATOMICWRITES John Garry
2024-10-04 12:35 ` Christoph Hellwig
2024-10-04 13:07 ` John Garry
2024-10-07 5:42 ` Christoph Hellwig
2024-10-13 21:06 ` John Garry
2024-10-16 0:52 ` Darrick J. Wong
2024-10-04 9:22 ` [PATCH v7 6/8] xfs: Support atomic write for statx John Garry
2024-10-04 9:22 ` [PATCH v7 7/8] xfs: Validate atomic writes John Garry
2024-10-04 9:22 ` [PATCH v7 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).