* [PATCH v10 0/8] block atomic writes for xfs
@ 2024-10-19 12:51 John Garry
2024-10-19 12:51 ` [PATCH v10 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
` (9 more replies)
0 siblings, 10 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
This series expands atomic write support to filesystems, specifically
XFS.
Initially we will only support writing exactly 1x FS block atomically.
Since we can now have FS block size > PAGE_SIZE for XFS, we can write
atomically 4K+ blocks on x86.
No special per-inode flag is required for enabling writing 1x F block.
In future, to support writing more than one FS block atomically, a new FS
XFLAG flag may then introduced - like FS_XFLAG_BIG_ATOMICWRITES. This
would depend on a feature like forcealign.
So if we format the FS for 16K FS block size:
mkfs.xfs -b size=16384 /dev/sda
The statx reports atomic write unit min/max = FS block size:
$xfs_io -c statx filename
...
stat.stx_atomic_write_unit_min = 16384
stat.stx_atomic_write_unit_max = 16384
stat.stx_atomic_write_segments_max = 1
...
Baseline is 77bfe1b11ea0 (tag: xfs-6.12-fixes-3, xfs/xfs-6.12-fixesC,
xfs/for-next) xfs: fix a typo
Patches for this series can be found at:
https://github.com/johnpgarry/linux/tree/atomic-writes-v6.12-fs-v10
Changes since v9:
- iomap doc fix (Darrick)
- Add RB tags from Christoph and Darrick (Thanks!)
Changes since v8:
- Add bdev atomic write unit helpers (Christoph)
- Add comment on FS block size limit (Christoph)
- Stylistic improvements (Christoph)
- Add RB tags from Christoph (thanks!)
Changes since v7:
- Drop FS_XFLAG_ATOMICWRITES
- Reorder block/fs patches and add fixes tags (Christoph)
- Add RB tag from Christoph (Thanks!)
- Rebase
John Garry (8):
block/fs: Pass an iocb to generic_atomic_write_valid()
fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
block: Add bdev atomic write limits helpers
fs: Export generic_atomic_write_valid()
fs: iomap: Atomic write support
xfs: Support atomic write for statx
xfs: Validate atomic writes
xfs: Support setting FMODE_CAN_ATOMIC_WRITE
.../filesystems/iomap/operations.rst | 12 ++++++
block/fops.c | 22 ++++++-----
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
fs/read_write.c | 16 +++++---
fs/xfs/xfs_buf.c | 7 ++++
fs/xfs/xfs_buf.h | 4 ++
fs/xfs/xfs_file.c | 16 ++++++++
fs/xfs/xfs_inode.h | 15 ++++++++
fs/xfs/xfs_iops.c | 22 +++++++++++
include/linux/blkdev.h | 16 ++++++++
include/linux/fs.h | 2 +-
include/linux/iomap.h | 1 +
13 files changed, 152 insertions(+), 22 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v10 1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-19 12:51 ` [PATCH v10 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
` (8 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Darrick and Hannes both thought it better that generic_atomic_write_valid()
should be passed a struct iocb, and not just the member of that struct
which is referenced; see [0] and [1].
I think that makes a more generic and clean API, so make that change.
[0] https://lore.kernel.org/linux-block/680ce641-729b-4150-b875-531a98657682@suse.de/
[1] https://lore.kernel.org/linux-xfs/20240620212401.GA3058325@frogsfrogsfrogs/
Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 8 ++++----
fs/read_write.c | 4 ++--
include/linux/fs.h | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index e696ae53bf1e..968b47b615c4 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -35,13 +35,13 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
return opf;
}
-static bool blkdev_dio_invalid(struct block_device *bdev, loff_t pos,
+static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
struct iov_iter *iter, bool is_atomic)
{
- if (is_atomic && !generic_atomic_write_valid(iter, pos))
+ if (is_atomic && !generic_atomic_write_valid(iocb, iter))
return true;
- return pos & (bdev_logical_block_size(bdev) - 1) ||
+ return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -374,7 +374,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
diff --git a/fs/read_write.c b/fs/read_write.c
index 64dc24afdb3a..2c3263530828 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,7 +1830,7 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
@@ -1840,7 +1840,7 @@ bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
if (!is_power_of_2(len))
return false;
- if (!IS_ALIGNED(pos, len))
+ if (!IS_ALIGNED(iocb->ki_pos, len))
return false;
return true;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e3c603d01337..fbfa032d1d90 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos);
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
2024-10-19 12:51 ` [PATCH v10 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-19 12:51 ` [PATCH v10 3/8] block: Add bdev atomic write limits helpers John Garry
` (7 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
the file is open for direct IO. This does not work if the file is not
opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
Change to check for direct IO on a per-IO basis in
generic_atomic_write_valid(). Since we want to report -EOPNOTSUPP for
non-direct IO for an atomic write, change to return an error code.
Relocate the block fops atomic write checks to the common write path, as to
catch non-direct IO.
Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 18 ++++++++++--------
fs/read_write.c | 13 ++++++++-----
include/linux/fs.h | 2 +-
3 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 968b47b615c4..2d01c9007681 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -36,11 +36,8 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
}
static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
- struct iov_iter *iter, bool is_atomic)
+ struct iov_iter *iter)
{
- if (is_atomic && !generic_atomic_write_valid(iocb, iter))
- return true;
-
return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -368,13 +365,12 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
- bool is_atomic = iocb->ki_flags & IOCB_ATOMIC;
unsigned int nr_pages;
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
@@ -383,7 +379,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return __blkdev_direct_IO_simple(iocb, iter, bdev,
nr_pages);
return __blkdev_direct_IO_async(iocb, iter, bdev, nr_pages);
- } else if (is_atomic) {
+ } else if (iocb->ki_flags & IOCB_ATOMIC) {
return -EINVAL;
}
return __blkdev_direct_IO(iocb, iter, bdev, bio_max_segs(nr_pages));
@@ -625,7 +621,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
if (!bdev)
return -ENXIO;
- if (bdev_can_atomic_write(bdev) && filp->f_flags & O_DIRECT)
+ if (bdev_can_atomic_write(bdev))
filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
@@ -700,6 +696,12 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
return -EOPNOTSUPP;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
size -= iocb->ki_pos;
if (iov_iter_count(from) > size) {
shorted = iov_iter_count(from) - size;
diff --git a/fs/read_write.c b/fs/read_write.c
index 2c3263530828..befec0b5c537 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,18 +1830,21 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
if (!iter_is_ubuf(iter))
- return false;
+ return -EINVAL;
if (!is_power_of_2(len))
- return false;
+ return -EINVAL;
if (!IS_ALIGNED(iocb->ki_pos, len))
- return false;
+ return -EINVAL;
- return true;
+ if (!(iocb->ki_flags & IOCB_DIRECT))
+ return -EOPNOTSUPP;
+
+ return 0;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fbfa032d1d90..ba47fb283730 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 3/8] block: Add bdev atomic write limits helpers
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
2024-10-19 12:51 ` [PATCH v10 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-19 12:51 ` [PATCH v10 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-19 12:51 ` [PATCH v10 4/8] fs: Export generic_atomic_write_valid() John Garry
` (6 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Add helpers to get atomic write limits for a bdev, so that we don't access
request_queue helpers outside the block layer.
We check if the bdev can actually atomic write in these helpers, so we
can avoid users missing using this check.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
include/linux/blkdev.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da28..c2cc3c146d74 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1674,6 +1674,22 @@ static inline bool bdev_can_atomic_write(struct block_device *bdev)
return true;
}
+static inline unsigned int
+bdev_atomic_write_unit_min_bytes(struct block_device *bdev)
+{
+ if (!bdev_can_atomic_write(bdev))
+ return 0;
+ return queue_atomic_write_unit_min_bytes(bdev_get_queue(bdev));
+}
+
+static inline unsigned int
+bdev_atomic_write_unit_max_bytes(struct block_device *bdev)
+{
+ if (!bdev_can_atomic_write(bdev))
+ return 0;
+ return queue_atomic_write_unit_max_bytes(bdev_get_queue(bdev));
+}
+
#define DEFINE_IO_COMP_BATCH(name) struct io_comp_batch name = { }
#endif /* _LINUX_BLKDEV_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 4/8] fs: Export generic_atomic_write_valid()
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (2 preceding siblings ...)
2024-10-19 12:51 ` [PATCH v10 3/8] block: Add bdev atomic write limits helpers John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-19 12:51 ` [PATCH v10 5/8] fs: iomap: Atomic write support John Garry
` (5 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
The XFS code will need this.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/read_write.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index befec0b5c537..3e5dad12a5b4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1848,3 +1848,4 @@ int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
return 0;
}
+EXPORT_SYMBOL_GPL(generic_atomic_write_valid);
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 5/8] fs: iomap: Atomic write support
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (3 preceding siblings ...)
2024-10-19 12:51 ` [PATCH v10 4/8] fs: Export generic_atomic_write_valid() John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-20 8:21 ` Ritesh Harjani
2024-10-19 12:51 ` [PATCH v10 6/8] xfs: Support atomic write for statx John Garry
` (4 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
flag set.
Initially FSes (XFS) should only support writing a single FS block
atomically.
As with any atomic write, we should produce a single bio which covers the
complete write length.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
.../filesystems/iomap/operations.rst | 12 ++++++
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
include/linux/iomap.h | 1 +
4 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index b93115ab8748..529f81dd3d2c 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -513,6 +513,18 @@ IOMAP_WRITE`` with any combination of the following enhancements:
if the mapping is unwritten and the filesystem cannot handle zeroing
the unaligned regions without exposing stale contents.
+ * ``IOMAP_ATOMIC``: This write is being issued with torn-write
+ protection.
+ Only a single bio can be created for the write, and the write must
+ not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
+ set.
+ The file range to write must be aligned to satisfy the requirements
+ of both the filesystem and the underlying block device's atomic
+ commit capabilities.
+ If filesystem metadata updates are required (e.g. unwritten extent
+ conversion or copy on write), all updates for the entire file range
+ must be committed atomically as well.
+
Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
calling this function.
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f637aa0706a3..ed4764e3b8f0 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
* clearing the WRITE_THROUGH flag in the dio request.
*/
static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
- const struct iomap *iomap, bool use_fua)
+ const struct iomap *iomap, bool use_fua, bool atomic)
{
blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
@@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
opflags |= REQ_FUA;
else
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+ if (atomic)
+ opflags |= REQ_ATOMIC;
return opflags;
}
@@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int fs_block_size = i_blocksize(inode), pad;
- loff_t length = iomap_length(iter);
+ const loff_t length = iomap_length(iter);
+ bool atomic = iter->flags & IOMAP_ATOMIC;
loff_t pos = iter->pos;
blk_opf_t bio_opf;
struct bio *bio;
@@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
size_t copied = 0;
size_t orig_count;
+ if (atomic && length != fs_block_size)
+ return -EINVAL;
+
if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
!bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
return -EINVAL;
@@ -382,7 +388,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
* can set up the page vector appropriately for a ZONE_APPEND
* operation.
*/
- bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+ bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
do {
@@ -415,6 +421,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (WARN_ON_ONCE(atomic && n != length)) {
+ /*
+ * This bio should have covered the complete length,
+ * which it doesn't, so error. We may need to zero out
+ * the tail (complete FS block), similar to when
+ * bio_iov_iter_get_pages() returns an error, above.
+ */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto zero_tail;
+ }
if (dio->flags & IOMAP_DIO_WRITE) {
task_io_account_write(n);
} else {
@@ -598,6 +615,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iocb->ki_flags & IOCB_NOWAIT)
iomi.flags |= IOMAP_NOWAIT;
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ iomi.flags |= IOMAP_ATOMIC;
+
if (iov_iter_rw(iter) == READ) {
/* reads can always complete inline */
dio->flags |= IOMAP_DIO_INLINE_COMP;
@@ -659,7 +679,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret != -EAGAIN) {
trace_iomap_dio_invalidate_fail(inode, iomi.pos,
iomi.len);
- ret = -ENOTBLK;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ /*
+ * folio invalidation failed, maybe
+ * this is transient, unlock and see if
+ * the caller tries again.
+ */
+ ret = -EAGAIN;
+ } else {
+ /* fall back to buffered write */
+ ret = -ENOTBLK;
+ }
}
goto out_free_dio;
}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 0a991c4ce87d..4118a42cdab0 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -98,7 +98,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
{ IOMAP_REPORT, "REPORT" }, \
{ IOMAP_FAULT, "FAULT" }, \
{ IOMAP_DIRECT, "DIRECT" }, \
- { IOMAP_NOWAIT, "NOWAIT" }
+ { IOMAP_NOWAIT, "NOWAIT" }, \
+ { IOMAP_ATOMIC, "ATOMIC" }
#define IOMAP_F_FLAGS_STRINGS \
{ IOMAP_F_NEW, "NEW" }, \
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index d0420e962ffd..84282db3e4c1 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -178,6 +178,7 @@ struct iomap_folio_ops {
#else
#define IOMAP_DAX 0
#endif /* CONFIG_FS_DAX */
+#define IOMAP_ATOMIC (1 << 9)
struct iomap_ops {
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 6/8] xfs: Support atomic write for statx
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (4 preceding siblings ...)
2024-10-19 12:51 ` [PATCH v10 5/8] fs: iomap: Atomic write support John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-19 12:51 ` [PATCH v10 7/8] xfs: Validate atomic writes John Garry
` (3 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support providing info on atomic write unit min and max for an inode.
For simplicity, currently we limit the min at the FS block size. As for
max, we limit also at FS block size, as there is no current method to
guarantee extent alignment or granularity for regular files.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_buf.c | 7 +++++++
fs/xfs/xfs_buf.h | 4 ++++
fs/xfs/xfs_inode.h | 15 +++++++++++++++
fs/xfs/xfs_iops.c | 22 ++++++++++++++++++++++
4 files changed, 48 insertions(+)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index aa4dbda7b536..e8196f5778e2 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2115,6 +2115,13 @@ xfs_alloc_buftarg(
btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
mp, ops);
+ if (bdev_can_atomic_write(btp->bt_bdev)) {
+ btp->bt_bdev_awu_min = bdev_atomic_write_unit_min_bytes(
+ btp->bt_bdev);
+ btp->bt_bdev_awu_max = bdev_atomic_write_unit_max_bytes(
+ btp->bt_bdev);
+ }
+
/*
* When allocating the buftargs we have not yet read the super block and
* thus don't know the file system sector size yet.
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 209a389f2abc..3d56bc7a35cc 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -124,6 +124,10 @@ struct xfs_buftarg {
struct percpu_counter bt_io_count;
struct ratelimit_state bt_ioerror_rl;
+ /* Atomic write unit values */
+ unsigned int bt_bdev_awu_min;
+ unsigned int bt_bdev_awu_max;
+
/* built-in cache, if we're not using the perag one */
struct xfs_buf_cache bt_cache[];
};
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 97ed912306fd..73009a25a119 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -327,6 +327,21 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
(XFS_IS_REALTIME_INODE(ip) ? \
(ip)->i_mount->m_rtdev_targp : (ip)->i_mount->m_ddev_targp)
+static inline bool
+xfs_inode_can_atomicwrite(
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+
+ if (mp->m_sb.sb_blocksize < target->bt_bdev_awu_min)
+ return false;
+ if (mp->m_sb.sb_blocksize > target->bt_bdev_awu_max)
+ return false;
+
+ return true;
+}
+
/*
* In-core inode flags.
*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ee79cf161312..5cd804812efd 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -570,6 +570,20 @@ xfs_stat_blksize(
return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
}
+static void
+xfs_get_atomic_write_attr(
+ struct xfs_inode *ip,
+ unsigned int *unit_min,
+ unsigned int *unit_max)
+{
+ if (!xfs_inode_can_atomicwrite(ip)) {
+ *unit_min = *unit_max = 0;
+ return;
+ }
+
+ *unit_min = *unit_max = ip->i_mount->m_sb.sb_blocksize;
+}
+
STATIC int
xfs_vn_getattr(
struct mnt_idmap *idmap,
@@ -643,6 +657,14 @@ xfs_vn_getattr(
stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
stat->dio_offset_align = bdev_logical_block_size(bdev);
}
+ if (request_mask & STATX_WRITE_ATOMIC) {
+ unsigned int unit_min, unit_max;
+
+ xfs_get_atomic_write_attr(ip, &unit_min,
+ &unit_max);
+ generic_fill_statx_atomic_writes(stat,
+ unit_min, unit_max);
+ }
fallthrough;
default:
stat->blksize = xfs_stat_blksize(ip);
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 7/8] xfs: Validate atomic writes
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (5 preceding siblings ...)
2024-10-19 12:51 ` [PATCH v10 6/8] xfs: Support atomic write for statx John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-20 9:44 ` Ritesh Harjani
2024-10-19 12:51 ` [PATCH v10 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
` (2 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Validate that an atomic write adheres to length/offset rules. Currently
we can only write a single FS block.
For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
ATOMICWRITES flags would also need to be set for the inode.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index b19916b11fd5..1ccbc1eb75c9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -852,6 +852,20 @@ xfs_file_write_iter(
if (IS_DAX(inode))
return xfs_file_dax_write(iocb, from);
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ /*
+ * Currently only atomic writing of a single FS block is
+ * supported. It would be possible to atomic write smaller than
+ * a FS block, but there is no requirement to support this.
+ * Note that iomap also does not support this yet.
+ */
+ if (ocount != ip->i_mount->m_sb.sb_blocksize)
+ return -EINVAL;
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
if (iocb->ki_flags & IOCB_DIRECT) {
/*
* Allow a directio write to fall back to a buffered
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v10 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (6 preceding siblings ...)
2024-10-19 12:51 ` [PATCH v10 7/8] xfs: Validate atomic writes John Garry
@ 2024-10-19 12:51 ` John Garry
2024-10-19 22:49 ` (subset) [PATCH v10 0/8] block atomic writes for xfs Jens Axboe
2024-10-24 6:32 ` Ojaswin Mujoo
9 siblings, 0 replies; 20+ messages in thread
From: John Garry @ 2024-10-19 12:51 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Set FMODE_CAN_ATOMIC_WRITE flag if we can atomic write for that inode.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1ccbc1eb75c9..ca47cae5a40a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1253,6 +1253,8 @@ xfs_file_open(
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
+ if (xfs_inode_can_atomicwrite(XFS_I(inode)))
+ file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
return generic_file_open(inode, file);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: (subset) [PATCH v10 0/8] block atomic writes for xfs
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (7 preceding siblings ...)
2024-10-19 12:51 ` [PATCH v10 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
@ 2024-10-19 22:49 ` Jens Axboe
2024-10-19 22:50 ` Jens Axboe
2024-10-24 6:32 ` Ojaswin Mujoo
9 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2024-10-19 22:49 UTC (permalink / raw)
To: brauner, djwong, viro, jack, dchinner, hch, cem, John Garry
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Sat, 19 Oct 2024 12:51:05 +0000, John Garry wrote:
> This series expands atomic write support to filesystems, specifically
> XFS.
>
> Initially we will only support writing exactly 1x FS block atomically.
>
> Since we can now have FS block size > PAGE_SIZE for XFS, we can write
> atomically 4K+ blocks on x86.
>
> [...]
Applied, thanks!
[1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
commit: 9a8dbdadae509e5717ff6e5aa572ca0974d2101d
[2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
commit: c3be7ebbbce5201e151f17e28a6c807602f369c9
[3/8] block: Add bdev atomic write limits helpers
commit: 1eadb157947163ca72ba8963b915fdc099ce6cca
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: (subset) [PATCH v10 0/8] block atomic writes for xfs
2024-10-19 22:49 ` (subset) [PATCH v10 0/8] block atomic writes for xfs Jens Axboe
@ 2024-10-19 22:50 ` Jens Axboe
2024-10-23 12:42 ` John Garry
0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2024-10-19 22:50 UTC (permalink / raw)
To: brauner, djwong, viro, jack, dchinner, hch, cem, John Garry
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On 10/19/24 4:49 PM, Jens Axboe wrote:
>
> On Sat, 19 Oct 2024 12:51:05 +0000, John Garry wrote:
>> This series expands atomic write support to filesystems, specifically
>> XFS.
>>
>> Initially we will only support writing exactly 1x FS block atomically.
>>
>> Since we can now have FS block size > PAGE_SIZE for XFS, we can write
>> atomically 4K+ blocks on x86.
>>
>> [...]
>
> Applied, thanks!
>
> [1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
> commit: 9a8dbdadae509e5717ff6e5aa572ca0974d2101d
> [2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
> commit: c3be7ebbbce5201e151f17e28a6c807602f369c9
> [3/8] block: Add bdev atomic write limits helpers
> commit: 1eadb157947163ca72ba8963b915fdc099ce6cca
These are now sitting in:
git://git.kernel.dk/linux for-6.13/block-atomic
and can be pulled in by the fs/xfs people.
--
Jens Axboe
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 5/8] fs: iomap: Atomic write support
2024-10-19 12:51 ` [PATCH v10 5/8] fs: iomap: Atomic write support John Garry
@ 2024-10-20 8:21 ` Ritesh Harjani
2024-10-20 11:21 ` John Garry
0 siblings, 1 reply; 20+ messages in thread
From: Ritesh Harjani @ 2024-10-20 8:21 UTC (permalink / raw)
To: John Garry, axboe, brauner, djwong, viro, jack, dchinner, hch,
cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ojaswin, John Garry
John Garry <john.g.garry@oracle.com> writes:
> Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
> flag set.
>
> Initially FSes (XFS) should only support writing a single FS block
> atomically.
>
> As with any atomic write, we should produce a single bio which covers the
> complete write length.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> .../filesystems/iomap/operations.rst | 12 ++++++
> fs/iomap/direct-io.c | 38 +++++++++++++++++--
> fs/iomap/trace.h | 3 +-
> include/linux/iomap.h | 1 +
> 4 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index b93115ab8748..529f81dd3d2c 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -513,6 +513,18 @@ IOMAP_WRITE`` with any combination of the following enhancements:
> if the mapping is unwritten and the filesystem cannot handle zeroing
> the unaligned regions without exposing stale contents.
>
> + * ``IOMAP_ATOMIC``: This write is being issued with torn-write
> + protection.
> + Only a single bio can be created for the write, and the write must
> + not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
> + set.
> + The file range to write must be aligned to satisfy the requirements
> + of both the filesystem and the underlying block device's atomic
> + commit capabilities.
> + If filesystem metadata updates are required (e.g. unwritten extent
> + conversion or copy on write), all updates for the entire file range
> + must be committed atomically as well.
> +
> Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
> calling this function.
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f637aa0706a3..ed4764e3b8f0 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
> * clearing the WRITE_THROUGH flag in the dio request.
> */
> static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> - const struct iomap *iomap, bool use_fua)
> + const struct iomap *iomap, bool use_fua, bool atomic)
> {
> blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
>
> @@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> opflags |= REQ_FUA;
> else
> dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
> + if (atomic)
> + opflags |= REQ_ATOMIC;
>
> return opflags;
> }
> @@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> const struct iomap *iomap = &iter->iomap;
> struct inode *inode = iter->inode;
> unsigned int fs_block_size = i_blocksize(inode), pad;
> - loff_t length = iomap_length(iter);
> + const loff_t length = iomap_length(iter);
> + bool atomic = iter->flags & IOMAP_ATOMIC;
> loff_t pos = iter->pos;
> blk_opf_t bio_opf;
> struct bio *bio;
> @@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> size_t copied = 0;
> size_t orig_count;
>
> + if (atomic && length != fs_block_size)
> + return -EINVAL;
We anyway mandate iov_iter_count() write should be same as sb_blocksize
in xfs_file_write_iter() for atomic writes.
This comparison here is not required. I believe we do plan to lift this
restriction maybe when we are going to add forcealign support right?
And similarly this needs to be lifted when ext4 adds support for atomic
write even with bigalloc. I hope we can do so when we add such support, right?
(I guess, that is also the reason we haven't mentioned this restriction
in description of "IOMAP_ATOMIC" in Documentation.)
-ritesh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 7/8] xfs: Validate atomic writes
2024-10-19 12:51 ` [PATCH v10 7/8] xfs: Validate atomic writes John Garry
@ 2024-10-20 9:44 ` Ritesh Harjani
2024-10-20 11:09 ` John Garry
0 siblings, 1 reply; 20+ messages in thread
From: Ritesh Harjani @ 2024-10-20 9:44 UTC (permalink / raw)
To: John Garry, axboe, brauner, djwong, viro, jack, dchinner, hch,
cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ojaswin, John Garry
John Garry <john.g.garry@oracle.com> writes:
> Validate that an atomic write adheres to length/offset rules. Currently
> we can only write a single FS block.
>
> For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
> FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
> ATOMICWRITES flags would also need to be set for the inode.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> fs/xfs/xfs_file.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index b19916b11fd5..1ccbc1eb75c9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -852,6 +852,20 @@ xfs_file_write_iter(
> if (IS_DAX(inode))
> return xfs_file_dax_write(iocb, from);
>
> + if (iocb->ki_flags & IOCB_ATOMIC) {
> + /*
> + * Currently only atomic writing of a single FS block is
> + * supported. It would be possible to atomic write smaller than
> + * a FS block, but there is no requirement to support this.
> + * Note that iomap also does not support this yet.
> + */
> + if (ocount != ip->i_mount->m_sb.sb_blocksize)
> + return -EINVAL;
Shouldn't we "return -ENOTSUPP" ?
Given we are later going to add support for ocount > sb_blocksize.
-ritesh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 7/8] xfs: Validate atomic writes
2024-10-20 9:44 ` Ritesh Harjani
@ 2024-10-20 11:09 ` John Garry
2024-10-20 11:41 ` Ritesh Harjani
0 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2024-10-20 11:09 UTC (permalink / raw)
To: Ritesh Harjani (IBM), axboe, brauner, djwong, viro, jack,
dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ojaswin
On 20/10/2024 10:44, Ritesh Harjani (IBM) wrote:
>> + if (iocb->ki_flags & IOCB_ATOMIC) {
>> + /*
>> + * Currently only atomic writing of a single FS block is
>> + * supported. It would be possible to atomic write smaller than
>> + * a FS block, but there is no requirement to support this.
>> + * Note that iomap also does not support this yet.
>> + */
>> + if (ocount != ip->i_mount->m_sb.sb_blocksize)
>> + return -EINVAL;
> Shouldn't we "return -ENOTSUPP" ?
> Given we are later going to add support for ocount > sb_blocksize.
So far we have been reporting -EINVAL for an invalid atomic write size
(according to atomic write unit min and max reported for that inode).
-ENOTSUPP is used for times when we just don't support atomic writes,
like non-DIO.
Thanks,
John
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 5/8] fs: iomap: Atomic write support
2024-10-20 8:21 ` Ritesh Harjani
@ 2024-10-20 11:21 ` John Garry
2024-10-20 11:37 ` Ritesh Harjani
0 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2024-10-20 11:21 UTC (permalink / raw)
To: Ritesh Harjani (IBM), axboe, brauner, djwong, viro, jack,
dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ojaswin
On 20/10/2024 09:21, Ritesh Harjani (IBM) wrote:
>> -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>> const struct iomap *iomap = &iter->iomap;
>> struct inode *inode = iter->inode;
>> unsigned int fs_block_size = i_blocksize(inode), pad;
>> - loff_t length = iomap_length(iter);
>> + const loff_t length = iomap_length(iter);
>> + bool atomic = iter->flags & IOMAP_ATOMIC;
>> loff_t pos = iter->pos;
>> blk_opf_t bio_opf;
>> struct bio *bio;
>> @@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>> size_t copied = 0;
>> size_t orig_count;
>>
>> + if (atomic && length != fs_block_size)
>> + return -EINVAL;
> We anyway mandate iov_iter_count() write should be same as sb_blocksize
> in xfs_file_write_iter() for atomic writes.
> This comparison here is not required. I believe we do plan to lift this
> restriction maybe when we are going to add forcealign support right?
Yes, we would lift this restriction if and when forcealign is added. Or
when bigalloc is leveraged for ext4 atomic writes.
But I think that today it is proper to add this check, as we are saying
that iomap DIO path does not support anything else than fs_block_size.
For forcealign, we were introducing support for atomic writes spanning
mixed unwritten and written extents in [0]. We don't have that support
here, so it is prudent to say that we just support fs_block_size.
[0]
https://lore.kernel.org/linux-xfs/20240607143919.2622319-4-john.g.garry@oracle.com/
>
> And similarly this needs to be lifted when ext4 adds support for atomic
> write even with bigalloc. I hope we can do so when we add such support, right?
Right
>
> (I guess, that is also the reason we haven't mentioned this restriction
> in description of "IOMAP_ATOMIC" in Documentation.)
The documentation does not mention the size, but that is not intentional.
Thanks,
John
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 5/8] fs: iomap: Atomic write support
2024-10-20 11:21 ` John Garry
@ 2024-10-20 11:37 ` Ritesh Harjani
0 siblings, 0 replies; 20+ messages in thread
From: Ritesh Harjani @ 2024-10-20 11:37 UTC (permalink / raw)
To: John Garry, axboe, brauner, djwong, viro, jack, dchinner, hch,
cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ojaswin
John Garry <john.g.garry@oracle.com> writes:
> On 20/10/2024 09:21, Ritesh Harjani (IBM) wrote:
>>> -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>>> const struct iomap *iomap = &iter->iomap;
>>> struct inode *inode = iter->inode;
>>> unsigned int fs_block_size = i_blocksize(inode), pad;
>>> - loff_t length = iomap_length(iter);
>>> + const loff_t length = iomap_length(iter);
>>> + bool atomic = iter->flags & IOMAP_ATOMIC;
>>> loff_t pos = iter->pos;
>>> blk_opf_t bio_opf;
>>> struct bio *bio;
>>> @@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>>> size_t copied = 0;
>>> size_t orig_count;
>>>
>>> + if (atomic && length != fs_block_size)
>>> + return -EINVAL;
>> We anyway mandate iov_iter_count() write should be same as sb_blocksize
>> in xfs_file_write_iter() for atomic writes.
>> This comparison here is not required. I believe we do plan to lift this
>> restriction maybe when we are going to add forcealign support right?
>
> Yes, we would lift this restriction if and when forcealign is added. Or
> when bigalloc is leveraged for ext4 atomic writes.
>
> But I think that today it is proper to add this check, as we are saying
> that iomap DIO path does not support anything else than fs_block_size.
>
> For forcealign, we were introducing support for atomic writes spanning
> mixed unwritten and written extents in [0]. We don't have that support
> here, so it is prudent to say that we just support fs_block_size.
>
> [0]
> https://lore.kernel.org/linux-xfs/20240607143919.2622319-4-john.g.garry@oracle.com/
>
Sure.
>>
>> And similarly this needs to be lifted when ext4 adds support for atomic
>> write even with bigalloc. I hope we can do so when we add such support, right?
>
> Right
>
Thanks for confirming that.
The patch looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 7/8] xfs: Validate atomic writes
2024-10-20 11:09 ` John Garry
@ 2024-10-20 11:41 ` Ritesh Harjani
0 siblings, 0 replies; 20+ messages in thread
From: Ritesh Harjani @ 2024-10-20 11:41 UTC (permalink / raw)
To: John Garry, axboe, brauner, djwong, viro, jack, dchinner, hch,
cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ojaswin
John Garry <john.g.garry@oracle.com> writes:
> On 20/10/2024 10:44, Ritesh Harjani (IBM) wrote:
>>> + if (iocb->ki_flags & IOCB_ATOMIC) {
>>> + /*
>>> + * Currently only atomic writing of a single FS block is
>>> + * supported. It would be possible to atomic write smaller than
>>> + * a FS block, but there is no requirement to support this.
>>> + * Note that iomap also does not support this yet.
>>> + */
>>> + if (ocount != ip->i_mount->m_sb.sb_blocksize)
>>> + return -EINVAL;
>> Shouldn't we "return -ENOTSUPP" ?
>> Given we are later going to add support for ocount > sb_blocksize.
>
> So far we have been reporting -EINVAL for an invalid atomic write size
> (according to atomic write unit min and max reported for that inode).
>
> -ENOTSUPP is used for times when we just don't support atomic writes,
> like non-DIO.
>
Sure make sense.
-ritesh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: (subset) [PATCH v10 0/8] block atomic writes for xfs
2024-10-19 22:50 ` Jens Axboe
@ 2024-10-23 12:42 ` John Garry
2024-10-23 12:50 ` Carlos Maiolino
0 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2024-10-23 12:42 UTC (permalink / raw)
To: Jens Axboe, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
hch, brauner, djwong, viro, jack, dchinner
On 19/10/2024 23:50, Jens Axboe wrote:
>> On Sat, 19 Oct 2024 12:51:05 +0000, John Garry wrote:
>>> This series expands atomic write support to filesystems, specifically
>>> XFS.
>>>
>>> Initially we will only support writing exactly 1x FS block atomically.
>>>
>>> Since we can now have FS block size > PAGE_SIZE for XFS, we can write
>>> atomically 4K+ blocks on x86.
>>>
>>> [...]
>> Applied, thanks!
>>
>> [1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
>> commit: 9a8dbdadae509e5717ff6e5aa572ca0974d2101d
>> [2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
>> commit: c3be7ebbbce5201e151f17e28a6c807602f369c9
>> [3/8] block: Add bdev atomic write limits helpers
>> commit: 1eadb157947163ca72ba8963b915fdc099ce6cca
Thanks Jens
> These are now sitting in:
>
> git://git.kernel.dk/linux for-6.13/block-atomic
>
> and can be pulled in by the fs/xfs people.
Carlos, can you kindly consider merging that branch and picking up the
iomap + xfs changes?
Cheers
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: (subset) [PATCH v10 0/8] block atomic writes for xfs
2024-10-23 12:42 ` John Garry
@ 2024-10-23 12:50 ` Carlos Maiolino
0 siblings, 0 replies; 20+ messages in thread
From: Carlos Maiolino @ 2024-10-23 12:50 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, linux-block, linux-kernel, linux-xfs, linux-fsdevel,
hare, martin.petersen, catherine.hoang, mcgrof, ritesh.list,
ojaswin, hch, brauner, djwong, viro, jack, dchinner
On Wed, Oct 23, 2024 at 01:42:24PM GMT, John Garry wrote:
> On 19/10/2024 23:50, Jens Axboe wrote:
> > > On Sat, 19 Oct 2024 12:51:05 +0000, John Garry wrote:
> > > > This series expands atomic write support to filesystems, specifically
> > > > XFS.
> > > >
> > > > Initially we will only support writing exactly 1x FS block atomically.
> > > >
> > > > Since we can now have FS block size > PAGE_SIZE for XFS, we can write
> > > > atomically 4K+ blocks on x86.
> > > >
> > > > [...]
> > > Applied, thanks!
> > >
> > > [1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
> > > commit: 9a8dbdadae509e5717ff6e5aa572ca0974d2101d
> > > [2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
> > > commit: c3be7ebbbce5201e151f17e28a6c807602f369c9
> > > [3/8] block: Add bdev atomic write limits helpers
> > > commit: 1eadb157947163ca72ba8963b915fdc099ce6cca
>
> Thanks Jens
>
> > These are now sitting in:
> >
> > git://git.kernel.dk/linux for-6.13/block-atomic
> >
> > and can be pulled in by the fs/xfs people.
>
> Carlos, can you kindly consider merging that branch and picking up the iomap
> + xfs changes?
yup, I'll queue them up for 6.12 merge window
Carlos
>
> Cheers
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v10 0/8] block atomic writes for xfs
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
` (8 preceding siblings ...)
2024-10-19 22:49 ` (subset) [PATCH v10 0/8] block atomic writes for xfs Jens Axboe
@ 2024-10-24 6:32 ` Ojaswin Mujoo
9 siblings, 0 replies; 20+ messages in thread
From: Ojaswin Mujoo @ 2024-10-24 6:32 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list
On Sat, Oct 19, 2024 at 12:51:05PM +0000, John Garry wrote:
> This series expands atomic write support to filesystems, specifically
> XFS.
>
> Initially we will only support writing exactly 1x FS block atomically.
>
> Since we can now have FS block size > PAGE_SIZE for XFS, we can write
> atomically 4K+ blocks on x86.
>
> No special per-inode flag is required for enabling writing 1x F block.
> In future, to support writing more than one FS block atomically, a new FS
> XFLAG flag may then introduced - like FS_XFLAG_BIG_ATOMICWRITES. This
> would depend on a feature like forcealign.
>
> So if we format the FS for 16K FS block size:
> mkfs.xfs -b size=16384 /dev/sda
>
> The statx reports atomic write unit min/max = FS block size:
> $xfs_io -c statx filename
> ...
> stat.stx_atomic_write_unit_min = 16384
> stat.stx_atomic_write_unit_max = 16384
> stat.stx_atomic_write_segments_max = 1
> ...
>
> Baseline is 77bfe1b11ea0 (tag: xfs-6.12-fixes-3, xfs/xfs-6.12-fixesC,
> xfs/for-next) xfs: fix a typo
>
> Patches for this series can be found at:
> https://github.com/johnpgarry/linux/tree/atomic-writes-v6.12-fs-v10
>
> Changes since v9:
> - iomap doc fix (Darrick)
> - Add RB tags from Christoph and Darrick (Thanks!)
>
> Changes since v8:
> - Add bdev atomic write unit helpers (Christoph)
> - Add comment on FS block size limit (Christoph)
> - Stylistic improvements (Christoph)
> - Add RB tags from Christoph (thanks!)
>
> Changes since v7:
> - Drop FS_XFLAG_ATOMICWRITES
> - Reorder block/fs patches and add fixes tags (Christoph)
> - Add RB tag from Christoph (Thanks!)
> - Rebase
>
> John Garry (8):
> block/fs: Pass an iocb to generic_atomic_write_valid()
> fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
> block: Add bdev atomic write limits helpers
> fs: Export generic_atomic_write_valid()
> fs: iomap: Atomic write support
> xfs: Support atomic write for statx
> xfs: Validate atomic writes
> xfs: Support setting FMODE_CAN_ATOMIC_WRITE
>
> .../filesystems/iomap/operations.rst | 12 ++++++
> block/fops.c | 22 ++++++-----
> fs/iomap/direct-io.c | 38 +++++++++++++++++--
> fs/iomap/trace.h | 3 +-
> fs/read_write.c | 16 +++++---
> fs/xfs/xfs_buf.c | 7 ++++
> fs/xfs/xfs_buf.h | 4 ++
> fs/xfs/xfs_file.c | 16 ++++++++
> fs/xfs/xfs_inode.h | 15 ++++++++
> fs/xfs/xfs_iops.c | 22 +++++++++++
> include/linux/blkdev.h | 16 ++++++++
> include/linux/fs.h | 2 +-
> include/linux/iomap.h | 1 +
> 13 files changed, 152 insertions(+), 22 deletions(-)
>
> --
Hi John,
I've tested the whole patchset on powerpc (64k pagesize) with 4k, 16k
and 64k blocksizes and it passes the tests. My tests basically check
following scenarios:
Statx behavior:
# 1.1 bs > unit_max
# 1.2 bs < unit_max
# 1.3 bs == unit_max
# 1.4 dev deosn't support
pwrite tests:
# 3.1 len < fsmin
# 3.2 len > fsmax
# 3.3 write not naturally aligned
# 3.4 Atomic write abiding to all rule
For the whole patchset, feel free to add:
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> #On ppc64
Thanks,
Ojaswin
> 2.31.1
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-10-24 6:33 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-19 12:51 [PATCH v10 0/8] block atomic writes for xfs John Garry
2024-10-19 12:51 ` [PATCH v10 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-19 12:51 ` [PATCH v10 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
2024-10-19 12:51 ` [PATCH v10 3/8] block: Add bdev atomic write limits helpers John Garry
2024-10-19 12:51 ` [PATCH v10 4/8] fs: Export generic_atomic_write_valid() John Garry
2024-10-19 12:51 ` [PATCH v10 5/8] fs: iomap: Atomic write support John Garry
2024-10-20 8:21 ` Ritesh Harjani
2024-10-20 11:21 ` John Garry
2024-10-20 11:37 ` Ritesh Harjani
2024-10-19 12:51 ` [PATCH v10 6/8] xfs: Support atomic write for statx John Garry
2024-10-19 12:51 ` [PATCH v10 7/8] xfs: Validate atomic writes John Garry
2024-10-20 9:44 ` Ritesh Harjani
2024-10-20 11:09 ` John Garry
2024-10-20 11:41 ` Ritesh Harjani
2024-10-19 12:51 ` [PATCH v10 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
2024-10-19 22:49 ` (subset) [PATCH v10 0/8] block atomic writes for xfs Jens Axboe
2024-10-19 22:50 ` Jens Axboe
2024-10-23 12:42 ` John Garry
2024-10-23 12:50 ` Carlos Maiolino
2024-10-24 6:32 ` Ojaswin Mujoo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).