* [PATCH v9 1/8] block/fs: Pass an iocb to generic_atomic_write_valid()
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 10:03 ` [PATCH v9 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
` (7 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Darrick and Hannes both thought it better that generic_atomic_write_valid()
should be passed a struct iocb, and not just the member of that struct
which is referenced; see [0] and [1].
I think that makes a more generic and clean API, so make that change.
[0] https://lore.kernel.org/linux-block/680ce641-729b-4150-b875-531a98657682@suse.de/
[1] https://lore.kernel.org/linux-xfs/20240620212401.GA3058325@frogsfrogsfrogs/
Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 8 ++++----
fs/read_write.c | 4 ++--
include/linux/fs.h | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index e696ae53bf1e..968b47b615c4 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -35,13 +35,13 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
return opf;
}
-static bool blkdev_dio_invalid(struct block_device *bdev, loff_t pos,
+static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
struct iov_iter *iter, bool is_atomic)
{
- if (is_atomic && !generic_atomic_write_valid(iter, pos))
+ if (is_atomic && !generic_atomic_write_valid(iocb, iter))
return true;
- return pos & (bdev_logical_block_size(bdev) - 1) ||
+ return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -374,7 +374,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
diff --git a/fs/read_write.c b/fs/read_write.c
index 64dc24afdb3a..2c3263530828 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,7 +1830,7 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
@@ -1840,7 +1840,7 @@ bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos)
if (!is_power_of_2(len))
return false;
- if (!IS_ALIGNED(pos, len))
+ if (!IS_ALIGNED(iocb->ki_pos, len))
return false;
return true;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e3c603d01337..fbfa032d1d90 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct iov_iter *iter, loff_t pos);
+bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v9 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
2024-10-16 10:03 ` [PATCH v9 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 19:57 ` Darrick J. Wong
2024-10-16 10:03 ` [PATCH v9 3/8] block: Add bdev atomic write limits helpers John Garry
` (6 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
the file is open for direct IO. This does not work if the file is not
opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
Change to check for direct IO on a per-IO basis in
generic_atomic_write_valid(). Since we want to report -EOPNOTSUPP for
non-direct IO for an atomic write, change to return an error code.
Relocate the block fops atomic write checks to the common write path, as to
catch non-direct IO.
Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/fops.c | 18 ++++++++++--------
fs/read_write.c | 13 ++++++++-----
include/linux/fs.h | 2 +-
3 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 968b47b615c4..2d01c9007681 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -36,11 +36,8 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
}
static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
- struct iov_iter *iter, bool is_atomic)
+ struct iov_iter *iter)
{
- if (is_atomic && !generic_atomic_write_valid(iocb, iter))
- return true;
-
return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
!bdev_iter_is_aligned(bdev, iter);
}
@@ -368,13 +365,12 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
- bool is_atomic = iocb->ki_flags & IOCB_ATOMIC;
unsigned int nr_pages;
if (!iov_iter_count(iter))
return 0;
- if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
+ if (blkdev_dio_invalid(bdev, iocb, iter))
return -EINVAL;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
@@ -383,7 +379,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return __blkdev_direct_IO_simple(iocb, iter, bdev,
nr_pages);
return __blkdev_direct_IO_async(iocb, iter, bdev, nr_pages);
- } else if (is_atomic) {
+ } else if (iocb->ki_flags & IOCB_ATOMIC) {
return -EINVAL;
}
return __blkdev_direct_IO(iocb, iter, bdev, bio_max_segs(nr_pages));
@@ -625,7 +621,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
if (!bdev)
return -ENXIO;
- if (bdev_can_atomic_write(bdev) && filp->f_flags & O_DIRECT)
+ if (bdev_can_atomic_write(bdev))
filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
@@ -700,6 +696,12 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
return -EOPNOTSUPP;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
size -= iocb->ki_pos;
if (iov_iter_count(from) > size) {
shorted = iov_iter_count(from) - size;
diff --git a/fs/read_write.c b/fs/read_write.c
index 2c3263530828..befec0b5c537 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1830,18 +1830,21 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
return 0;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
{
size_t len = iov_iter_count(iter);
if (!iter_is_ubuf(iter))
- return false;
+ return -EINVAL;
if (!is_power_of_2(len))
- return false;
+ return -EINVAL;
if (!IS_ALIGNED(iocb->ki_pos, len))
- return false;
+ return -EINVAL;
- return true;
+ if (!(iocb->ki_flags & IOCB_DIRECT))
+ return -EOPNOTSUPP;
+
+ return 0;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fbfa032d1d90..ba47fb283730 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
return !c;
}
-bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
+int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
#endif /* _LINUX_FS_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v9 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
2024-10-16 10:03 ` [PATCH v9 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
@ 2024-10-16 19:57 ` Darrick J. Wong
0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-10-16 19:57 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, viro, jack, dchinner, hch, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 10:03:19AM +0000, John Garry wrote:
> Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and
> the file is open for direct IO. This does not work if the file is not
> opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later.
>
> Change to check for direct IO on a per-IO basis in
> generic_atomic_write_valid(). Since we want to report -EOPNOTSUPP for
> non-direct IO for an atomic write, change to return an error code.
>
> Relocate the block fops atomic write checks to the common write path, as to
> catch non-direct IO.
>
> Fixes: c34fc6f26ab8 ("fs: Initial atomic write support")
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> block/fops.c | 18 ++++++++++--------
> fs/read_write.c | 13 ++++++++-----
> include/linux/fs.h | 2 +-
> 3 files changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/block/fops.c b/block/fops.c
> index 968b47b615c4..2d01c9007681 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -36,11 +36,8 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
> }
>
> static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
> - struct iov_iter *iter, bool is_atomic)
> + struct iov_iter *iter)
> {
> - if (is_atomic && !generic_atomic_write_valid(iocb, iter))
> - return true;
> -
> return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
> !bdev_iter_is_aligned(bdev, iter);
> }
> @@ -368,13 +365,12 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
> static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> {
> struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
> - bool is_atomic = iocb->ki_flags & IOCB_ATOMIC;
> unsigned int nr_pages;
>
> if (!iov_iter_count(iter))
> return 0;
>
> - if (blkdev_dio_invalid(bdev, iocb, iter, is_atomic))
> + if (blkdev_dio_invalid(bdev, iocb, iter))
> return -EINVAL;
>
> nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
> @@ -383,7 +379,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> return __blkdev_direct_IO_simple(iocb, iter, bdev,
> nr_pages);
> return __blkdev_direct_IO_async(iocb, iter, bdev, nr_pages);
> - } else if (is_atomic) {
> + } else if (iocb->ki_flags & IOCB_ATOMIC) {
> return -EINVAL;
> }
> return __blkdev_direct_IO(iocb, iter, bdev, bio_max_segs(nr_pages));
> @@ -625,7 +621,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
> if (!bdev)
> return -ENXIO;
>
> - if (bdev_can_atomic_write(bdev) && filp->f_flags & O_DIRECT)
> + if (bdev_can_atomic_write(bdev))
> filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
>
> ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
> @@ -700,6 +696,12 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
> return -EOPNOTSUPP;
>
> + if (iocb->ki_flags & IOCB_ATOMIC) {
> + ret = generic_atomic_write_valid(iocb, from);
> + if (ret)
> + return ret;
> + }
> +
> size -= iocb->ki_pos;
> if (iov_iter_count(from) > size) {
> shorted = iov_iter_count(from) - size;
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 2c3263530828..befec0b5c537 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1830,18 +1830,21 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
> return 0;
> }
>
> -bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
> +int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
> {
> size_t len = iov_iter_count(iter);
>
> if (!iter_is_ubuf(iter))
> - return false;
> + return -EINVAL;
>
> if (!is_power_of_2(len))
> - return false;
> + return -EINVAL;
>
> if (!IS_ALIGNED(iocb->ki_pos, len))
> - return false;
> + return -EINVAL;
>
> - return true;
> + if (!(iocb->ki_flags & IOCB_DIRECT))
> + return -EOPNOTSUPP;
> +
> + return 0;
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index fbfa032d1d90..ba47fb283730 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3721,6 +3721,6 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
> return !c;
> }
>
> -bool generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
> +int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
>
> #endif /* _LINUX_FS_H */
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v9 3/8] block: Add bdev atomic write limits helpers
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
2024-10-16 10:03 ` [PATCH v9 1/8] block/fs: Pass an iocb to generic_atomic_write_valid() John Garry
2024-10-16 10:03 ` [PATCH v9 2/8] fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 12:29 ` Christoph Hellwig
2024-10-16 19:58 ` Darrick J. Wong
2024-10-16 10:03 ` [PATCH v9 4/8] fs: Export generic_atomic_write_valid() John Garry
` (5 subsequent siblings)
8 siblings, 2 replies; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Add helpers to get atomic write limits for a bdev, so that we don't access
request_queue helpers outside the block layer.
We check if the bdev can actually atomic write in these helpers, so we
can avoid users missing using this check.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
include/linux/blkdev.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da28..c2cc3c146d74 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1674,6 +1674,22 @@ static inline bool bdev_can_atomic_write(struct block_device *bdev)
return true;
}
+static inline unsigned int
+bdev_atomic_write_unit_min_bytes(struct block_device *bdev)
+{
+ if (!bdev_can_atomic_write(bdev))
+ return 0;
+ return queue_atomic_write_unit_min_bytes(bdev_get_queue(bdev));
+}
+
+static inline unsigned int
+bdev_atomic_write_unit_max_bytes(struct block_device *bdev)
+{
+ if (!bdev_can_atomic_write(bdev))
+ return 0;
+ return queue_atomic_write_unit_max_bytes(bdev_get_queue(bdev));
+}
+
#define DEFINE_IO_COMP_BATCH(name) struct io_comp_batch name = { }
#endif /* _LINUX_BLKDEV_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v9 3/8] block: Add bdev atomic write limits helpers
2024-10-16 10:03 ` [PATCH v9 3/8] block: Add bdev atomic write limits helpers John Garry
@ 2024-10-16 12:29 ` Christoph Hellwig
2024-10-16 12:36 ` John Garry
2024-10-16 19:58 ` Darrick J. Wong
1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2024-10-16 12:29 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 10:03:20AM +0000, John Garry wrote:
> +static inline unsigned int
> +bdev_atomic_write_unit_min_bytes(struct block_device *bdev)
> +{
> + if (!bdev_can_atomic_write(bdev))
> + return 0;
Aren't the limits always zero when the block device doesn't support
atomic writes and we can skip these checks?
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 3/8] block: Add bdev atomic write limits helpers
2024-10-16 12:29 ` Christoph Hellwig
@ 2024-10-16 12:36 ` John Garry
2024-10-16 12:39 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: John Garry @ 2024-10-16 12:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, brauner, djwong, viro, jack, dchinner, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 16/10/2024 13:29, Christoph Hellwig wrote:
> On Wed, Oct 16, 2024 at 10:03:20AM +0000, John Garry wrote:
>> +static inline unsigned int
>> +bdev_atomic_write_unit_min_bytes(struct block_device *bdev)
>> +{
>> + if (!bdev_can_atomic_write(bdev))
>> + return 0;
>
> Aren't the limits always zero when the block device doesn't support
> atomic writes and we can skip these checks?
Not necessarily. bdev_can_atomic_write() has the partition alignment
checks also.
>
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
thanks, but let me know if you would still like to see any changes.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 3/8] block: Add bdev atomic write limits helpers
2024-10-16 12:36 ` John Garry
@ 2024-10-16 12:39 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2024-10-16 12:39 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, axboe, brauner, djwong, viro, jack, dchinner,
cem, linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 01:36:40PM +0100, John Garry wrote:
> thanks, but let me know if you would still like to see any changes.
Sounds like this is fine.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 3/8] block: Add bdev atomic write limits helpers
2024-10-16 10:03 ` [PATCH v9 3/8] block: Add bdev atomic write limits helpers John Garry
2024-10-16 12:29 ` Christoph Hellwig
@ 2024-10-16 19:58 ` Darrick J. Wong
1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-10-16 19:58 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, viro, jack, dchinner, hch, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 10:03:20AM +0000, John Garry wrote:
> Add helpers to get atomic write limits for a bdev, so that we don't access
> request_queue helpers outside the block layer.
>
> We check if the bdev can actually atomic write in these helpers, so we
> can avoid users missing using this check.
>
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> include/linux/blkdev.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 50c3b959da28..c2cc3c146d74 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1674,6 +1674,22 @@ static inline bool bdev_can_atomic_write(struct block_device *bdev)
> return true;
> }
>
> +static inline unsigned int
> +bdev_atomic_write_unit_min_bytes(struct block_device *bdev)
> +{
> + if (!bdev_can_atomic_write(bdev))
> + return 0;
> + return queue_atomic_write_unit_min_bytes(bdev_get_queue(bdev));
> +}
> +
> +static inline unsigned int
> +bdev_atomic_write_unit_max_bytes(struct block_device *bdev)
> +{
> + if (!bdev_can_atomic_write(bdev))
> + return 0;
> + return queue_atomic_write_unit_max_bytes(bdev_get_queue(bdev));
> +}
> +
> #define DEFINE_IO_COMP_BATCH(name) struct io_comp_batch name = { }
>
> #endif /* _LINUX_BLKDEV_H */
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v9 4/8] fs: Export generic_atomic_write_valid()
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
` (2 preceding siblings ...)
2024-10-16 10:03 ` [PATCH v9 3/8] block: Add bdev atomic write limits helpers John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 10:03 ` [PATCH v9 5/8] fs: iomap: Atomic write support John Garry
` (4 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
The XFS code will need this.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/read_write.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index befec0b5c537..3e5dad12a5b4 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1848,3 +1848,4 @@ int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
return 0;
}
+EXPORT_SYMBOL_GPL(generic_atomic_write_valid);
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v9 5/8] fs: iomap: Atomic write support
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
` (3 preceding siblings ...)
2024-10-16 10:03 ` [PATCH v9 4/8] fs: Export generic_atomic_write_valid() John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 20:03 ` Darrick J. Wong
2024-10-16 10:03 ` [PATCH v9 6/8] xfs: Support atomic write for statx John Garry
` (3 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
flag set.
Initially FSes (XFS) should only support writing a single FS block
atomically.
As with any atomic write, we should produce a single bio which covers the
complete write length.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
.../filesystems/iomap/operations.rst | 11 ++++++
fs/iomap/direct-io.c | 38 +++++++++++++++++--
fs/iomap/trace.h | 3 +-
include/linux/iomap.h | 1 +
4 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index b93115ab8748..5f382076db67 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
if the mapping is unwritten and the filesystem cannot handle zeroing
the unaligned regions without exposing stale contents.
+ * ``IOMAP_ATOMIC``: This write is being issued with torn-write
+ protection. Only a single bio can be created for the write, and the
+ write must not be split into multiple I/O requests, i.e. flag
+ REQ_ATOMIC must be set.
+ The file range to write must be aligned to satisfy the requirements
+ of both the filesystem and the underlying block device's atomic
+ commit capabilities.
+ If filesystem metadata updates are required (e.g. unwritten extent
+ conversion or copy on write), all updates for the entire file range
+ must be committed atomically as well.
+
Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
calling this function.
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f637aa0706a3..ed4764e3b8f0 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
* clearing the WRITE_THROUGH flag in the dio request.
*/
static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
- const struct iomap *iomap, bool use_fua)
+ const struct iomap *iomap, bool use_fua, bool atomic)
{
blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
@@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
opflags |= REQ_FUA;
else
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+ if (atomic)
+ opflags |= REQ_ATOMIC;
return opflags;
}
@@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int fs_block_size = i_blocksize(inode), pad;
- loff_t length = iomap_length(iter);
+ const loff_t length = iomap_length(iter);
+ bool atomic = iter->flags & IOMAP_ATOMIC;
loff_t pos = iter->pos;
blk_opf_t bio_opf;
struct bio *bio;
@@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
size_t copied = 0;
size_t orig_count;
+ if (atomic && length != fs_block_size)
+ return -EINVAL;
+
if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
!bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
return -EINVAL;
@@ -382,7 +388,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
* can set up the page vector appropriately for a ZONE_APPEND
* operation.
*/
- bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+ bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
do {
@@ -415,6 +421,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (WARN_ON_ONCE(atomic && n != length)) {
+ /*
+ * This bio should have covered the complete length,
+ * which it doesn't, so error. We may need to zero out
+ * the tail (complete FS block), similar to when
+ * bio_iov_iter_get_pages() returns an error, above.
+ */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto zero_tail;
+ }
if (dio->flags & IOMAP_DIO_WRITE) {
task_io_account_write(n);
} else {
@@ -598,6 +615,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iocb->ki_flags & IOCB_NOWAIT)
iomi.flags |= IOMAP_NOWAIT;
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ iomi.flags |= IOMAP_ATOMIC;
+
if (iov_iter_rw(iter) == READ) {
/* reads can always complete inline */
dio->flags |= IOMAP_DIO_INLINE_COMP;
@@ -659,7 +679,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret != -EAGAIN) {
trace_iomap_dio_invalidate_fail(inode, iomi.pos,
iomi.len);
- ret = -ENOTBLK;
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ /*
+ * folio invalidation failed, maybe
+ * this is transient, unlock and see if
+ * the caller tries again.
+ */
+ ret = -EAGAIN;
+ } else {
+ /* fall back to buffered write */
+ ret = -ENOTBLK;
+ }
}
goto out_free_dio;
}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 0a991c4ce87d..4118a42cdab0 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -98,7 +98,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
{ IOMAP_REPORT, "REPORT" }, \
{ IOMAP_FAULT, "FAULT" }, \
{ IOMAP_DIRECT, "DIRECT" }, \
- { IOMAP_NOWAIT, "NOWAIT" }
+ { IOMAP_NOWAIT, "NOWAIT" }, \
+ { IOMAP_ATOMIC, "ATOMIC" }
#define IOMAP_F_FLAGS_STRINGS \
{ IOMAP_F_NEW, "NEW" }, \
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index d0420e962ffd..84282db3e4c1 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -178,6 +178,7 @@ struct iomap_folio_ops {
#else
#define IOMAP_DAX 0
#endif /* CONFIG_FS_DAX */
+#define IOMAP_ATOMIC (1 << 9)
struct iomap_ops {
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v9 5/8] fs: iomap: Atomic write support
2024-10-16 10:03 ` [PATCH v9 5/8] fs: iomap: Atomic write support John Garry
@ 2024-10-16 20:03 ` Darrick J. Wong
2024-10-17 7:57 ` John Garry
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-10-16 20:03 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, viro, jack, dchinner, hch, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 10:03:22AM +0000, John Garry wrote:
> Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
> flag set.
>
> Initially FSes (XFS) should only support writing a single FS block
> atomically.
>
> As with any atomic write, we should produce a single bio which covers the
> complete write length.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> .../filesystems/iomap/operations.rst | 11 ++++++
> fs/iomap/direct-io.c | 38 +++++++++++++++++--
> fs/iomap/trace.h | 3 +-
> include/linux/iomap.h | 1 +
> 4 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index b93115ab8748..5f382076db67 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
> if the mapping is unwritten and the filesystem cannot handle zeroing
> the unaligned regions without exposing stale contents.
>
> + * ``IOMAP_ATOMIC``: This write is being issued with torn-write
> + protection. Only a single bio can be created for the write, and the
Dumb nit: ^^ start new sentences on a new line like the rest of
the file, please.
With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> + write must not be split into multiple I/O requests, i.e. flag
> + REQ_ATOMIC must be set.
> + The file range to write must be aligned to satisfy the requirements
> + of both the filesystem and the underlying block device's atomic
> + commit capabilities.
> + If filesystem metadata updates are required (e.g. unwritten extent
> + conversion or copy on write), all updates for the entire file range
> + must be committed atomically as well.
> +
> Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
> calling this function.
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f637aa0706a3..ed4764e3b8f0 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -271,7 +271,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
> * clearing the WRITE_THROUGH flag in the dio request.
> */
> static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> - const struct iomap *iomap, bool use_fua)
> + const struct iomap *iomap, bool use_fua, bool atomic)
> {
> blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
>
> @@ -283,6 +283,8 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> opflags |= REQ_FUA;
> else
> dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
> + if (atomic)
> + opflags |= REQ_ATOMIC;
>
> return opflags;
> }
> @@ -293,7 +295,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> const struct iomap *iomap = &iter->iomap;
> struct inode *inode = iter->inode;
> unsigned int fs_block_size = i_blocksize(inode), pad;
> - loff_t length = iomap_length(iter);
> + const loff_t length = iomap_length(iter);
> + bool atomic = iter->flags & IOMAP_ATOMIC;
> loff_t pos = iter->pos;
> blk_opf_t bio_opf;
> struct bio *bio;
> @@ -303,6 +306,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> size_t copied = 0;
> size_t orig_count;
>
> + if (atomic && length != fs_block_size)
> + return -EINVAL;
> +
> if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
> !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> return -EINVAL;
> @@ -382,7 +388,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> * can set up the page vector appropriately for a ZONE_APPEND
> * operation.
> */
> - bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
> + bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
>
> nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
> do {
> @@ -415,6 +421,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> }
>
> n = bio->bi_iter.bi_size;
> + if (WARN_ON_ONCE(atomic && n != length)) {
> + /*
> + * This bio should have covered the complete length,
> + * which it doesn't, so error. We may need to zero out
> + * the tail (complete FS block), similar to when
> + * bio_iov_iter_get_pages() returns an error, above.
> + */
> + ret = -EINVAL;
> + bio_put(bio);
> + goto zero_tail;
> + }
> if (dio->flags & IOMAP_DIO_WRITE) {
> task_io_account_write(n);
> } else {
> @@ -598,6 +615,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> if (iocb->ki_flags & IOCB_NOWAIT)
> iomi.flags |= IOMAP_NOWAIT;
>
> + if (iocb->ki_flags & IOCB_ATOMIC)
> + iomi.flags |= IOMAP_ATOMIC;
> +
> if (iov_iter_rw(iter) == READ) {
> /* reads can always complete inline */
> dio->flags |= IOMAP_DIO_INLINE_COMP;
> @@ -659,7 +679,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> if (ret != -EAGAIN) {
> trace_iomap_dio_invalidate_fail(inode, iomi.pos,
> iomi.len);
> - ret = -ENOTBLK;
> + if (iocb->ki_flags & IOCB_ATOMIC) {
> + /*
> + * folio invalidation failed, maybe
> + * this is transient, unlock and see if
> + * the caller tries again.
> + */
> + ret = -EAGAIN;
> + } else {
> + /* fall back to buffered write */
> + ret = -ENOTBLK;
> + }
> }
> goto out_free_dio;
> }
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 0a991c4ce87d..4118a42cdab0 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -98,7 +98,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
> { IOMAP_REPORT, "REPORT" }, \
> { IOMAP_FAULT, "FAULT" }, \
> { IOMAP_DIRECT, "DIRECT" }, \
> - { IOMAP_NOWAIT, "NOWAIT" }
> + { IOMAP_NOWAIT, "NOWAIT" }, \
> + { IOMAP_ATOMIC, "ATOMIC" }
>
> #define IOMAP_F_FLAGS_STRINGS \
> { IOMAP_F_NEW, "NEW" }, \
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index d0420e962ffd..84282db3e4c1 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -178,6 +178,7 @@ struct iomap_folio_ops {
> #else
> #define IOMAP_DAX 0
> #endif /* CONFIG_FS_DAX */
> +#define IOMAP_ATOMIC (1 << 9)
>
> struct iomap_ops {
> /*
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 5/8] fs: iomap: Atomic write support
2024-10-16 20:03 ` Darrick J. Wong
@ 2024-10-17 7:57 ` John Garry
0 siblings, 0 replies; 22+ messages in thread
From: John Garry @ 2024-10-17 7:57 UTC (permalink / raw)
To: Darrick J. Wong
Cc: axboe, brauner, viro, jack, dchinner, hch, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On 16/10/2024 21:03, Darrick J. Wong wrote:
>> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
>> index b93115ab8748..5f382076db67 100644
>> --- a/Documentation/filesystems/iomap/operations.rst
>> +++ b/Documentation/filesystems/iomap/operations.rst
>> @@ -513,6 +513,17 @@ IOMAP_WRITE`` with any combination of the following enhancements:
>> if the mapping is unwritten and the filesystem cannot handle zeroing
>> the unaligned regions without exposing stale contents.
>>
>> + * ``IOMAP_ATOMIC``: This write is being issued with torn-write
>> + protection. Only a single bio can be created for the write, and the
> Dumb nit: ^^ start new sentences on a new line like the rest of
> the file, please.
>
ok, np
> With that fixed,
> Reviewed-by: Darrick J. Wong<djwong@kernel.org>
cheers
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v9 6/8] xfs: Support atomic write for statx
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
` (4 preceding siblings ...)
2024-10-16 10:03 ` [PATCH v9 5/8] fs: iomap: Atomic write support John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 10:03 ` [PATCH v9 7/8] xfs: Validate atomic writes John Garry
` (2 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Support providing info on atomic write unit min and max for an inode.
For simplicity, currently we limit the min at the FS block size. As for
max, we limit also at FS block size, as there is no current method to
guarantee extent alignment or granularity for regular files.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_buf.c | 7 +++++++
fs/xfs/xfs_buf.h | 4 ++++
fs/xfs/xfs_inode.h | 15 +++++++++++++++
fs/xfs/xfs_iops.c | 22 ++++++++++++++++++++++
4 files changed, 48 insertions(+)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index aa4dbda7b536..e8196f5778e2 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2115,6 +2115,13 @@ xfs_alloc_buftarg(
btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
mp, ops);
+ if (bdev_can_atomic_write(btp->bt_bdev)) {
+ btp->bt_bdev_awu_min = bdev_atomic_write_unit_min_bytes(
+ btp->bt_bdev);
+ btp->bt_bdev_awu_max = bdev_atomic_write_unit_max_bytes(
+ btp->bt_bdev);
+ }
+
/*
* When allocating the buftargs we have not yet read the super block and
* thus don't know the file system sector size yet.
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 209a389f2abc..3d56bc7a35cc 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -124,6 +124,10 @@ struct xfs_buftarg {
struct percpu_counter bt_io_count;
struct ratelimit_state bt_ioerror_rl;
+ /* Atomic write unit values */
+ unsigned int bt_bdev_awu_min;
+ unsigned int bt_bdev_awu_max;
+
/* built-in cache, if we're not using the perag one */
struct xfs_buf_cache bt_cache[];
};
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 97ed912306fd..73009a25a119 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -327,6 +327,21 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
(XFS_IS_REALTIME_INODE(ip) ? \
(ip)->i_mount->m_rtdev_targp : (ip)->i_mount->m_ddev_targp)
+static inline bool
+xfs_inode_can_atomicwrite(
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+
+ if (mp->m_sb.sb_blocksize < target->bt_bdev_awu_min)
+ return false;
+ if (mp->m_sb.sb_blocksize > target->bt_bdev_awu_max)
+ return false;
+
+ return true;
+}
+
/*
* In-core inode flags.
*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ee79cf161312..5cd804812efd 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -570,6 +570,20 @@ xfs_stat_blksize(
return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
}
+static void
+xfs_get_atomic_write_attr(
+ struct xfs_inode *ip,
+ unsigned int *unit_min,
+ unsigned int *unit_max)
+{
+ if (!xfs_inode_can_atomicwrite(ip)) {
+ *unit_min = *unit_max = 0;
+ return;
+ }
+
+ *unit_min = *unit_max = ip->i_mount->m_sb.sb_blocksize;
+}
+
STATIC int
xfs_vn_getattr(
struct mnt_idmap *idmap,
@@ -643,6 +657,14 @@ xfs_vn_getattr(
stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
stat->dio_offset_align = bdev_logical_block_size(bdev);
}
+ if (request_mask & STATX_WRITE_ATOMIC) {
+ unsigned int unit_min, unit_max;
+
+ xfs_get_atomic_write_attr(ip, &unit_min,
+ &unit_max);
+ generic_fill_statx_atomic_writes(stat,
+ unit_min, unit_max);
+ }
fallthrough;
default:
stat->blksize = xfs_stat_blksize(ip);
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v9 7/8] xfs: Validate atomic writes
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
` (5 preceding siblings ...)
2024-10-16 10:03 ` [PATCH v9 6/8] xfs: Support atomic write for statx John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 12:29 ` Christoph Hellwig
2024-10-16 20:13 ` Darrick J. Wong
2024-10-16 10:03 ` [PATCH v9 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
2024-10-18 9:29 ` [PATCH v9 0/8] block atomic writes for xfs John Garry
8 siblings, 2 replies; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Validate that an atomic write adheres to length/offset rules. Currently
we can only write a single FS block.
For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
ATOMICWRITES flags would also need to be set for the inode.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index b19916b11fd5..1ccbc1eb75c9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -852,6 +852,20 @@ xfs_file_write_iter(
if (IS_DAX(inode))
return xfs_file_dax_write(iocb, from);
+ if (iocb->ki_flags & IOCB_ATOMIC) {
+ /*
+ * Currently only atomic writing of a single FS block is
+ * supported. It would be possible to atomic write smaller than
+ * a FS block, but there is no requirement to support this.
+ * Note that iomap also does not support this yet.
+ */
+ if (ocount != ip->i_mount->m_sb.sb_blocksize)
+ return -EINVAL;
+ ret = generic_atomic_write_valid(iocb, from);
+ if (ret)
+ return ret;
+ }
+
if (iocb->ki_flags & IOCB_DIRECT) {
/*
* Allow a directio write to fall back to a buffered
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v9 7/8] xfs: Validate atomic writes
2024-10-16 10:03 ` [PATCH v9 7/8] xfs: Validate atomic writes John Garry
@ 2024-10-16 12:29 ` Christoph Hellwig
2024-10-16 20:13 ` Darrick J. Wong
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2024-10-16 12:29 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, djwong, viro, jack, dchinner, hch, cem,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 7/8] xfs: Validate atomic writes
2024-10-16 10:03 ` [PATCH v9 7/8] xfs: Validate atomic writes John Garry
2024-10-16 12:29 ` Christoph Hellwig
@ 2024-10-16 20:13 ` Darrick J. Wong
1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-10-16 20:13 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, viro, jack, dchinner, hch, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 10:03:24AM +0000, John Garry wrote:
> Validate that an atomic write adheres to length/offset rules. Currently
> we can only write a single FS block.
>
> For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
> FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
> ATOMICWRITES flags would also need to be set for the inode.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
This looks ok to me now,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_file.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index b19916b11fd5..1ccbc1eb75c9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -852,6 +852,20 @@ xfs_file_write_iter(
> if (IS_DAX(inode))
> return xfs_file_dax_write(iocb, from);
>
> + if (iocb->ki_flags & IOCB_ATOMIC) {
> + /*
> + * Currently only atomic writing of a single FS block is
> + * supported. It would be possible to atomic write smaller than
> + * a FS block, but there is no requirement to support this.
> + * Note that iomap also does not support this yet.
> + */
> + if (ocount != ip->i_mount->m_sb.sb_blocksize)
> + return -EINVAL;
> + ret = generic_atomic_write_valid(iocb, from);
> + if (ret)
> + return ret;
> + }
> +
> if (iocb->ki_flags & IOCB_DIRECT) {
> /*
> * Allow a directio write to fall back to a buffered
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v9 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
` (6 preceding siblings ...)
2024-10-16 10:03 ` [PATCH v9 7/8] xfs: Validate atomic writes John Garry
@ 2024-10-16 10:03 ` John Garry
2024-10-16 20:13 ` Darrick J. Wong
2024-10-18 9:29 ` [PATCH v9 0/8] block atomic writes for xfs John Garry
8 siblings, 1 reply; 22+ messages in thread
From: John Garry @ 2024-10-16 10:03 UTC (permalink / raw)
To: axboe, brauner, djwong, viro, jack, dchinner, hch, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
John Garry
Set FMODE_CAN_ATOMIC_WRITE flag if we can atomic write for that inode.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1ccbc1eb75c9..ca47cae5a40a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1253,6 +1253,8 @@ xfs_file_open(
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
+ if (xfs_inode_can_atomicwrite(XFS_I(inode)))
+ file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
return generic_file_open(inode, file);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v9 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE
2024-10-16 10:03 ` [PATCH v9 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
@ 2024-10-16 20:13 ` Darrick J. Wong
0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-10-16 20:13 UTC (permalink / raw)
To: John Garry
Cc: axboe, brauner, viro, jack, dchinner, hch, cem, linux-block,
linux-kernel, linux-xfs, linux-fsdevel, hare, martin.petersen,
catherine.hoang, mcgrof, ritesh.list, ojaswin
On Wed, Oct 16, 2024 at 10:03:25AM +0000, John Garry wrote:
> Set FMODE_CAN_ATOMIC_WRITE flag if we can atomic write for that inode.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
Woot!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_file.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1ccbc1eb75c9..ca47cae5a40a 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1253,6 +1253,8 @@ xfs_file_open(
> if (xfs_is_shutdown(XFS_M(inode->i_sb)))
> return -EIO;
> file->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT;
> + if (xfs_inode_can_atomicwrite(XFS_I(inode)))
> + file->f_mode |= FMODE_CAN_ATOMIC_WRITE;
> return generic_file_open(inode, file);
> }
>
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 0/8] block atomic writes for xfs
2024-10-16 10:03 [PATCH v9 0/8] block atomic writes for xfs John Garry
` (7 preceding siblings ...)
2024-10-16 10:03 ` [PATCH v9 8/8] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
@ 2024-10-18 9:29 ` John Garry
2024-10-18 13:41 ` Jens Axboe
8 siblings, 1 reply; 22+ messages in thread
From: John Garry @ 2024-10-18 9:29 UTC (permalink / raw)
To: axboe, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
brauner, djwong, viro, jack, dchinner, hch
On 16/10/2024 11:03, John Garry wrote:
Hi Jens,
There are block changes in this series. I was going to ask Carlos to
queue this work via the XFS tree, so can you let me know whether you
have any issue with those (block) changes. There is a fix included,
which I can manually backport to stable (if not autoselected).
Note that I still plan on sending a v10 for this series, to fix a small
documentation issue which Darrick noticed.
BTW, I was hoping to send non-RFCs patches for atomic write RAID support
soon, originally sent in:
https://lore.kernel.org/linux-block/20240919092302.3094725-1-john.g.garry@oracle.com/
https://lore.kernel.org/linux-block/20240903150748.2179966-1-john.g.garry@oracle.com/
They should not have any dependency or conflict with this series.
Thanks,
John
> This series expands atomic write support to filesystems, specifically
> XFS.
>
> Initially we will only support writing exactly 1x FS block atomically.
>
> Since we can now have FS block size > PAGE_SIZE for XFS, we can write
> atomically 4K+ blocks on x86.
>
...
> John Garry (8):
> block/fs: Pass an iocb to generic_atomic_write_valid()
> fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
> block: Add bdev atomic write limits helpers
> fs: Export generic_atomic_write_valid()
> fs: iomap: Atomic write support
> xfs: Support atomic write for statx
> xfs: Validate atomic writes
> xfs: Support setting FMODE_CAN_ATOMIC_WRITE
>
> .../filesystems/iomap/operations.rst | 11 ++++++
> block/fops.c | 22 ++++++-----
> fs/iomap/direct-io.c | 38 +++++++++++++++++--
> fs/iomap/trace.h | 3 +-
> fs/read_write.c | 16 +++++---
> fs/xfs/xfs_buf.c | 7 ++++
> fs/xfs/xfs_buf.h | 4 ++
> fs/xfs/xfs_file.c | 16 ++++++++
> fs/xfs/xfs_inode.h | 15 ++++++++
> fs/xfs/xfs_iops.c | 22 +++++++++++
> include/linux/blkdev.h | 16 ++++++++
> include/linux/fs.h | 2 +-
> include/linux/iomap.h | 1 +
> 13 files changed, 151 insertions(+), 22 deletions(-)
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 0/8] block atomic writes for xfs
2024-10-18 9:29 ` [PATCH v9 0/8] block atomic writes for xfs John Garry
@ 2024-10-18 13:41 ` Jens Axboe
2024-10-18 17:43 ` John Garry
0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2024-10-18 13:41 UTC (permalink / raw)
To: John Garry, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
brauner, djwong, viro, jack, dchinner, hch
On 10/18/24 3:29 AM, John Garry wrote:
> On 16/10/2024 11:03, John Garry wrote:
>
> Hi Jens,
>
> There are block changes in this series. I was going to ask Carlos to
> queue this work via the XFS tree, so can you let me know whether you
> have any issue with those (block) changes. There is a fix included,
> which I can manually backport to stable (if not autoselected).
>
> Note that I still plan on sending a v10 for this series, to fix a
> small documentation issue which Darrick noticed.
To avoid conflicts, let's do a separate branch that we can both pull in.
I'll take a closer look and set that up once you post v10.
--
Jens Axboe
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v9 0/8] block atomic writes for xfs
2024-10-18 13:41 ` Jens Axboe
@ 2024-10-18 17:43 ` John Garry
0 siblings, 0 replies; 22+ messages in thread
From: John Garry @ 2024-10-18 17:43 UTC (permalink / raw)
To: Jens Axboe, cem
Cc: linux-block, linux-kernel, linux-xfs, linux-fsdevel, hare,
martin.petersen, catherine.hoang, mcgrof, ritesh.list, ojaswin,
brauner, djwong, viro, jack, dchinner, hch
On 18/10/2024 14:41, Jens Axboe wrote:
>> There are block changes in this series. I was going to ask Carlos to
>> queue this work via the XFS tree, so can you let me know whether you
>> have any issue with those (block) changes. There is a fix included,
>> which I can manually backport to stable (if not autoselected).
>>
>> Note that I still plan on sending a v10 for this series, to fix a
>> small documentation issue which Darrick noticed.
> To avoid conflicts, let's do a separate branch that we can both pull in.
> I'll take a closer look and set that up once you post v10.
ok, I'll send v10 soon.
Thanks,
John
^ permalink raw reply [flat|nested] 22+ messages in thread