* [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted
2026-06-08 3:02 [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Qu Wenruo
@ 2026-06-08 3:02 ` Qu Wenruo
2026-06-08 3:02 ` [PATCH v2 2/2] btrfs: refactor btrfs_dio_iomap_end() Qu Wenruo
2026-06-08 11:25 ` [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Filipe Manana
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-06-08 3:02 UTC (permalink / raw)
To: linux-btrfs
That member is to record how many bytes are submitted for a direct
read/write, utilized by iomap_end() callback to handle short IO cases.
However iomap_end() callback is already providing an internally tracked
@written member, which is doing the same accounting and providing the
same value as btrfs_dio_data::submitted.
There is no need to duplicate the work, just remove
btrfs_dio_data::submitted.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/direct-io.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 460326d34143..b2add124cc89 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -14,7 +14,6 @@
#include "ordered-data.h"
struct btrfs_dio_data {
- ssize_t submitted;
loff_t old_isize;
struct extent_changeset *data_reserved;
struct btrfs_ordered_extent *ordered;
@@ -619,7 +618,6 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
{
struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
struct btrfs_dio_data *dio_data = iter->private;
- size_t submitted = dio_data->submitted;
const bool write = !!(flags & IOMAP_WRITE);
int ret = 0;
@@ -630,9 +628,9 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
return 0;
}
- if (submitted < length) {
- pos += submitted;
- length -= submitted;
+ if (written < length) {
+ pos += written;
+ length -= written;
if (write) {
/*
* Got a short write and have updated the isize, need to
@@ -659,7 +657,7 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
if (dio_data->updated_isize) {
u64 new_isize;
- if (submitted == 0)
+ if (written == 0)
new_isize = dio_data->old_isize;
else
new_isize = max(dio_data->old_isize, pos);
@@ -772,8 +770,6 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
dip->file_offset = file_offset;
dip->bytes = bio->bi_iter.bi_size;
- dio_data->submitted += bio->bi_iter.bi_size;
-
/*
* Check if we are doing a partial write. If we are, we need to split
* the ordered extent to match the submitted bio. Hang on to the
--
2.54.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH v2 2/2] btrfs: refactor btrfs_dio_iomap_end()
2026-06-08 3:02 [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Qu Wenruo
2026-06-08 3:02 ` [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted Qu Wenruo
@ 2026-06-08 3:02 ` Qu Wenruo
2026-06-08 11:25 ` [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Filipe Manana
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-06-08 3:02 UTC (permalink / raw)
To: linux-btrfs
That function has the following problems:
- Read/write handling scattered across different locations
E.g. At the beginning there is a dedicated hole read handling, but
later short read handling is at an if() branch.
- Modifying of @pos and @length parameter for short read
Although it's completely fine to modify those parameters as they are
passed by value, but it can still be confusing to read.
As normally we would assume @pos and @length to be the original range.
But for short IO handling we modify @pos/@length, and completely
ignore @written.
- Unnecessary split for ordered extent and changeset handling
Both OE and changeset are only for writes, but they are handled in two
different if (write) {} blocks.
Refactor the function so that:
- Handling of reads and writes are concentrated in their code block
Now the handling of reads are in its own small if () branch.
Leaving the more complex writes handling to take the remaining
function, and reduce the indent level.
This also removes all unnecessary "if (write)" checks.
- Do not modify @pos and @length
Let short IO handling to manually calculate the remaining range.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/direct-io.c | 125 ++++++++++++++++++++++---------------------
1 file changed, 65 insertions(+), 60 deletions(-)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index b2add124cc89..e566a60b0ce5 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -621,74 +621,79 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
const bool write = !!(flags & IOMAP_WRITE);
int ret = 0;
- if (!write && (iomap->type == IOMAP_HOLE)) {
- /* If reading from a hole, unlock and return */
- btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos,
- pos + length - 1, NULL);
+ if (!write) {
+ /*
+ * Hole read, nothing is submitted, thus we have to unlock
+ * the whole range.
+ */
+ if (iomap->type == IOMAP_HOLE) {
+ btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos,
+ pos + length - 1, NULL);
+ return 0;
+ }
+ /*
+ * Short read, needs to unlock the remaining range, and
+ * return -ENOTBLK so we can later fault in the pages and retry.
+ */
+ if (written < length) {
+ btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos + written,
+ pos + length - 1, NULL);
+ return -ENOTBLK;
+ }
+ /* The full range is submitted, endio will do the unlock. */
return 0;
}
if (written < length) {
- pos += written;
- length -= written;
- if (write) {
- /*
- * Got a short write and have updated the isize, need to
- * revert the isize change.
- *
- * Normally we need to update isize with extent lock hold,
- * but we're safe due to the following factors:
- *
- * - Only a single writer can be enlarging isize
- * Enlarging isize will take the exclusive inode lock.
- *
- * - Buffered readers need to wait for the OE we're holding
- * Buffered readers will lock extent and wait for OE
- * of the folio range, and since page cache is invalidated
- * the OE wait can not be skipped.
- *
- * So here we are safe to revert the isize before
- * finishing the OE, and no reader of the remaining range
- * can see the enlarged size.
- *
- * TODO: Extend the DIO_LOCKED lifespan for direct writes,
- * and only enlarge isize after a successful write.
- */
- if (dio_data->updated_isize) {
- u64 new_isize;
+ /*
+ * Got a short write and have updated the isize, need to
+ * revert the isize change.
+ *
+ * Normally we need to update isize with extent lock hold,
+ * but we're safe due to the following factors:
+ *
+ * - Only a single writer can be enlarging isize
+ * Enlarging isize will take the exclusive inode lock.
+ *
+ * - Buffered readers need to wait for the OE we're holding
+ * Buffered readers will lock extent and wait for OE
+ * of the folio range, and since page cache is invalidated
+ * the OE wait can not be skipped.
+ *
+ * So here we are safe to revert the isize before
+ * finishing the OE, and no reader of the remaining range
+ * can see the enlarged size.
+ *
+ * TODO: Extend the DIO_LOCKED lifespan for direct writes,
+ * and only enlarge isize after a successful write.
+ */
+ if (dio_data->updated_isize) {
+ u64 new_isize;
- if (written == 0)
- new_isize = dio_data->old_isize;
- else
- new_isize = max(dio_data->old_isize, pos);
- i_size_write(inode, new_isize);
- dio_data->updated_isize = false;
- }
- /*
- * We have a short write, if there is any range
- * that is submitted properly, that part will have
- * its own OE split from the original one.
- *
- * So for the OE at dio_data->ordered, it's the part
- * that is not submitted, and should be marked
- * as fully truncated.
- */
- btrfs_mark_ordered_extent_truncated(dio_data->ordered, 0);
- btrfs_finish_ordered_extent(dio_data->ordered,
- pos, length, true);
- } else {
- btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos,
- pos + length - 1, NULL);
+ if (written == 0)
+ new_isize = dio_data->old_isize;
+ else
+ new_isize = max(dio_data->old_isize, pos + written);
+ i_size_write(inode, new_isize);
+ dio_data->updated_isize = false;
}
+ /*
+ * We have a short write, if there is any range
+ * that is submitted properly, that part will have
+ * its own OE split from the original one.
+ *
+ * So for the OE at dio_data->ordered, it's the part
+ * that is not submitted, and should be marked
+ * as fully truncated.
+ */
+ btrfs_mark_ordered_extent_truncated(dio_data->ordered, 0);
+ btrfs_finish_ordered_extent(dio_data->ordered,
+ pos + written, length - written, true);
ret = -ENOTBLK;
}
- if (write) {
- btrfs_put_ordered_extent(dio_data->ordered);
- dio_data->ordered = NULL;
- }
-
- if (write)
- extent_changeset_free(dio_data->data_reserved);
+ btrfs_put_ordered_extent(dio_data->ordered);
+ dio_data->ordered = NULL;
+ extent_changeset_free(dio_data->data_reserved);
return ret;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 4+ messages in thread