Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path
@ 2026-06-08  3:02 Qu Wenruo
  2026-06-08  3:02 ` [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-06-08  3:02 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Drop the first 2 patches
  The problem with APPEND NOWAIT direct writes needs more consideration.
  If short writes happened for APPEND NOWAIT write, there are more
  factors to consider.

  Firstly we should return -EGAIN so that the caller can retry in a
  blocking way.

  Secondly we should fully revert what has been written so that isize is
  not increased.


  For the OE conflicts I have not yet hit with upstream, so it should be
  something wrong in my previous direct locking changes.

There are several small problems inside the direct IO code:

- Duplicated btrfs_dio_data::submitted
  The iomap_end() callback has @written parameter which provides the
  same value.

  Fixed in the third patch.

- Badly structured btrfs_dio_iomap_end()
  Refactored in the last patch.

Qu Wenruo (2):
  btrfs: remove btrfs_dio_data::submitted
  btrfs: refactor btrfs_dio_iomap_end()

 fs/btrfs/direct-io.c | 131 ++++++++++++++++++++++---------------------
 1 file changed, 66 insertions(+), 65 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted
  2026-06-08  3:02 [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Qu Wenruo
@ 2026-06-08  3:02 ` Qu Wenruo
  2026-06-08  3:02 ` [PATCH v2 2/2] btrfs: refactor btrfs_dio_iomap_end() Qu Wenruo
  2026-06-08 11:25 ` [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Filipe Manana
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-06-08  3:02 UTC (permalink / raw)
  To: linux-btrfs

That member is to record how many bytes are submitted for a direct
read/write, utilized by iomap_end() callback to handle short IO cases.

However iomap_end() callback is already providing an internally tracked
@written member, which is doing the same accounting and providing the
same value as btrfs_dio_data::submitted.

There is no need to duplicate the work, just remove
btrfs_dio_data::submitted.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/direct-io.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 460326d34143..b2add124cc89 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -14,7 +14,6 @@
 #include "ordered-data.h"
 
 struct btrfs_dio_data {
-	ssize_t submitted;
 	loff_t old_isize;
 	struct extent_changeset *data_reserved;
 	struct btrfs_ordered_extent *ordered;
@@ -619,7 +618,6 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 {
 	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
 	struct btrfs_dio_data *dio_data = iter->private;
-	size_t submitted = dio_data->submitted;
 	const bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
 
@@ -630,9 +628,9 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 		return 0;
 	}
 
-	if (submitted < length) {
-		pos += submitted;
-		length -= submitted;
+	if (written < length) {
+		pos += written;
+		length -= written;
 		if (write) {
 			/*
 			 * Got a short write and have updated the isize, need to
@@ -659,7 +657,7 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 			if (dio_data->updated_isize) {
 				u64 new_isize;
 
-				if (submitted == 0)
+				if (written == 0)
 					new_isize = dio_data->old_isize;
 				else
 					new_isize = max(dio_data->old_isize, pos);
@@ -772,8 +770,6 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
 	dip->file_offset = file_offset;
 	dip->bytes = bio->bi_iter.bi_size;
 
-	dio_data->submitted += bio->bi_iter.bi_size;
-
 	/*
 	 * Check if we are doing a partial write.  If we are, we need to split
 	 * the ordered extent to match the submitted bio.  Hang on to the
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] btrfs: refactor btrfs_dio_iomap_end()
  2026-06-08  3:02 [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Qu Wenruo
  2026-06-08  3:02 ` [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted Qu Wenruo
@ 2026-06-08  3:02 ` Qu Wenruo
  2026-06-08 11:25 ` [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Filipe Manana
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-06-08  3:02 UTC (permalink / raw)
  To: linux-btrfs

That function has the following problems:

- Read/write handling scattered across different locations
  E.g. At the beginning there is a dedicated hole read handling, but
  later short read handling is at an if() branch.

- Modifying of @pos and @length parameter for short read
  Although it's completely fine to modify those parameters as they are
  passed by value, but it can still be confusing to read.
  As normally we would assume @pos and @length to be the original range.

  But for short IO handling we modify @pos/@length, and completely
  ignore @written.

- Unnecessary split for ordered extent and changeset handling
  Both OE and changeset are only for writes, but they are handled in two
  different if (write) {} blocks.

Refactor the function so that:

- Handling of reads and writes are concentrated in their code block
  Now the handling of reads are in its own small if () branch.

  Leaving the more complex writes handling to take the remaining
  function, and reduce the indent level.

  This also removes all unnecessary "if (write)" checks.

- Do not modify @pos and @length
  Let short IO handling to manually calculate the remaining range.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/direct-io.c | 125 ++++++++++++++++++++++---------------------
 1 file changed, 65 insertions(+), 60 deletions(-)

diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index b2add124cc89..e566a60b0ce5 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -621,74 +621,79 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	const bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
 
-	if (!write && (iomap->type == IOMAP_HOLE)) {
-		/* If reading from a hole, unlock and return */
-		btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos,
-					pos + length - 1, NULL);
+	if (!write) {
+		/*
+		 * Hole read, nothing is submitted, thus we have to unlock
+		 * the whole range.
+		 */
+		if (iomap->type == IOMAP_HOLE) {
+			btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos,
+						pos + length - 1, NULL);
+			return 0;
+		}
+		/*
+		 * Short read, needs to unlock the remaining range, and
+		 * return -ENOTBLK so we can later fault in the pages and retry.
+		 */
+		if (written < length) {
+			btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos + written,
+						pos + length - 1, NULL);
+			return -ENOTBLK;
+		}
+		/* The full range is submitted, endio will do the unlock. */
 		return 0;
 	}
 
 	if (written < length) {
-		pos += written;
-		length -= written;
-		if (write) {
-			/*
-			 * Got a short write and have updated the isize, need to
-			 * revert the isize change.
-			 *
-			 * Normally we need to update isize with extent lock hold,
-			 * but we're safe due to the following factors:
-			 *
-			 * - Only a single writer can be enlarging isize
-			 *   Enlarging isize will take the exclusive inode lock.
-			 *
-			 * - Buffered readers need to wait for the OE we're holding
-			 *   Buffered readers will lock extent and wait for OE
-			 *   of the folio range, and since page cache is invalidated
-			 *   the OE wait can not be skipped.
-			 *
-			 * So here we are safe to revert the isize before
-			 * finishing the OE, and no reader of the remaining range
-			 * can see the enlarged size.
-			 *
-			 * TODO: Extend the DIO_LOCKED lifespan for direct writes,
-			 * and only enlarge isize after a successful write.
-			 */
-			if (dio_data->updated_isize) {
-				u64 new_isize;
+		/*
+		 * Got a short write and have updated the isize, need to
+		 * revert the isize change.
+		 *
+		 * Normally we need to update isize with extent lock hold,
+		 * but we're safe due to the following factors:
+		 *
+		 * - Only a single writer can be enlarging isize
+		 *   Enlarging isize will take the exclusive inode lock.
+		 *
+		 * - Buffered readers need to wait for the OE we're holding
+		 *   Buffered readers will lock extent and wait for OE
+		 *   of the folio range, and since page cache is invalidated
+		 *   the OE wait can not be skipped.
+		 *
+		 * So here we are safe to revert the isize before
+		 * finishing the OE, and no reader of the remaining range
+		 * can see the enlarged size.
+		 *
+		 * TODO: Extend the DIO_LOCKED lifespan for direct writes,
+		 * and only enlarge isize after a successful write.
+		 */
+		if (dio_data->updated_isize) {
+			u64 new_isize;
 
-				if (written == 0)
-					new_isize = dio_data->old_isize;
-				else
-					new_isize = max(dio_data->old_isize, pos);
-				i_size_write(inode, new_isize);
-				dio_data->updated_isize = false;
-			}
-			/*
-			 * We have a short write, if there is any range
-			 * that is submitted properly, that part will have
-			 * its own OE split from the original one.
-			 *
-			 * So for the OE at dio_data->ordered, it's the part
-			 * that is not submitted, and should be marked
-			 * as fully truncated.
-			 */
-			btrfs_mark_ordered_extent_truncated(dio_data->ordered, 0);
-			btrfs_finish_ordered_extent(dio_data->ordered,
-						    pos, length, true);
-		} else {
-			btrfs_unlock_dio_extent(&BTRFS_I(inode)->io_tree, pos,
-						pos + length - 1, NULL);
+			if (written == 0)
+				new_isize = dio_data->old_isize;
+			else
+				new_isize = max(dio_data->old_isize, pos + written);
+			i_size_write(inode, new_isize);
+			dio_data->updated_isize = false;
 		}
+		/*
+		 * We have a short write, if there is any range
+		 * that is submitted properly, that part will have
+		 * its own OE split from the original one.
+		 *
+		 * So for the OE at dio_data->ordered, it's the part
+		 * that is not submitted, and should be marked
+		 * as fully truncated.
+		 */
+		btrfs_mark_ordered_extent_truncated(dio_data->ordered, 0);
+		btrfs_finish_ordered_extent(dio_data->ordered,
+					    pos + written, length - written, true);
 		ret = -ENOTBLK;
 	}
-	if (write) {
-		btrfs_put_ordered_extent(dio_data->ordered);
-		dio_data->ordered = NULL;
-	}
-
-	if (write)
-		extent_changeset_free(dio_data->data_reserved);
+	btrfs_put_ordered_extent(dio_data->ordered);
+	dio_data->ordered = NULL;
+	extent_changeset_free(dio_data->data_reserved);
 	return ret;
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path
  2026-06-08  3:02 [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Qu Wenruo
  2026-06-08  3:02 ` [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted Qu Wenruo
  2026-06-08  3:02 ` [PATCH v2 2/2] btrfs: refactor btrfs_dio_iomap_end() Qu Wenruo
@ 2026-06-08 11:25 ` Filipe Manana
  2 siblings, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2026-06-08 11:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jun 8, 2026 at 4:03 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [CHANGELOG]
> v2:
> - Drop the first 2 patches
>   The problem with APPEND NOWAIT direct writes needs more consideration.
>   If short writes happened for APPEND NOWAIT write, there are more
>   factors to consider.
>
>   Firstly we should return -EGAIN so that the caller can retry in a
>   blocking way.
>
>   Secondly we should fully revert what has been written so that isize is
>   not increased.
>
>
>   For the OE conflicts I have not yet hit with upstream, so it should be
>   something wrong in my previous direct locking changes.
>
> There are several small problems inside the direct IO code:
>
> - Duplicated btrfs_dio_data::submitted
>   The iomap_end() callback has @written parameter which provides the
>   same value.
>
>   Fixed in the third patch.
>
> - Badly structured btrfs_dio_iomap_end()
>   Refactored in the last patch.
>
> Qu Wenruo (2):
>   btrfs: remove btrfs_dio_data::submitted
>   btrfs: refactor btrfs_dio_iomap_end()

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
>  fs/btrfs/direct-io.c | 131 ++++++++++++++++++++++---------------------
>  1 file changed, 66 insertions(+), 65 deletions(-)
>
> --
> 2.54.0
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-08 11:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08  3:02 [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Qu Wenruo
2026-06-08  3:02 ` [PATCH v2 1/2] btrfs: remove btrfs_dio_data::submitted Qu Wenruo
2026-06-08  3:02 ` [PATCH v2 2/2] btrfs: refactor btrfs_dio_iomap_end() Qu Wenruo
2026-06-08 11:25 ` [PATCH v2 0/2] btrfs: minor fixes and cleanups on direct IO path Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox