[PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache_pages
  2023-07-13 13:04 small writeback fixes Christoph Hellwig
@ 2023-07-13 13:04 ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-13 13:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs, linux-fsdevel

__extent_writepage can have started on more pages than the one it was
called for.  This happens regularly for zoned file systems, and in theory
could happen for compressed I/O if the worker thread was executed very
quicky. For such pages extent_write_cache_pages waits for writeback
to complete before moving on to the next page, which is highly inefficient
as it blocks the flusher thread

Port over the PageDirty check that was added to write_cache_pages in
commit 515f4a037fb ("mm: write_cache_pages optimise page cleaning") to
fix this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cb8c5d06fe2304..2ae3badaf36df6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2075,6 +2075,12 @@ static int extent_write_cache_pages(struct address_space *mapping,
 				continue;
 			}

+			if (!folio_test_dirty(folio)) {
+				/* someone wrote it for us */
+				folio_unlock(folio);
+				continue;
+			}
+
 			if (wbc->sync_mode != WB_SYNC_NONE) {
 				if (folio_test_writeback(folio))
 					submit_write_bio(bio_ctrl, 0);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* small writeback fixes v2
@ 2023-07-24 13:26 Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 1/9] btrfs: don't stop integrity writeback too early Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Hi all,

this series has various fixes for bugs found in inspect or only triggered
with upcoming changes that are a fallout from my work on bound lifetimes
for the ordered extent and better confirming to expectations from the
common writeback code.

Note that this series builds on top of the for-next branch.

A git tree is also available here:

    git://git.infradead.org/users/hch/misc.git btrfs-writeback-fixes

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-writeback-fixes

Changes since v1:
 - rebased to for-next
 - picked up review tags from Josef

Diffatat:
 extent_io.c |  182 ++++++++++++++++++++++++++++++++++++------------------------
 inode.c     |   19 ++----
 2 files changed, 118 insertions(+), 83 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/9] btrfs: don't stop integrity writeback too early
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-08-02 22:49   ` Boris Burkov
  2023-07-24 13:26 ` [PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache_pages Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

extent_write_cache_pages stops writing pages as soon as nr_to_write hits
zero.  That is the right thing for opportunistic writeback, but incorrect
for data integrity writeback, which needs to ensure that no dirty pages
are left in the range.  Thus only stop the writeback for WB_SYNC_NONE
if nr_to_write hits 0.

This is a port of write_cache_pages changes in commit 05fe478dd04e
("mm: write_cache_pages integrity fix").

Note that I've only trigger the problem with other changes to the btrfs
writeback code, but this condition seems worthwhile fixing anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c0440a0988c9a8..231e620e6c497d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2098,7 +2098,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 * We have to make sure to honor the new nr_to_write
 			 * at any time
 			 */
-			nr_to_write_done = wbc->nr_to_write <= 0;
+			nr_to_write_done = wbc->sync_mode == WB_SYNC_NONE &&
+						wbc->nr_to_write <= 0;
 		}
 		folio_batch_release(&fbatch);
 		cond_resched();
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache_pages
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 1/9] btrfs: don't stop integrity writeback too early Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 3/9] btrfs: fix an error handling corner case in cow_file_range Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

__extent_writepage can have started on more pages than the one it was
called for.  This happens regularly for zoned file systems, and in theory
could happen for compressed I/O if the worker thread was executed very
quicky. For such pages extent_write_cache_pages waits for writeback
to complete before moving on to the next page, which is highly inefficient
as it blocks the flusher thread

Port over the PageDirty check that was added to write_cache_pages in
commit 515f4a037fb ("mm: write_cache_pages optimise page cleaning") to
fix this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 231e620e6c497d..1cc46bbbd888cd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2075,6 +2075,12 @@ static int extent_write_cache_pages(struct address_space *mapping,
 				continue;
 			}
 
+			if (!folio_test_dirty(folio)) {
+				/* someone wrote it for us */
+				folio_unlock(folio);
+				continue;
+			}
+
 			if (wbc->sync_mode != WB_SYNC_NONE) {
 				if (folio_test_writeback(folio))
 					submit_write_bio(bio_ctrl, 0);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/9] btrfs: fix an error handling corner case in cow_file_range
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 1/9] btrfs: don't stop integrity writeback too early Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache_pages Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 4/9] btrfs: move the cow_fixup earlier in writepages handling Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

When the call to btrfs_reloc_clone_csums in cow_file_range returns an
error, we jump to the out_unlock label with the extent_reserved variable
set to false.   The cleanup at the label will then call
extent_clear_unlock_delalloc on the range from start to end.  But we've
already added cur_alloc_size to start, so there might no range be left
from the newly increment start to end.  Move the check for start < end
so that it is reached by also for the !extent_reserved case.

Fixes: a315e68f6e8b ("Btrfs: fix invalid attempt to free reserved space on failure to cow range")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 640e7dd26f6132..84c8c51e20478c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1550,8 +1550,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 					     clear_bits,
 					     page_ops);
 		start += cur_alloc_size;
-		if (start >= end)
-			return ret;
 	}
 
 	/*
@@ -1560,9 +1558,11 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	 * space_info's bytes_may_use counter, reserved in
 	 * btrfs_check_data_free_space().
 	 */
-	extent_clear_unlock_delalloc(inode, start, end, locked_page,
-				     clear_bits | EXTENT_CLEAR_DATA_RESV,
-				     page_ops);
+	if (start < end) {
+		clear_bits |= EXTENT_CLEAR_DATA_RESV;
+		extent_clear_unlock_delalloc(inode, start, end, locked_page,
+					     clear_bits, page_ops);
+	}
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/9] btrfs: move the cow_fixup earlier in writepages handling
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-07-24 13:26 ` [PATCH 3/9] btrfs: fix an error handling corner case in cow_file_range Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-08-03  0:21   ` Boris Burkov
  2023-07-24 13:26 ` [PATCH 5/9] btrfs: fix handling of errors from __extent_writepage_io Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

btrfs has a special fixup for pages that are marked dirty without having
space reserved for them.  But the place where is is run means it can't
work for I/O that isn't kicked off inline from __extent_writepage, most
notable compressed I/O and I/O to zoned file systems.

Move the fixup earlier based on not findining any delalloc range in the
I/O tree to cover this case as well instead of relying on the fairly
obscure fallthrough behavior that calls __extent_write_page_io even when
no delalloc space was found.

Fixes: c8b978188c9a ("Btrfs: Add zlib compression support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1cc46bbbd888cd..cc258bddd88eab 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1153,6 +1153,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	u64 delalloc_start = page_start;
 	u64 delalloc_end = page_end;
 	u64 delalloc_to_write = 0;
+	bool found_delalloc = false;
 	int ret = 0;
 
 	while (delalloc_start < page_end) {
@@ -1169,6 +1170,22 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			return ret;
 
 		delalloc_start = delalloc_end + 1;
+		found_delalloc = true;
+	}
+
+	/*
+	 * If we did not find any delalloc range in the io_tree, this must be
+	 * the rare case of dirtying pages through get_user_pages without
+	 * calling into ->page_mkwrite.
+	 * While these are in the process of being fixed by switching to
+	 * pin_user_pages, some are still around and need to be worked around
+	 * by creating a delalloc reservation in a fixup worker, and waiting
+	 * us to be called again with that reservation.
+	 */
+	if (!found_delalloc && btrfs_writepage_cow_fixup(page)) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return 1;
 	}
 
 	/*
@@ -1274,14 +1291,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	int ret = 0;
 	int nr = 0;
 
-	ret = btrfs_writepage_cow_fixup(page);
-	if (ret) {
-		/* Fixup worker will requeue */
-		redirty_page_for_writepage(bio_ctrl->wbc, page);
-		unlock_page(page);
-		return 1;
-	}
-
 	bio_ctrl->end_io_func = end_bio_extent_writepage;
 	while (cur <= end) {
 		u32 len = end - cur + 1;
@@ -1421,9 +1430,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 		goto done;
 
 	ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
-	if (ret == 1)
-		return 0;
-
 	bio_ctrl->wbc->nr_to_write--;
 
 done:
@@ -2176,8 +2182,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 
 		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
 					    i_size, &nr);
-		if (ret == 1)
-			goto next_page;
 
 		/* Make sure the mapping tag for page dirty gets cleared. */
 		if (nr == 0) {
@@ -2193,7 +2197,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 		btrfs_page_unlock_writer(fs_info, page, cur, cur_len);
 		if (ret < 0)
 			found_error = true;
-next_page:
 		put_page(page);
 		cur = cur_end + 1;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/9] btrfs: fix handling of errors from __extent_writepage_io
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-07-24 13:26 ` [PATCH 4/9] btrfs: move the cow_fixup earlier in writepages handling Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-07-24 13:26 ` [PATCH 6/9] btrfs: stop submitting I/O after an error in extent_write_locked_range Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

When __extent_writepage_io fails, the callers complete the ordered range
for the entire page, including the range for which bios have been
submitted already, leading to a double completion for those ranges.

Fix this by pulling the error handling into __extent_writepage_io, and
only apply it to the ranges not submitted yet.

This also means that the remaining error handling in __extent_writepage
never needs to complete the ordered extent, as there won't be one before
writepage_delalloc is called, or when writepage_delalloc returns an
error.

Fixes: 61391d562229 ("Btrfs: fix hang on error (such as ENOSPC) when writing extent pages")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 78 +++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cc258bddd88eab..85d7a219040d07 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1278,12 +1278,11 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
  */
 static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				 struct page *page,
-				 struct btrfs_bio_ctrl *bio_ctrl,
-				 loff_t i_size,
-				 int *nr_ret)
+				 struct btrfs_bio_ctrl *bio_ctrl, loff_t i_size)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u64 cur = page_offset(page);
+	u64 start = page_offset(page);
+	u64 cur = start;
 	u64 end = cur + PAGE_SIZE - 1;
 	u64 extent_offset;
 	u64 block_start;
@@ -1325,7 +1324,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		em = btrfs_get_extent(inode, NULL, 0, cur, len);
 		if (IS_ERR(em)) {
 			ret = PTR_ERR_OR_ZERO(em);
-			goto out_error;
+			goto out;
 		}
 
 		extent_offset = cur - em->start;
@@ -1372,15 +1371,21 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	}
 
 	btrfs_page_assert_not_dirty(fs_info, page);
-	*nr_ret = nr;
-	return 0;
+	ret = 0;
+out:
+	/* Make sure the mapping tag for page dirty gets cleared. */
+	if (nr == 0) {
+		set_page_writeback(page);
+		end_page_writeback(page);
+	}
+	if (ret) {
+		u32 remaining = end + 1 - cur;
 
-out_error:
-	/*
-	 * If we finish without problem, we should not only clear page dirty,
-	 * but also empty subpage dirty bits
-	 */
-	*nr_ret = nr;
+		btrfs_mark_ordered_io_finished(inode, page, cur, remaining,
+					       false);
+		btrfs_page_clear_uptodate(fs_info, page, cur, remaining);
+		mapping_set_error(page->mapping, ret);
+	}
 	return ret;
 }
 
@@ -1399,7 +1404,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 	struct inode *inode = page->mapping->host;
 	const u64 page_start = page_offset(page);
 	int ret;
-	int nr = 0;
 	size_t pg_offset;
 	loff_t i_size = i_size_read(inode);
 	unsigned long end_index = i_size >> PAGE_SHIFT;
@@ -1421,30 +1425,27 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 
 	ret = set_page_extent_mapped(page);
 	if (ret < 0)
-		goto done;
+		goto error;
 
 	ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
 	if (ret == 1)
 		return 0;
 	if (ret)
-		goto done;
+		goto error;
 
-	ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
+	ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size);
+	/* __extent_writepage_io cleans up by itself on error. */
 	bio_ctrl->wbc->nr_to_write--;
-
-done:
-	if (nr == 0) {
-		/* make sure the mapping tag for page dirty gets cleared */
-		set_page_writeback(page);
-		end_page_writeback(page);
-	}
-	if (ret) {
-		btrfs_mark_ordered_io_finished(BTRFS_I(inode), page, page_start,
-					       PAGE_SIZE, !ret);
-		btrfs_page_clear_uptodate(btrfs_sb(inode->i_sb), page,
-					  page_start, PAGE_SIZE);
-		mapping_set_error(page->mapping, ret);
-	}
+	goto unlock_page;
+
+error:
+	/* Make sure the mapping tag for page dirty gets cleared. */
+	set_page_writeback(page);
+	end_page_writeback(page);
+	btrfs_page_clear_uptodate(btrfs_sb(inode->i_sb), page,
+				  page_start, PAGE_SIZE);
+	mapping_set_error(page->mapping, ret);
+unlock_page:
 	unlock_page(page);
 	ASSERT(ret <= 0);
 	return ret;
@@ -2171,7 +2172,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
 		u32 cur_len = cur_end + 1 - cur;
 		struct page *page;
-		int nr = 0;
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		ASSERT(PageLocked(page));
@@ -2181,19 +2181,7 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 		}
 
 		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
-					    i_size, &nr);
-
-		/* Make sure the mapping tag for page dirty gets cleared. */
-		if (nr == 0) {
-			set_page_writeback(page);
-			end_page_writeback(page);
-		}
-		if (ret) {
-			btrfs_mark_ordered_io_finished(BTRFS_I(inode), page,
-						       cur, cur_len, !ret);
-			btrfs_page_clear_uptodate(fs_info, page, cur, cur_len);
-			mapping_set_error(page->mapping, ret);
-		}
+					    i_size);
 		btrfs_page_unlock_writer(fs_info, page, cur, cur_len);
 		if (ret < 0)
 			found_error = true;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/9] btrfs: stop submitting I/O after an error in extent_write_locked_range
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-07-24 13:26 ` [PATCH 5/9] btrfs: fix handling of errors from __extent_writepage_io Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-08-03  0:42   ` Boris Burkov
  2023-07-24 13:26 ` [PATCH 7/9] btrfs: fix a race in clearing the writeback bit for sub-page I/O Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

As soon a __extent_writepage_io returns an error we know that the
extent_map is corrupted or we're out of memory, and there is no point
in continuing to submit I/O.  Follow the behavior in extent_write_cache
pages and stop submitting I/O, and instead just unlock all pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 85d7a219040d07..fada7a1931b130 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2151,7 +2151,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 			       u64 start, u64 end, struct writeback_control *wbc,
 			       bool pages_dirty)
 {
-	bool found_error = false;
 	int ret = 0;
 	struct address_space *mapping = inode->i_mapping;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2180,16 +2179,29 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 			clear_page_dirty_for_io(page);
 		}
 
-		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
-					    i_size);
+		/*
+		 * The entire page range that we were called on is locked and
+		 * covered by an ordered_extent.  Make sure we continue the
+		 * loop after an initial writeback error to unwind these as well
+		 * as clearing the dirty bit on the page by starting and ending
+		 * writeback.
+		 */
+		if (ret) {
+			btrfs_mark_ordered_io_finished(BTRFS_I(inode), page,
+						       cur, cur_len, false);
+			btrfs_page_clear_uptodate(fs_info, page, cur, cur_len);
+			set_page_writeback(page);
+			end_page_writeback(page);
+		} else {
+			ret = __extent_writepage_io(BTRFS_I(inode), page,
+						    &bio_ctrl, i_size);
+		}
 		btrfs_page_unlock_writer(fs_info, page, cur, cur_len);
-		if (ret < 0)
-			found_error = true;
 		put_page(page);
 		cur = cur_end + 1;
 	}
 
-	submit_write_bio(&bio_ctrl, found_error ? ret : 0);
+	submit_write_bio(&bio_ctrl, ret);
 }
 
 int extent_writepages(struct address_space *mapping,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/9] btrfs: fix a race in clearing the writeback bit for sub-page I/O
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-07-24 13:26 ` [PATCH 6/9] btrfs: stop submitting I/O after an error in extent_write_locked_range Christoph Hellwig
@ 2023-07-24 13:26 ` Christoph Hellwig
  2023-08-03  0:59   ` Boris Burkov
  2023-07-24 13:27 ` [PATCH 8/9] btrfs: remove the call to btrfs_mark_ordered_io_finished in btrfs_writepage_fixup_worker Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:26 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

For sub-page I/O, a fast I/O completion can clear the page writeback bit
in the I/O completion handler before subsequent bios were submitted.
Use a trick from iomap and defer submission of bios until we're reached
the end of the page to avoid this race.

Fixes: c5ef5c6c733a ("btrfs: make __extent_writepage_io() only submit dirty range for subpage")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 54 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fada7a1931b130..48a49c57daa6fd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -105,7 +105,8 @@ struct btrfs_bio_ctrl {
 	struct writeback_control *wbc;
 };
 
-static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
+static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl,
+			   struct bio_list *bio_list)
 {
 	struct btrfs_bio *bbio = bio_ctrl->bbio;
 
@@ -118,6 +119,8 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
 	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
 		btrfs_submit_compressed_read(bbio);
+	else if (bio_list)
+		bio_list_add(bio_list, &bbio->bio);
 	else
 		btrfs_submit_bio(bbio, 0);
 
@@ -141,7 +144,22 @@ static void submit_write_bio(struct btrfs_bio_ctrl *bio_ctrl, int ret)
 		/* The bio is owned by the end_io handler now */
 		bio_ctrl->bbio = NULL;
 	} else {
-		submit_one_bio(bio_ctrl);
+		submit_one_bio(bio_ctrl, NULL);
+	}
+}
+
+static void btrfs_submit_pending_bios(struct bio_list *bio_list, int ret)
+{
+	struct bio *bio;
+
+	if (ret) {
+		blk_status_t status = errno_to_blk_status(ret);
+
+		while ((bio = bio_list_pop(bio_list)))
+			btrfs_bio_end_io(btrfs_bio(bio), status);
+	} else {
+		while ((bio = bio_list_pop(bio_list)))
+			btrfs_submit_bio(btrfs_bio(bio), 0);
 	}
 }
 
@@ -791,7 +809,8 @@ static void alloc_new_bio(struct btrfs_inode *inode,
  */
 static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
 			       u64 disk_bytenr, struct page *page,
-			       size_t size, unsigned long pg_offset)
+			       size_t size, unsigned long pg_offset,
+			       struct bio_list *bio_list)
 {
 	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
 
@@ -800,7 +819,7 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
 
 	if (bio_ctrl->bbio &&
 	    !btrfs_bio_is_contig(bio_ctrl, page, disk_bytenr, pg_offset))
-		submit_one_bio(bio_ctrl);
+		submit_one_bio(bio_ctrl, bio_list);
 
 	do {
 		u32 len = size;
@@ -820,7 +839,7 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
 
 		if (bio_add_page(&bio_ctrl->bbio->bio, page, len, pg_offset) != len) {
 			/* bio full: move on to a new one */
-			submit_one_bio(bio_ctrl);
+			submit_one_bio(bio_ctrl, bio_list);
 			continue;
 		}
 
@@ -834,7 +853,7 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
 
 		/* Ordered extent boundary: move on to a new bio. */
 		if (bio_ctrl->len_to_oe_boundary == 0)
-			submit_one_bio(bio_ctrl);
+			submit_one_bio(bio_ctrl, bio_list);
 	} while (size);
 }
 
@@ -1082,14 +1101,14 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		}
 
 		if (bio_ctrl->compress_type != compress_type) {
-			submit_one_bio(bio_ctrl);
+			submit_one_bio(bio_ctrl, NULL);
 			bio_ctrl->compress_type = compress_type;
 		}
 
 		if (force_bio_submit)
-			submit_one_bio(bio_ctrl);
+			submit_one_bio(bio_ctrl, NULL);
 		submit_extent_page(bio_ctrl, disk_bytenr, page, iosize,
-				   pg_offset);
+				   pg_offset, NULL);
 		cur = cur + iosize;
 		pg_offset += iosize;
 	}
@@ -1113,7 +1132,7 @@ int btrfs_read_folio(struct file *file, struct folio *folio)
 	 * If btrfs_do_readpage() failed we will want to submit the assembled
 	 * bio to do the cleanup.
 	 */
-	submit_one_bio(&bio_ctrl);
+	submit_one_bio(&bio_ctrl, NULL);
 	return ret;
 }
 
@@ -1287,6 +1306,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	u64 extent_offset;
 	u64 block_start;
 	struct extent_map *em;
+	struct bio_list bio_list = BIO_EMPTY_LIST;
 	int ret = 0;
 	int nr = 0;
 
@@ -1365,7 +1385,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		btrfs_page_clear_dirty(fs_info, page, cur, iosize);
 
 		submit_extent_page(bio_ctrl, disk_bytenr, page, iosize,
-				   cur - page_offset(page));
+				   cur - page_offset(page), &bio_list);
 		cur += iosize;
 		nr++;
 	}
@@ -1378,6 +1398,16 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		set_page_writeback(page);
 		end_page_writeback(page);
 	}
+
+	/*
+	 * Submit all bios we queued up for this page.
+	 *
+	 * The bios are not submitted directly after building them as otherwise
+	 * a very fast I/O completion on an earlier bio could clear the page
+	 * writeback bit before the subsequent bios are even submitted.
+	 */
+	btrfs_submit_pending_bios(&bio_list, ret);
+
 	if (ret) {
 		u32 remaining = end + 1 - cur;
 
@@ -2243,7 +2273,7 @@ void extent_readahead(struct readahead_control *rac)
 
 	if (em_cached)
 		free_extent_map(em_cached);
-	submit_one_bio(&bio_ctrl);
+	submit_one_bio(&bio_ctrl, NULL);
 }
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 8/9] btrfs: remove the call to btrfs_mark_ordered_io_finished in btrfs_writepage_fixup_worker
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-07-24 13:26 ` [PATCH 7/9] btrfs: fix a race in clearing the writeback bit for sub-page I/O Christoph Hellwig
@ 2023-07-24 13:27 ` Christoph Hellwig
  2023-07-24 13:27 ` [PATCH 9/9] btrfs: lift the call to mapping_set_error out of cow_file_range Christoph Hellwig
  2023-07-27 17:06 ` small writeback fixes v2 David Sterba
  9 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:27 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

If btrfs_writepage_fixup_worker is called for a range that is
covered by and ordered_extent, but which doesn't have the PageOrdered
bit set, that means the ordered_extent is either for a pending direct
I/O, or for a buffered writeback in the final stages of I/O completion.

In either case it is now owned by the calling context, and should
never be completed by it.

Fixes: 87826df0ec36 ("btrfs: delalloc for page dirtied out-of-band in fixup worker")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 84c8c51e20478c..0d33bff5ca176f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2798,8 +2798,6 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		 * to reflect the errors and clean the page.
 		 */
 		mapping_set_error(page->mapping, ret);
-		btrfs_mark_ordered_io_finished(inode, page, page_start,
-					       PAGE_SIZE, !ret);
 		btrfs_page_clear_uptodate(fs_info, page, page_start, PAGE_SIZE);
 		clear_page_dirty_for_io(page);
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 9/9] btrfs: lift the call to mapping_set_error out of cow_file_range
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-07-24 13:27 ` [PATCH 8/9] btrfs: remove the call to btrfs_mark_ordered_io_finished in btrfs_writepage_fixup_worker Christoph Hellwig
@ 2023-07-24 13:27 ` Christoph Hellwig
  2023-07-27 17:06 ` small writeback fixes v2 David Sterba
  9 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:27 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Instead of calling mapping_set_error in cow_file_range for the
!locked_page case, make the submit_uncompressed_range call
mapping_set_error also for the !locked_page as that is the only caller
that might pass a NULL locked_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0d33bff5ca176f..b04eacc0fed44e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1077,6 +1077,7 @@ static void submit_uncompressed_range(struct btrfs_inode *inode,
 	wbc_detach_inode(&wbc);
 	if (ret < 0) {
 		btrfs_cleanup_ordered_extents(inode, locked_page, start, end - start + 1);
+		mapping_set_error(inode->vfs_inode.i_mapping, ret);
 		if (locked_page) {
 			const u64 page_start = page_offset(locked_page);
 
@@ -1088,7 +1089,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode,
 			btrfs_page_clear_uptodate(inode->root->fs_info,
 						  locked_page, page_start,
 						  PAGE_SIZE);
-			mapping_set_error(locked_page->mapping, ret);
 			unlock_page(locked_page);
 		}
 	}
@@ -1526,12 +1526,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	 * However, in case of @keep_locked, we still need to unlock the pages
 	 * (except @locked_page) to ensure all the pages are unlocked.
 	 */
-	if (keep_locked && orig_start < start) {
-		if (!locked_page)
-			mapping_set_error(inode->vfs_inode.i_mapping, ret);
+	if (keep_locked && orig_start < start)
 		extent_clear_unlock_delalloc(inode, orig_start, start - 1,
 					     locked_page, 0, page_ops);
-	}
 
 	/*
 	 * For the range (2). If we reserved an extent for our delalloc range
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-07-24 13:27 ` [PATCH 9/9] btrfs: lift the call to mapping_set_error out of cow_file_range Christoph Hellwig
@ 2023-07-27 17:06 ` David Sterba
  2023-08-01 15:29   ` Christoph Hellwig
  2023-08-09 14:13   ` Christoph Hellwig
  9 siblings, 2 replies; 24+ messages in thread
From: David Sterba @ 2023-07-27 17:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jul 24, 2023 at 06:26:52AM -0700, Christoph Hellwig wrote:
> Hi all,
> 
> this series has various fixes for bugs found in inspect or only triggered
> with upcoming changes that are a fallout from my work on bound lifetimes
> for the ordered extent and better confirming to expectations from the
> common writeback code.

I've so far merged patches 1-3, the rest will be in for-next as it's
quite risky and I'd appreciate more reviews.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-07-27 17:06 ` small writeback fixes v2 David Sterba
@ 2023-08-01 15:29   ` Christoph Hellwig
  2023-08-01 15:37     ` Josef Bacik
  2023-08-02 12:49     ` Josef Bacik
  2023-08-09 14:13   ` Christoph Hellwig
  1 sibling, 2 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-08-01 15:29 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	linux-btrfs

On Thu, Jul 27, 2023 at 07:06:22PM +0200, David Sterba wrote:
> On Mon, Jul 24, 2023 at 06:26:52AM -0700, Christoph Hellwig wrote:
> > Hi all,
> > 
> > this series has various fixes for bugs found in inspect or only triggered
> > with upcoming changes that are a fallout from my work on bound lifetimes
> > for the ordered extent and better confirming to expectations from the
> > common writeback code.
> 
> I've so far merged patches 1-3, the rest will be in for-next as it's
> quite risky and I'd appreciate more reviews.

So who would be a suitable reviewer for this code in addition to Josef?

Any volunteers?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-08-01 15:29   ` Christoph Hellwig
@ 2023-08-01 15:37     ` Josef Bacik
  2023-08-02 12:49     ` Josef Bacik
  1 sibling, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2023-08-01 15:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Chris Mason, David Sterba, linux-btrfs, boris

On Tue, Aug 01, 2023 at 05:29:11PM +0200, Christoph Hellwig wrote:
> On Thu, Jul 27, 2023 at 07:06:22PM +0200, David Sterba wrote:
> > On Mon, Jul 24, 2023 at 06:26:52AM -0700, Christoph Hellwig wrote:
> > > Hi all,
> > > 
> > > this series has various fixes for bugs found in inspect or only triggered
> > > with upcoming changes that are a fallout from my work on bound lifetimes
> > > for the ordered extent and better confirming to expectations from the
> > > common writeback code.
> > 
> > I've so far merged patches 1-3, the rest will be in for-next as it's
> > quite risky and I'd appreciate more reviews.
> 
> So who would be a suitable reviewer for this code in addition to Josef?
> 
> Any volunteers?

Boris volunteers.  Also I'll queue this up in the CI testing thing, I had done
it with your first submission but hit all the other bugs I've been fixing.  It
didn't hit anything we didn't hit without your patches so I assume they'll do
fine, but we'll find out in a couple of hours.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-08-01 15:29   ` Christoph Hellwig
  2023-08-01 15:37     ` Josef Bacik
@ 2023-08-02 12:49     ` Josef Bacik
  2023-08-02 15:16       ` Christoph Hellwig
  1 sibling, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2023-08-02 12:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Sterba, Chris Mason, David Sterba, linux-btrfs

On Tue, Aug 01, 2023 at 05:29:11PM +0200, Christoph Hellwig wrote:
> On Thu, Jul 27, 2023 at 07:06:22PM +0200, David Sterba wrote:
> > On Mon, Jul 24, 2023 at 06:26:52AM -0700, Christoph Hellwig wrote:
> > > Hi all,
> > > 
> > > this series has various fixes for bugs found in inspect or only triggered
> > > with upcoming changes that are a fallout from my work on bound lifetimes
> > > for the ordered extent and better confirming to expectations from the
> > > common writeback code.
> > 
> > I've so far merged patches 1-3, the rest will be in for-next as it's
> > quite risky and I'd appreciate more reviews.
> 
> So who would be a suitable reviewer for this code in addition to Josef?
> 
> Any volunteers?

I ran this through the CI and I got a deadlock on generic/476 with

[btrfs_block_group_tree]
TEST_DIR=/mnt/test
TEST_DEV=/dev/vdb
SCRATCH_DEV_POOL="/dev/vdc /dev/vdd /dev/vde /dev/vdg /dev/vdh /dev/vdi
/dev/vdj"
SCRATCH_MNT=/mnt/scratch
LOGWRITES_DEV=/dev/vdk
MKFS_OPTIONS="-K -O block-group-tree"
RECREATE_TEST_DEV=true

I got a panic with btrfs/190 with the following config

[btrfs_holes_spacecache]
TEST_DIR=/mnt/test
TEST_DEV=/dev/vdb
SCRATCH_DEV_POOL="/dev/vdc /dev/vdd /dev/vde /dev/vdg /dev/vdh /dev/vdi /dev/vdj"
SCRATCH_MNT=/mnt/scratch
LOGWRITES_DEV=/dev/vdk
MKFS_OPTIONS="-K -O ^no-holes,^free-space-tree"
RECREATE_TEST_DEV=true

[ 3461.147888] assertion failed: block_group->io_ctl.inode == NULL, in
fs/btrfs/block-group.c:4256
[ 3461.148437] ------------[ cut here ]------------
[ 3461.148632] kernel BUG at fs/btrfs/block-group.c:4256!
[ 3461.148857] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 3461.149073] CPU: 1 PID: 887651 Comm: umount Not tainted 6.5.0-rc3+ #1
[ 3461.149344] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
[ 3461.149772] RIP: 0010:btrfs_put_block_group_cache+0x136/0x140
[ 3461.150015] Code: 02 00 00 00 e8 8b 37 18 00 eb 88 b9 a0 10 00 00 48 c7 c2 7a
d9 aa 8f 48 c7 c6 98 af b5 8f 48 c7 c7 20 e1 b4 8f e8 3a c0 a7 ff <0f> 0b 0f 1f
84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[ 3461.150776] RSP: 0018:ffffaac483f6fcc0 EFLAGS: 00010246
[ 3461.150997] RAX: 0000000000000053 RBX: ffff99b414118000 RCX: 0000000000000000
[ 3461.151287] RDX: 0000000000000000 RSI: ffff99b47bd21840 RDI: ffff99b47bd21840
[ 3461.151769] RBP: ffff99b462afdc60 R08: 0000000000000000 R09: ffffaac483f6fb60
[ 3461.152082] R10: 0000000000000003 R11: ffffffff905553d0 R12: ffff99b414118010
[ 3461.152384] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 3461.152682] FS:  00007f43ff0c2800(0000) GS:ffff99b47bd00000(0000)
knlGS:0000000000000000
[ 3461.153007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3461.153244] CR2: 000055a37df5c360 CR3: 000000017896a000 CR4: 0000000000350ee0
[ 3461.153549] Call Trace:
[ 3461.153657]  <TASK>
[ 3461.153752]  ? die+0x36/0x90
[ 3461.153885]  ? do_trap+0xda/0x100
[ 3461.154027]  ? btrfs_put_block_group_cache+0x136/0x140
[ 3461.154241]  ? btrfs_put_block_group_cache+0x136/0x140
[ 3461.154462]  ? do_error_trap+0x81/0x110
[ 3461.154631]  ? btrfs_put_block_group_cache+0x136/0x140
[ 3461.154952]  ? exc_invalid_op+0x50/0x70
[ 3461.155116]  ? btrfs_put_block_group_cache+0x136/0x140
[ 3461.155348]  ? asm_exc_invalid_op+0x1a/0x20
[ 3461.155533]  ? btrfs_put_block_group_cache+0x136/0x140
[ 3461.155748]  close_ctree+0x1da/0x600
[ 3461.155901]  ? fsnotify_sb_delete+0x16c/0x210
[ 3461.156088]  ? evict_inodes+0x169/0x1d0
[ 3461.156251]  generic_shutdown_super+0x80/0x150
[ 3461.156451]  kill_anon_super+0x18/0x30
[ 3461.156610]  btrfs_kill_super+0x16/0x20
[ 3461.156774]  deactivate_locked_super+0x33/0xa0
[ 3461.156963]  cleanup_mnt+0xba/0x150
[ 3461.157123]  task_work_run+0x5d/0xa0
[ 3461.157284]  exit_to_user_mode_prepare+0x23f/0x250
[ 3461.157505]  syscall_exit_to_user_mode+0x1a/0x50
[ 3461.157702]  do_syscall_64+0x6c/0x90
[ 3461.157856]  ? lockdep_hardirqs_on_prepare+0xe0/0x190
[ 3461.158101]  ? do_syscall_64+0x6c/0x90
[ 3461.158259]  ? do_user_addr_fault+0x293/0x840
[ 3461.158461]  ? trace_hardirqs_off+0x46/0xa0
[ 3461.158636]  ? exc_page_fault+0xf5/0x200
[ 3461.158800]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

I also got an EBUSY trying to umount $SCRATCH_MNT with generic/475 with

[btrfs_subpage_compress]
TEST_DIR=/mnt/test
TEST_DEV=/dev/vdb
SCRATCH_DEV_POOL="/dev/vdc /dev/vdd /dev/vde /dev/vdg /dev/vdh /dev/vdi
/dev/vdj"
SCRATCH_MNT=/mnt/scratch
LOGWRITES_DEV=/dev/vdk
MKFS_OPTIONS="-K -n 4k -s 4k"
MOUNT_OPTIONS="-o compress"
RECREATE_TEST_DEV=true

on an ARM machine with 64kib pagesize.  Though I'm pretty sure you're not to
blame for that last failure.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-08-02 12:49     ` Josef Bacik
@ 2023-08-02 15:16       ` Christoph Hellwig
  2023-08-02 15:35         ` Josef Bacik
  2023-08-02 17:55         ` Josef Bacik
  0 siblings, 2 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-08-02 15:16 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christoph Hellwig, David Sterba, Chris Mason, David Sterba,
	linux-btrfs

On Wed, Aug 02, 2023 at 08:49:56AM -0400, Josef Bacik wrote:
> I ran this through the CI

Thanks a lot!

> [ 3461.147888] assertion failed: block_group->io_ctl.inode == NULL, in
> fs/btrfs/block-group.c:4256

Hmm, this looks so unrelated that it leaves me puzzled.  How confident
are you that this is a new issue based on the overall test setup?

> I also got an EBUSY trying to umount $SCRATCH_MNT with generic/475 with

> on an ARM machine with 64kib pagesize.  Though I'm pretty sure you're not to
> blame for that last failure.  Thanks,

Yes, I've seen EBUSY in 475 quite regulary even without the changes,
I think I also mentioned it in reply to the other 475-related discussion
we had.  I tried to debug it for a while but didn't manage to get far.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-08-02 15:16       ` Christoph Hellwig
@ 2023-08-02 15:35         ` Josef Bacik
  2023-08-03 17:11           ` Boris Burkov
  2023-08-02 17:55         ` Josef Bacik
  1 sibling, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2023-08-02 15:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Sterba, Chris Mason, David Sterba, linux-btrfs

On Wed, Aug 02, 2023 at 05:16:43PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 08:49:56AM -0400, Josef Bacik wrote:
> > I ran this through the CI
> 
> Thanks a lot!
> 
> > [ 3461.147888] assertion failed: block_group->io_ctl.inode == NULL, in
> > fs/btrfs/block-group.c:4256
> 
> Hmm, this looks so unrelated that it leaves me puzzled.  How confident
> are you that this is a new issue based on the overall test setup?
> 

This is the first I've seen this, so it could be new to for-next, your stuff, or
just simply haven't hit it in the ~20ish runs I've done with this new setup.
I'm going to go back and test these other ones with just for-next, but I wanted
to get the results to you in case they rang a bell or you wanted to debug
locally.

> > I also got an EBUSY trying to umount $SCRATCH_MNT with generic/475 with
> 
> > on an ARM machine with 64kib pagesize.  Though I'm pretty sure you're not to
> > blame for that last failure.  Thanks,
> 
> Yes, I've seen EBUSY in 475 quite regulary even without the changes,
> I think I also mentioned it in reply to the other 475-related discussion
> we had.  I tried to debug it for a while but didn't manage to get far.
> 

Yeah it definitely reproduces on for-next, I'm debugging this right now so don't
worry about this thing.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-08-02 15:16       ` Christoph Hellwig
  2023-08-02 15:35         ` Josef Bacik
@ 2023-08-02 17:55         ` Josef Bacik
  1 sibling, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2023-08-02 17:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Sterba, Chris Mason, David Sterba, linux-btrfs

On Wed, Aug 02, 2023 at 05:16:43PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 08:49:56AM -0400, Josef Bacik wrote:
> > I ran this through the CI
> 
> Thanks a lot!
> 
> > [ 3461.147888] assertion failed: block_group->io_ctl.inode == NULL, in
> > fs/btrfs/block-group.c:4256
> 
> Hmm, this looks so unrelated that it leaves me puzzled.  How confident
> are you that this is a new issue based on the overall test setup?
> 
> > I also got an EBUSY trying to umount $SCRATCH_MNT with generic/475 with
> 
> > on an ARM machine with 64kib pagesize.  Though I'm pretty sure you're not to
> > blame for that last failure.  Thanks,
> 
> Yes, I've seen EBUSY in 475 quite regulary even without the changes,
> I think I also mentioned it in reply to the other 475-related discussion
> we had.  I tried to debug it for a while but didn't manage to get far.
> 

[  594.911885] ------------[ cut here ]------------
[  594.911892] WARNING: CPU: 1 PID: 1833 at fs/namespace.c:179
mnt_add_count+0xa4/0xb8
[  594.911897] Modules linked in: rfkill vfat fat virtio_net net_failover
virtio_balloon failover loop zram virtiofs fuse virtio_console virtio_blk
qemu_fw_cfg
[  594.911908] CPU: 1 PID: 1833 Comm: kworker/1:7 Tainted: G        W
6.5.0-rc3+ #31
[  594.911910] Hardware name: QEMU KVM Virtual Machine, BIOS
edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
[  594.911911] Workqueue: events delayed_fput
[  594.911916] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  594.911918] pc : mnt_add_count+0xa4/0xb8
[  594.911920] lr : mnt_add_count+0x98/0xb8
[  594.911922] sp : ffff800093ea3c60
[  594.911923] x29: ffff800093ea3c60 x28: 0000000000000000 x27: 0000000000000000
[  594.911926] x26: ffff6b947f39d505 x25: ffffbdbf2bd9b1c8 x24: ffff6b9459997d40
[  594.911929] x23: ffff6b9456365728 x22: cd9fbdbf29b30b60 x21: ffff800093ea3ce8
[  594.911931] x20: ffff6b94596af5c0 x19: 00000000ffffffff x18: 0000000000003f75
[  594.911934] x17: 0000000000000000 x16: 0000000000000000 x15: ffff6b94497dc000
[  594.911936] x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000000
[  594.911939] x11: ffffbdbf32e69204 x10: ffffbdbf2b6dae50 x9 : ffffbdbf2f49f000
[  594.911942] x8 : ffff800093ea3c08 x7 : 0000000000000000 x6 : 00000000000fffff
[  594.911944] x5 : 00000000002b60c6 x4 : ffff6b94405716c0 x3 : ffff6b94405716c0
[  594.911947] x2 : ffffadd553c90000 x1 : ffff6b9441625900 x0 : 0000000060001020
[  594.911949] Call trace:
[  594.911950]  mnt_add_count+0xa4/0xb8
[  594.911952]  mntput_no_expire+0x88/0x4e0
[  594.911955]  mntput+0x28/0x48
[  594.911957]  __fput+0x134/0x2a0
[  594.911960]  delayed_fput+0x4c/0x68
[  594.911964]  process_one_work+0x290/0x5e0
[  594.911967]  worker_thread+0x7c/0x428
[  594.911970]  kthread+0x124/0x130
[  594.911972]  ret_from_fork+0x10/0x20
[  594.911974] irq event stamp: 27982
[  594.911975] hardirqs last  enabled at (27981): [<ffffbdbf2982162c>]
__call_rcu_common.constprop.0+0x16c/0x680
[  594.911980] hardirqs last disabled at (27982): [<ffffbdbf2a7642e4>]
el1_dbg+0x24/0x90
[  594.911984] softirqs last  enabled at (24348): [<ffffbdbf2a4d8650>]
neigh_managed_work+0xd8/0xf8
[  594.911989] softirqs last disabled at (24344): [<ffffbdbf2a4d85a8>]
neigh_managed_work+0x30/0xf8
[  594.911993] ---[ end trace 0000000000000000 ]---

So it's the delayed fput thing that we already know about.  I talked to Jens and
he's on his way to a fix but isn't there yet.

I'll drop a patch in our fstests to just umount in a loop, this test uncovers
lots of things so I don't want to exclude it from the CI runs.  I'll go look at
the other failures now.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/9] btrfs: don't stop integrity writeback too early
  2023-07-24 13:26 ` [PATCH 1/9] btrfs: don't stop integrity writeback too early Christoph Hellwig
@ 2023-08-02 22:49   ` Boris Burkov
  0 siblings, 0 replies; 24+ messages in thread
From: Boris Burkov @ 2023-08-02 22:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jul 24, 2023 at 06:26:53AM -0700, Christoph Hellwig wrote:
> extent_write_cache_pages stops writing pages as soon as nr_to_write hits
> zero.  That is the right thing for opportunistic writeback, but incorrect
> for data integrity writeback, which needs to ensure that no dirty pages
> are left in the range.  Thus only stop the writeback for WB_SYNC_NONE
> if nr_to_write hits 0.
> 
> This is a port of write_cache_pages changes in commit 05fe478dd04e
> ("mm: write_cache_pages integrity fix").

This makes sense to me. What is the reason the same reasoning doesn't
apply to btree_write_cache_pages? Does the issue only happen in practice
with fsync that no one is doing on the btree inode? It feels, in theory,
we could do a writepages with SYNC_ALL and hit the same issue with pages
going dirty and stealing the nr_writes.

> 
> Note that I've only trigger the problem with other changes to the btrfs
> writeback code, but this condition seems worthwhile fixing anyway.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent_io.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c0440a0988c9a8..231e620e6c497d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2098,7 +2098,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  			 * We have to make sure to honor the new nr_to_write
>  			 * at any time
>  			 */
> -			nr_to_write_done = wbc->nr_to_write <= 0;
> +			nr_to_write_done = wbc->sync_mode == WB_SYNC_NONE &&
> +						wbc->nr_to_write <= 0;
>  		}
>  		folio_batch_release(&fbatch);
>  		cond_resched();
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] btrfs: move the cow_fixup earlier in writepages handling
  2023-07-24 13:26 ` [PATCH 4/9] btrfs: move the cow_fixup earlier in writepages handling Christoph Hellwig
@ 2023-08-03  0:21   ` Boris Burkov
  0 siblings, 0 replies; 24+ messages in thread
From: Boris Burkov @ 2023-08-03  0:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jul 24, 2023 at 06:26:56AM -0700, Christoph Hellwig wrote:
> btrfs has a special fixup for pages that are marked dirty without having
> space reserved for them.  But the place where is is run means it can't
> work for I/O that isn't kicked off inline from __extent_writepage, most
> notable compressed I/O and I/O to zoned file systems.
> 
> Move the fixup earlier based on not findining any delalloc range in the
> I/O tree to cover this case as well instead of relying on the fairly
> obscure fallthrough behavior that calls __extent_write_page_io even when
> no delalloc space was found.

This almost makes sense to me, but not quite. As far as I can tell, the
zoned and compressed cases you are describing are the cases in
btrfs_run_delalloc_range which end up calling extent_write_locked_range.
And indeed, if that happens, it appears we return 1, don't call
__extent_writepage, and don't do the redirty check. However, if that
happens, then your new code won't run either, because it will set
found_delalloc after btrfs_run_delalloc_range returns 1 (>= 0).

Therefore, it must be the case that your new check uses the assumption
that in any case where the fixup would trip, the find_delalloc must have
failed as well. Let's assume that's true, because we always set the
delalloc bit for any page we properly dirty. Even then, this feels like
it strictly reduces the cases we do the fixup.

To me it seems like best case this is a no-op change, worst case it
reduces the cases where catch wrong dirty pages.

Put another way, I don't see a codepath which hits this logic, but
doesn't hit the old logic.

Do you have a reproducer for what this is fixing?

Thanks,
Boris

> 
> Fixes: c8b978188c9a ("Btrfs: Add zlib compression support")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent_io.c | 31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1cc46bbbd888cd..cc258bddd88eab 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1153,6 +1153,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>  	u64 delalloc_start = page_start;
>  	u64 delalloc_end = page_end;
>  	u64 delalloc_to_write = 0;
> +	bool found_delalloc = false;
>  	int ret = 0;
>  
>  	while (delalloc_start < page_end) {
> @@ -1169,6 +1170,22 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>  			return ret;
>  
>  		delalloc_start = delalloc_end + 1;
> +		found_delalloc = true;
> +	}
> +
> +	/*
> +	 * If we did not find any delalloc range in the io_tree, this must be
> +	 * the rare case of dirtying pages through get_user_pages without
> +	 * calling into ->page_mkwrite.
> +	 * While these are in the process of being fixed by switching to
> +	 * pin_user_pages, some are still around and need to be worked around
> +	 * by creating a delalloc reservation in a fixup worker, and waiting
> +	 * us to be called again with that reservation.
> +	 */
> +	if (!found_delalloc && btrfs_writepage_cow_fixup(page)) {
> +		redirty_page_for_writepage(wbc, page);
> +		unlock_page(page);
> +		return 1;
>  	}
>  
>  	/*
> @@ -1274,14 +1291,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  	int ret = 0;
>  	int nr = 0;
>  
> -	ret = btrfs_writepage_cow_fixup(page);
> -	if (ret) {
> -		/* Fixup worker will requeue */
> -		redirty_page_for_writepage(bio_ctrl->wbc, page);
> -		unlock_page(page);
> -		return 1;
> -	}
> -
>  	bio_ctrl->end_io_func = end_bio_extent_writepage;
>  	while (cur <= end) {
>  		u32 len = end - cur + 1;
> @@ -1421,9 +1430,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
>  		goto done;
>  
>  	ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
> -	if (ret == 1)
> -		return 0;
> -
>  	bio_ctrl->wbc->nr_to_write--;
>  
>  done:
> @@ -2176,8 +2182,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
>  
>  		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
>  					    i_size, &nr);
> -		if (ret == 1)
> -			goto next_page;
>  
>  		/* Make sure the mapping tag for page dirty gets cleared. */
>  		if (nr == 0) {
> @@ -2193,7 +2197,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
>  		btrfs_page_unlock_writer(fs_info, page, cur, cur_len);
>  		if (ret < 0)
>  			found_error = true;
> -next_page:
>  		put_page(page);
>  		cur = cur_end + 1;
>  	}
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/9] btrfs: stop submitting I/O after an error in extent_write_locked_range
  2023-07-24 13:26 ` [PATCH 6/9] btrfs: stop submitting I/O after an error in extent_write_locked_range Christoph Hellwig
@ 2023-08-03  0:42   ` Boris Burkov
  0 siblings, 0 replies; 24+ messages in thread
From: Boris Burkov @ 2023-08-03  0:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jul 24, 2023 at 06:26:58AM -0700, Christoph Hellwig wrote:
> As soon a __extent_writepage_io returns an error we know that the
> extent_map is corrupted or we're out of memory, and there is no point
> in continuing to submit I/O.  Follow the behavior in extent_write_cache
> pages and stop submitting I/O, and instead just unlock all pages.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/9] btrfs: fix a race in clearing the writeback bit for sub-page I/O
  2023-07-24 13:26 ` [PATCH 7/9] btrfs: fix a race in clearing the writeback bit for sub-page I/O Christoph Hellwig
@ 2023-08-03  0:59   ` Boris Burkov
  0 siblings, 0 replies; 24+ messages in thread
From: Boris Burkov @ 2023-08-03  0:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jul 24, 2023 at 06:26:59AM -0700, Christoph Hellwig wrote:
> For sub-page I/O, a fast I/O completion can clear the page writeback bit
> in the I/O completion handler before subsequent bios were submitted.
> Use a trick from iomap and defer submission of bios until we're reached
> the end of the page to avoid this race.

This LGTM in that I don't see a bug.

However, I'm confused why it's exactly necessary: doesn't the existing
logic already only allocate one bio and calls bio_add_page on it in a
loop. And bio_add_page handles the subpage blocksize case.

As far as I can tell, it tries to do it sequentially so it should be
contiguous. Are you worried about non-contiguous writes within one page
resulting in separate bios? In that case, why would a completion clear
the writeback bit on the page if the whole page isn't written back? I
guess that could be tricky to do and this is the correct solution?

Thanks,
Boris

> 
> Fixes: c5ef5c6c733a ("btrfs: make __extent_writepage_io() only submit dirty range for subpage")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent_io.c | 54 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index fada7a1931b130..48a49c57daa6fd 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -105,7 +105,8 @@ struct btrfs_bio_ctrl {
>  	struct writeback_control *wbc;
>  };
>  
> -static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
> +static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl,
> +			   struct bio_list *bio_list)
>  {
>  	struct btrfs_bio *bbio = bio_ctrl->bbio;
>  
> @@ -118,6 +119,8 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
>  	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
>  	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
>  		btrfs_submit_compressed_read(bbio);
> +	else if (bio_list)
> +		bio_list_add(bio_list, &bbio->bio);
>  	else
>  		btrfs_submit_bio(bbio, 0);
>  
> @@ -141,7 +144,22 @@ static void submit_write_bio(struct btrfs_bio_ctrl *bio_ctrl, int ret)
>  		/* The bio is owned by the end_io handler now */
>  		bio_ctrl->bbio = NULL;
>  	} else {
> -		submit_one_bio(bio_ctrl);
> +		submit_one_bio(bio_ctrl, NULL);
> +	}
> +}
> +
> +static void btrfs_submit_pending_bios(struct bio_list *bio_list, int ret)
> +{
> +	struct bio *bio;
> +
> +	if (ret) {
> +		blk_status_t status = errno_to_blk_status(ret);
> +
> +		while ((bio = bio_list_pop(bio_list)))
> +			btrfs_bio_end_io(btrfs_bio(bio), status);
> +	} else {
> +		while ((bio = bio_list_pop(bio_list)))
> +			btrfs_submit_bio(btrfs_bio(bio), 0);
>  	}
>  }
>  
> @@ -791,7 +809,8 @@ static void alloc_new_bio(struct btrfs_inode *inode,
>   */
>  static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
>  			       u64 disk_bytenr, struct page *page,
> -			       size_t size, unsigned long pg_offset)
> +			       size_t size, unsigned long pg_offset,
> +			       struct bio_list *bio_list)
>  {
>  	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
>  
> @@ -800,7 +819,7 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
>  
>  	if (bio_ctrl->bbio &&
>  	    !btrfs_bio_is_contig(bio_ctrl, page, disk_bytenr, pg_offset))
> -		submit_one_bio(bio_ctrl);
> +		submit_one_bio(bio_ctrl, bio_list);
>  
>  	do {
>  		u32 len = size;
> @@ -820,7 +839,7 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
>  
>  		if (bio_add_page(&bio_ctrl->bbio->bio, page, len, pg_offset) != len) {
>  			/* bio full: move on to a new one */
> -			submit_one_bio(bio_ctrl);
> +			submit_one_bio(bio_ctrl, bio_list);
>  			continue;
>  		}
>  
> @@ -834,7 +853,7 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
>  
>  		/* Ordered extent boundary: move on to a new bio. */
>  		if (bio_ctrl->len_to_oe_boundary == 0)
> -			submit_one_bio(bio_ctrl);
> +			submit_one_bio(bio_ctrl, bio_list);
>  	} while (size);
>  }
>  
> @@ -1082,14 +1101,14 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>  		}
>  
>  		if (bio_ctrl->compress_type != compress_type) {
> -			submit_one_bio(bio_ctrl);
> +			submit_one_bio(bio_ctrl, NULL);
>  			bio_ctrl->compress_type = compress_type;
>  		}
>  
>  		if (force_bio_submit)
> -			submit_one_bio(bio_ctrl);
> +			submit_one_bio(bio_ctrl, NULL);
>  		submit_extent_page(bio_ctrl, disk_bytenr, page, iosize,
> -				   pg_offset);
> +				   pg_offset, NULL);
>  		cur = cur + iosize;
>  		pg_offset += iosize;
>  	}
> @@ -1113,7 +1132,7 @@ int btrfs_read_folio(struct file *file, struct folio *folio)
>  	 * If btrfs_do_readpage() failed we will want to submit the assembled
>  	 * bio to do the cleanup.
>  	 */
> -	submit_one_bio(&bio_ctrl);
> +	submit_one_bio(&bio_ctrl, NULL);
>  	return ret;
>  }
>  
> @@ -1287,6 +1306,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  	u64 extent_offset;
>  	u64 block_start;
>  	struct extent_map *em;
> +	struct bio_list bio_list = BIO_EMPTY_LIST;
>  	int ret = 0;
>  	int nr = 0;
>  
> @@ -1365,7 +1385,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  		btrfs_page_clear_dirty(fs_info, page, cur, iosize);
>  
>  		submit_extent_page(bio_ctrl, disk_bytenr, page, iosize,
> -				   cur - page_offset(page));
> +				   cur - page_offset(page), &bio_list);
>  		cur += iosize;
>  		nr++;
>  	}
> @@ -1378,6 +1398,16 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  		set_page_writeback(page);
>  		end_page_writeback(page);
>  	}
> +
> +	/*
> +	 * Submit all bios we queued up for this page.
> +	 *
> +	 * The bios are not submitted directly after building them as otherwise
> +	 * a very fast I/O completion on an earlier bio could clear the page
> +	 * writeback bit before the subsequent bios are even submitted.
> +	 */
> +	btrfs_submit_pending_bios(&bio_list, ret);
> +
>  	if (ret) {
>  		u32 remaining = end + 1 - cur;
>  
> @@ -2243,7 +2273,7 @@ void extent_readahead(struct readahead_control *rac)
>  
>  	if (em_cached)
>  		free_extent_map(em_cached);
> -	submit_one_bio(&bio_ctrl);
> +	submit_one_bio(&bio_ctrl, NULL);
>  }
>  
>  /*
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-08-02 15:35         ` Josef Bacik
@ 2023-08-03 17:11           ` Boris Burkov
  0 siblings, 0 replies; 24+ messages in thread
From: Boris Burkov @ 2023-08-03 17:11 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christoph Hellwig, David Sterba, Chris Mason, David Sterba,
	linux-btrfs

On Wed, Aug 02, 2023 at 11:35:27AM -0400, Josef Bacik wrote:
> On Wed, Aug 02, 2023 at 05:16:43PM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 02, 2023 at 08:49:56AM -0400, Josef Bacik wrote:
> > > I ran this through the CI
> > 
> > Thanks a lot!
> > 
> > > [ 3461.147888] assertion failed: block_group->io_ctl.inode == NULL, in
> > > fs/btrfs/block-group.c:4256
> > 
> > Hmm, this looks so unrelated that it leaves me puzzled.  How confident
> > are you that this is a new issue based on the overall test setup?
> > 
> 
> This is the first I've seen this, so it could be new to for-next, your stuff, or
> just simply haven't hit it in the ~20ish runs I've done with this new setup.
> I'm going to go back and test these other ones with just for-next, but I wanted
> to get the results to you in case they rang a bell or you wanted to debug
> locally.

FWIW, I did not hit the crash in 1000 runs with that same config on the
btrfs_writeback_fixes branch.

> 
> > > I also got an EBUSY trying to umount $SCRATCH_MNT with generic/475 with
> > 
> > > on an ARM machine with 64kib pagesize.  Though I'm pretty sure you're not to
> > > blame for that last failure.  Thanks,
> > 
> > Yes, I've seen EBUSY in 475 quite regulary even without the changes,
> > I think I also mentioned it in reply to the other 475-related discussion
> > we had.  I tried to debug it for a while but didn't manage to get far.
> > 
> 
> Yeah it definitely reproduces on for-next, I'm debugging this right now so don't
> worry about this thing.  Thanks,
> 
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: small writeback fixes v2
  2023-07-27 17:06 ` small writeback fixes v2 David Sterba
  2023-08-01 15:29   ` Christoph Hellwig
@ 2023-08-09 14:13   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-08-09 14:13 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	linux-btrfs

On Thu, Jul 27, 2023 at 07:06:22PM +0200, David Sterba wrote:
> On Mon, Jul 24, 2023 at 06:26:52AM -0700, Christoph Hellwig wrote:
> > Hi all,
> > 
> > this series has various fixes for bugs found in inspect or only triggered
> > with upcoming changes that are a fallout from my work on bound lifetimes
> > for the ordered extent and better confirming to expectations from the
> > common writeback code.
> 
> I've so far merged patches 1-3, the rest will be in for-next as it's
> quite risky and I'd appreciate more reviews.

Please drop it from for-next for now.  There's plenty of outstanding
issues, and with this week being last minute business travel followed
by two weeks of vacation I'm not going to be anywhere near my subpage
test rig even if I had time to look into all the issues.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-08-09 14:13 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-24 13:26 small writeback fixes v2 Christoph Hellwig
2023-07-24 13:26 ` [PATCH 1/9] btrfs: don't stop integrity writeback too early Christoph Hellwig
2023-08-02 22:49   ` Boris Burkov
2023-07-24 13:26 ` [PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache_pages Christoph Hellwig
2023-07-24 13:26 ` [PATCH 3/9] btrfs: fix an error handling corner case in cow_file_range Christoph Hellwig
2023-07-24 13:26 ` [PATCH 4/9] btrfs: move the cow_fixup earlier in writepages handling Christoph Hellwig
2023-08-03  0:21   ` Boris Burkov
2023-07-24 13:26 ` [PATCH 5/9] btrfs: fix handling of errors from __extent_writepage_io Christoph Hellwig
2023-07-24 13:26 ` [PATCH 6/9] btrfs: stop submitting I/O after an error in extent_write_locked_range Christoph Hellwig
2023-08-03  0:42   ` Boris Burkov
2023-07-24 13:26 ` [PATCH 7/9] btrfs: fix a race in clearing the writeback bit for sub-page I/O Christoph Hellwig
2023-08-03  0:59   ` Boris Burkov
2023-07-24 13:27 ` [PATCH 8/9] btrfs: remove the call to btrfs_mark_ordered_io_finished in btrfs_writepage_fixup_worker Christoph Hellwig
2023-07-24 13:27 ` [PATCH 9/9] btrfs: lift the call to mapping_set_error out of cow_file_range Christoph Hellwig
2023-07-27 17:06 ` small writeback fixes v2 David Sterba
2023-08-01 15:29   ` Christoph Hellwig
2023-08-01 15:37     ` Josef Bacik
2023-08-02 12:49     ` Josef Bacik
2023-08-02 15:16       ` Christoph Hellwig
2023-08-02 15:35         ` Josef Bacik
2023-08-03 17:11           ` Boris Burkov
2023-08-02 17:55         ` Josef Bacik
2023-08-09 14:13   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2023-07-13 13:04 small writeback fixes Christoph Hellwig
2023-07-13 13:04 ` [PATCH 2/9] btrfs: don't wait for writeback on clean pages in extent_write_cache_pages Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).