[PATCH 0/2] Metadata IO error fixes

public inbox for linux-btrfs@vger.kernel.org

* [PATCH 0/2] Metadata IO error fixes
@ 2021-11-24 17:37 Josef Bacik
  2021-11-24 17:37 ` [PATCH 1/2] btrfs: clear extent buffer uptodate when we fail to write it Josef Bacik
  2021-11-24 17:37 ` [PATCH 2/2] btrfs: check the root node for uptodate before returning it Josef Bacik
  0 siblings, 2 replies; 4+ messages in thread
From: Josef Bacik @ 2021-11-24 17:37 UTC
  To: linux-btrfs, kernel-team

Hello,

I saw a dmesg failure with generic/281 on our overnight runs.  This turned out
to be because we weren't getting an error back from btrfs_search_slot() even
though we found a metadata block that shouldn't have been uptodate.

The root cause is that a write error clears uptodate on the page, but not on
the extent buffer itself.  Since we rely on that bit to tell whether the
extent buffer is valid or not, we don't notice that the eb is bogus when we
find it in the cache on a subsequent lookup, and we eventually trip over
assert_eb_page_uptodate() warnings.
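To make the failure mode concrete, here is a small userspace model in C.  It
is a sketch under stated assumptions, not btrfs code: struct model_eb, struct
model_page and the model_* helpers are hypothetical stand-ins, with plain
booleans in place of EXTENT_BUFFER_UPTODATE and PageUptodate().  It shows that
clearing only the page flag leaves the buffer-level flag claiming the buffer
is valid, which is the inconsistency assert_eb_page_uptodate() warns about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: an extent buffer backed by a few pages. */
#define MODEL_NR_PAGES 4

struct model_page {
	bool uptodate;            /* stands in for PageUptodate() */
};

struct model_eb {
	bool uptodate;            /* stands in for EXTENT_BUFFER_UPTODATE */
	struct model_page pages[MODEL_NR_PAGES];
};

/* Pre-fix behaviour: a write error clears only the failed page's flag. */
static void model_write_error_before_fix(struct model_eb *eb, size_t idx)
{
	assert(idx < MODEL_NR_PAGES);
	eb->pages[idx].uptodate = false;
}

/* A cached lookup trusts only the buffer-level flag. */
static bool model_lookup_trusts(const struct model_eb *eb)
{
	return eb->uptodate;
}

/*
 * The invariant assert_eb_page_uptodate() checks, in spirit: a buffer
 * that claims to be uptodate must not have a !uptodate page behind it.
 */
static bool model_eb_consistent(const struct model_eb *eb)
{
	if (!eb->uptodate)
		return true;
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		if (!eb->pages[i].uptodate)
			return false;
	return true;
}
```

In this model, after model_write_error_before_fix(&eb, 2) the buffer still
passes model_lookup_trusts() but fails model_eb_consistent(), mirroring the
warning seen in the trace.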

This fixes the problem I was seeing.  I could easily reproduce it by running
generic/281 in a loop a few times; with these patches I haven't reproduced it
in 20 loops.  Thanks,

Josef

Josef Bacik (2):
  btrfs: clear extent buffer uptodate when we fail to write it
  btrfs: check the root node for uptodate before returning it

 fs/btrfs/ctree.c     | 19 +++++++++++++++----
 fs/btrfs/extent_io.c |  6 ++++++
 2 files changed, 21 insertions(+), 4 deletions(-)

-- 
2.26.3


* [PATCH 1/2] btrfs: clear extent buffer uptodate when we fail to write it
  2021-11-24 17:37 [PATCH 0/2] Metadata IO error fixes Josef Bacik
@ 2021-11-24 17:37 ` Josef Bacik
  2021-11-25  8:50   ` Nikolay Borisov
  2021-11-24 17:37 ` [PATCH 2/2] btrfs: check the root node for uptodate before returning it Josef Bacik
  1 sibling, 1 reply; 4+ messages in thread
From: Josef Bacik @ 2021-11-24 17:37 UTC
  To: linux-btrfs, kernel-team

I got dmesg errors on generic/281 in our overnight xfstests runs.  Looking
at the history, this happens occasionally, with errors like this:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50
CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G        W         5.16.0-rc2+ #469
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
RSP: 0018:ffffae598230bc60 EFLAGS: 00010246
RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000
RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340
RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000
R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0
R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000
FS:  0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0
Call Trace:

 extent_buffer_test_bit+0x3f/0x70
 free_space_test_bit+0xa6/0xc0
 load_free_space_tree+0x1d6/0x430
 caching_thread+0x454/0x630
 ? rcu_read_lock_sched_held+0x12/0x60
 ? rcu_read_lock_sched_held+0x12/0x60
 ? rcu_read_lock_sched_held+0x12/0x60
 ? lock_release+0x1f0/0x2d0
 btrfs_work_helper+0xf2/0x3e0
 ? lock_release+0x1f0/0x2d0
 ? finish_task_switch.isra.0+0xf9/0x3a0
 process_one_work+0x270/0x5a0
 worker_thread+0x55/0x3c0
 ? process_one_work+0x5a0/0x5a0
 kthread+0x174/0x1a0
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

This happens because we're trying to read from an extent buffer page that
is !PageUptodate.  We clear the page uptodate flag when we have an IO
error, but we don't clear the extent buffer uptodate flag.  If we do a
read later and find this extent buffer, we'll think it's valid, not
return an error, and then trip over this warning.

Fix this by also clearing uptodate on the extent buffer when this
happens, so that we get an error when we do a btrfs_search_slot() and
find this block later.
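A rough userspace model of set_btree_ioerr() with this patch applied,
assuming simplified non-atomic flag handling (the kernel uses atomic bitops
such as test_and_set_bit() on eb->bflags).  The MODEL_EB_* flags, the
ioerr_accounting_runs counter and the model_* functions are hypothetical; the
counter merely stands in for the one-time dirty_metadata_bytes adjustment.
It shows that the first error clears uptodate so a later cached lookup sees
-EIO, and that repeated calls are no-ops.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits named after the ones in the patch. */
#define MODEL_EB_UPTODATE  (1u << 0)
#define MODEL_EB_WRITE_ERR (1u << 1)

struct model_eb {
	unsigned int bflags;
	int ioerr_accounting_runs;  /* stand-in for one-time error accounting */
};

/* Non-atomic stand-in for test_and_set_bit(): returns the old state. */
static bool model_test_and_set(unsigned int *flags, unsigned int bit)
{
	bool old = (*flags & bit) != 0;

	*flags |= bit;
	return old;
}

/* Sketch of set_btree_ioerr() after patch 1. */
static void model_set_btree_ioerr(struct model_eb *eb)
{
	assert(eb != NULL);

	/* Only the first error recorded on this buffer does any work. */
	if (model_test_and_set(&eb->bflags, MODEL_EB_WRITE_ERR))
		return;

	/* The fix: make later readers see the buffer as invalid. */
	eb->bflags &= ~MODEL_EB_UPTODATE;

	/* Placeholder for the remaining one-time error handling. */
	eb->ioerr_accounting_runs++;
}

/* A cached lookup: 0 if the buffer can be trusted, -EIO otherwise. */
static int model_read_cached(const struct model_eb *eb)
{
	return (eb->bflags & MODEL_EB_UPTODATE) ? 0 : -EIO;
}
```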

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b289d26aca0d..3454cac28389 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4308,6 +4308,12 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
 		return;
 
+	/*
+	 * A read may stumble upon this buffer later, make sure that it gets an
+	 * error and knows there was an error.
+	 */
+	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
 	/*
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
-- 
2.26.3



* [PATCH 2/2] btrfs: check the root node for uptodate before returning it
  2021-11-24 17:37 [PATCH 0/2] Metadata IO error fixes Josef Bacik
  2021-11-24 17:37 ` [PATCH 1/2] btrfs: clear extent buffer uptodate when we fail to write it Josef Bacik
@ 2021-11-24 17:37 ` Josef Bacik
  1 sibling, 0 replies; 4+ messages in thread
From: Josef Bacik @ 2021-11-24 17:37 UTC
  To: linux-btrfs, kernel-team

Now that we clear the extent buffer uptodate flag if we fail to write it
out, we need to check whether our root node is uptodate before we search
down it.  Otherwise we could return stale data (or potentially corrupt
data that was caught by the write verification step) and think that the
path is OK to search down.
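The error path added by this patch can be sketched in userspace as follows.
This is a hypothetical model, not the real btrfs_search_slot_get_root(): the
model_* names, the lock_held field and the refcount are simplifications, and
ERR_PTR()/IS_ERR()/PTR_ERR() are re-created locally.  One plausible reading,
reflected here, is that the BTRFS_READ_LOCK assignment moves below the
commit-root branch so that root_lock is still 0 on the "goto out" path, and
the new check never unlocks a buffer it did not lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local re-creations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idea. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

enum { MODEL_NO_LOCK = 0, MODEL_READ_LOCK = 1 };

struct model_eb {
	bool uptodate;
	int lock_held;                /* MODEL_NO_LOCK or MODEL_READ_LOCK */
	int refs;
};

struct model_root {
	struct model_eb *node;        /* live root, read-locked on lookup */
	struct model_eb *commit_root; /* searched without taking a lock */
};

/* Hypothetical sketch of the shape of btrfs_search_slot_get_root(). */
static struct model_eb *model_get_root(struct model_root *root,
				       bool search_commit_root)
{
	struct model_eb *b;
	int root_lock = MODEL_NO_LOCK;  /* stays 0 unless a lock is taken */

	if (search_commit_root) {
		b = root->commit_root;
		b->refs++;
		goto out;               /* no lock taken on this path */
	}

	/* Decide to read-lock only after the commit-root branch. */
	root_lock = MODEL_READ_LOCK;
	b = root->node;
	b->refs++;
	assert(b->lock_held == MODEL_NO_LOCK);
	b->lock_held = root_lock;
out:
	if (!b->uptodate) {
		/* Release exactly what was acquired, then report -EIO. */
		if (root_lock)
			b->lock_held = MODEL_NO_LOCK;
		b->refs--;
		return model_err_ptr(-EIO);
	}
	return b;
}
```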

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 216bf35f6caf..d2297e449072 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1568,12 +1568,9 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 							int write_lock_level)
 {
 	struct extent_buffer *b;
-	int root_lock;
+	int root_lock = 0;
 	int level = 0;
 
-	/* We try very hard to do read locks on the root */
-	root_lock = BTRFS_READ_LOCK;
-
 	if (p->search_commit_root) {
 		b = root->commit_root;
 		atomic_inc(&b->refs);
@@ -1593,6 +1590,9 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 		goto out;
 	}
 
+	/* We try very hard to do read locks on the root */
+	root_lock = BTRFS_READ_LOCK;
+
 	/*
 	 * If the level is set to maximum, we can skip trying to get the read
 	 * lock.
@@ -1619,6 +1619,17 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 	level = btrfs_header_level(b);
 
 out:
+	/*
+	 * The root may have failed to write out at some point, and thus is no
+	 * longer valid, return an error in this case.
+	 */
+	if (!extent_buffer_uptodate(b)) {
+		if (root_lock)
+			btrfs_tree_unlock_rw(b, root_lock);
+		free_extent_buffer(b);
+		return ERR_PTR(-EIO);
+	}
+
 	p->nodes[level] = b;
 	if (!p->skip_locking)
 		p->locks[level] = root_lock;
-- 
2.26.3



* Re: [PATCH 1/2] btrfs: clear extent buffer uptodate when we fail to write it
  2021-11-24 17:37 ` [PATCH 1/2] btrfs: clear extent buffer uptodate when we fail to write it Josef Bacik
@ 2021-11-25  8:50   ` Nikolay Borisov
  0 siblings, 0 replies; 4+ messages in thread
From: Nikolay Borisov @ 2021-11-25  8:50 UTC
  To: Josef Bacik, linux-btrfs, kernel-team



On 24.11.21 at 19:37, Josef Bacik wrote:
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index b289d26aca0d..3454cac28389 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4308,6 +4308,12 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
>  	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
>  		return;
>  
> +	/*
> +	 * A read may stumble upon this buffer later, make sure that it gets an
> +	 * error and knows there was an error.
> +	 */
> +	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);


Is it sufficient to clear the flag only on the extent buffer?  What about
using clear_extent_buffer_uptodate so that the constituent pages also get
their UPTODATE cleared?

Also, I can't help but wonder whether we can get rid of BUFFER_WRITE_ERR,
because an error during write is signaled by both !UPTODATE and
BUFFER_WRITE_ERR being set.

Looking at the various call sites of set_btree_ioerr, they call it either
when the bio has errored out or when EXTENT_BUFFER_WRITE_ERR is set, but in
the latter case set_btree_ioerr is a noop due to the test_and_set_bit() call
in it.

> +
>  	/*
>  	 * If we error out, we should add back the dirty_metadata_bytes
>  	 * to make it consistent.
> 
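The difference between clearing only the extent buffer flag (as in patch 1)
and the clear_extent_buffer_uptodate() approach suggested in the review can
be modelled in userspace.  This sketch assumes a buffer spanning four pages
and uses hypothetical model_* types and helpers, not the kernel API; only the
page that failed to write has already had its uptodate flag cleared, as the
cover letter describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an extent buffer spanning several pages. */
#define MODEL_NR_PAGES 4

struct model_page {
	bool uptodate;
};

struct model_eb {
	bool uptodate;
	struct model_page pages[MODEL_NR_PAGES];
};

/* Count the backing pages that still claim to be uptodate. */
static size_t model_uptodate_pages(const struct model_eb *eb)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		n += eb->pages[i].uptodate;
	return n;
}

/* Patch 1 as posted: only the buffer-level flag is cleared. */
static void model_clear_eb_flag_only(struct model_eb *eb)
{
	eb->uptodate = false;
}

/*
 * Along the lines of the review suggestion: clear the buffer flag and
 * every backing page, as clear_extent_buffer_uptodate() does.
 */
static void model_clear_eb_and_pages(struct model_eb *eb)
{
	eb->uptodate = false;
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		eb->pages[i].uptodate = false;
	assert(model_uptodate_pages(eb) == 0);
}
```

In this model both variants leave the buffer marked !uptodate; they differ
only in whether the pages that did not fail keep claiming to be uptodate.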

