public inbox for linux-btrfs@vger.kernel.org
* [PATCH 0/7] Error handling fixes
@ 2023-02-07 16:57 Josef Bacik
  2023-02-07 16:57 ` [PATCH 1/7] btrfs: use btrfs_handle_fs_error in btrfs_fill_super Josef Bacik
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Josef Bacik @ 2023-02-07 16:57 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

For a short period of time our btrfs backport had 947a629988f1 ("btrfs: move
tree block parentness check into validate_extent_buffer()") without the
associated fix, which resulted in a lot of hilarity.

One of the things that popped was a WARN_ON(ret == 1) in __btrfs_free_extent
where we didn't find the bytenr we were looking for.  This was troubling, as it
appeared that we were losing the -EIO and returning 1 from btrfs_search_slot.
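
For readers less familiar with the convention at issue: btrfs_search_slot
returns a negative errno on failure, 0 when the key is found, and 1 when it is
not.  The sketch below is a userspace toy model of that three-way return, not
kernel code.  toy_search_slot, toy_free_extent, and the inject_eio flag are
hypothetical names made up for illustration, and mapping "not found" to -ENOENT
is an assumption of this sketch rather than what __btrfs_free_extent does.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy stand-in for a tree search (hypothetical, NOT the kernel's
 * btrfs_search_slot).  It only mirrors the return convention:
 *   < 0  error (here: injected -EIO)
 *   0    key found
 *   1    key not found
 */
static int toy_search_slot(const unsigned long *keys, size_t nr,
                           unsigned long key, int inject_eio)
{
    if (inject_eio)
        return -EIO;

    for (size_t i = 0; i < nr; i++) {
        if (keys[i] == key)
            return 0;
    }
    return 1;
}

/*
 * Hypothetical caller.  A real error must be passed up unchanged; a
 * "not found" result (1) is a separate case and must not be confused
 * with an I/O failure.  Returning -ENOENT here is an illustrative choice.
 */
static int toy_free_extent(const unsigned long *keys, size_t nr,
                           unsigned long bytenr, int inject_eio)
{
    int ret = toy_search_slot(keys, nr, bytenr, inject_eio);

    if (ret < 0)
        return ret;      /* propagate -EIO as-is */
    if (ret > 0)
        return -ENOENT;  /* bytenr is missing: caller sees a distinct error */
    return 0;
}
```

In this model an injected -EIO reaches the caller unchanged, while a missing
bytenr shows up as a different error, so the two situations stay
distinguishable.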

I rigged up my error injection stress test with
btrfs_check_leaf/btrfs_check_node with balance (as this was the path that we saw
the error).  This of course uncovered a few other unrelated things, but
eventually I reproduced what we saw in production.  Thankfully it was not that
we were eating the -EIO and returning 1 instead; however, the actual problem is
worse.  We do not handle the errors properly in snapshot delete (which also gets
used by relocation), and then we do not abort the transaction when we hit errors
in this path, which leads to the file system being corrupted and eventually
triggers the above WARN_ON().
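
The failure mode described above (an error in the walk-down path that is
neither propagated nor followed by a transaction abort) can be sketched in
userspace.  This is a minimal toy model, not the kernel code from the patches:
struct toy_trans and every toy_* function are hypothetical, and having the
commit return the recorded error is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical transaction handle: only tracks whether it was aborted. */
struct toy_trans {
    bool aborted;
    int abort_err;
};

/* Record the first error and mark the transaction as aborted. */
static void toy_abort_transaction(struct toy_trans *trans, int err)
{
    if (!trans->aborted) {
        trans->aborted = true;
        trans->abort_err = err;
    }
}

/*
 * Toy walk from start level down to level 0.  fail_at_level simulates an
 * injected read failure at that level; 0 means no failure is injected.
 */
static int toy_walk_down(int level, int fail_at_level)
{
    while (level > 0) {
        if (level == fail_at_level)
            return -EIO;
        level--;
    }
    return 0;
}

/*
 * Toy snapshot drop: check the walk result, abort on error, and return the
 * error to the caller instead of carrying on with a half-updated tree.
 */
static int toy_drop_snapshot(struct toy_trans *trans, int start_level,
                             int fail_at_level)
{
    int ret = toy_walk_down(start_level, fail_at_level);

    if (ret < 0) {
        toy_abort_transaction(trans, ret);
        return ret;
    }
    return 0;
}

/* An aborted transaction must never commit; report the stored error. */
static int toy_commit(const struct toy_trans *trans)
{
    return trans->aborted ? trans->abort_err : 0;
}
```

With no injected failure the drop and the commit both return 0.  With a failure
injected at level 2, toy_drop_snapshot returns -EIO, the transaction is marked
aborted, and toy_commit refuses to commit, which is the property the last two
patches aim for.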

With these fixes in place my stress testing ran overnight without
tripping over any other leaks, corruptions, or panics.  Previously I wasn't able
to run for longer than a couple of minutes without falling over.  Thanks,

Josef

Josef Bacik (7):
  btrfs: use btrfs_handle_fs_error in btrfs_fill_super
  btrfs: replace BUG_ON(level == 0) with ASSERT(level)
  btrfs: handle errors from btrfs_read_node_slot in split
  btrfs: iput on orphan cleanup failure
  btrfs: drop root refs properly when orphan cleanup fails
  btrfs: handle errors in walk_down_tree properly
  btrfs: abort the transaction if we get an error during snapshot drop

 fs/btrfs/ctree.c       | 55 +++++++++++++++++++++---------------------
 fs/btrfs/disk-io.c     |  4 +--
 fs/btrfs/extent-tree.c | 10 +++++---
 fs/btrfs/inode.c       |  5 +++-
 fs/btrfs/super.c       |  1 +
 5 files changed, 40 insertions(+), 35 deletions(-)

-- 
2.26.3




Thread overview: 13+ messages
2023-02-07 16:57 [PATCH 0/7] Error handling fixes Josef Bacik
2023-02-07 16:57 ` [PATCH 1/7] btrfs: use btrfs_handle_fs_error in btrfs_fill_super Josef Bacik
2023-02-08  9:39   ` Johannes Thumshirn
2023-02-07 16:57 ` [PATCH 2/7] btrfs: replace BUG_ON(level == 0) with ASSERT(level) Josef Bacik
2023-02-08  9:39   ` Johannes Thumshirn
2023-02-07 16:57 ` [PATCH 3/7] btrfs: handle errors from btrfs_read_node_slot in split Josef Bacik
2023-02-07 16:57 ` [PATCH 4/7] btrfs: iput on orphan cleanup failure Josef Bacik
2023-02-07 16:57 ` [PATCH 5/7] btrfs: drop root refs properly when orphan cleanup fails Josef Bacik
2023-02-08  9:42   ` Johannes Thumshirn
2023-02-07 16:57 ` [PATCH 6/7] btrfs: handle errors in walk_down_tree properly Josef Bacik
2023-02-07 16:57 ` [PATCH 7/7] btrfs: abort the transaction if we get an error during snapshot drop Josef Bacik
2023-02-15 19:26 ` [PATCH 0/7] Error handling fixes David Sterba
2023-02-20 19:28   ` David Sterba
