public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases
Date: Tue, 12 Apr 2022 20:30:12 +0800	[thread overview]
Message-ID: <cover.1649766550.git.wqu@suse.com> (raw)

[CHANGELOG]
v2:
- Make submit_one_bio() to return void just in the same patch

- Update the commit message the for 2nd patch
  To mention remaining error path (which is not really possible to
  trigger), to prove the first patch is already fixing all involved
  error paths.

[DESCRIPTION]
When testing my subpage raid56 support, generic/475 is always hanging
with some data page unable to be unlocked.

It turns out that, the hang is not related to the raid56 subpage
support (obviously, as the test case is not utilizing RAID56 at all),
but a recent commit 1784b7d502a9 ("btrfs: handle csum lookup errors
properly on reads") introduced a new error path, and it caught us by
surprise.

The new error path is from btrfs_lookup_bio_sums(), which can now return
error if the csum search failed.

This new error path exposed several problems:

- Double cleanup for submit_one_bio() and its callers
  Bio submission hooks, btrfs_submit_data_bio() and
  btrfs_submit_metadata_bio() will call endio to cleanup on errors.

  But those bio submission hooks will also return error, and
  finally callers of submit_extent_page() will also try to do
  cleanup.

  This will be fixed by the first patch, by always returning 0 for
  submit_one_bio().
  This fix is kept as minimal as possible, to make backport easier.
  The proper conversion to return void will be done in the last patch.

- btrfs_do_readpage() can leave page locked on error
  If submit_extent_page() failed in btrfs_do_readpage(), we only
  cleanup the current range, and leaving the remaining subpage
  range locked.

  This bug is subpage specific, and will not affect regular cases.

  Fix it by cleaning up all the remaining subpage range before
  exiting.

- __extent_writepage_io() can return 0 even it hit some error
  Although we continue writing the remaining ranges, we didn't save
  the first error, causing @ret to be overwritten.
 
  This bug is subpage specific, as for regular cases we only have one
  sector inside the page.

  Fix it by introducing @has_error and @saved_ret.

I manually checked all other submit_extent_page() callers, they all
look fine and won't cause problems like the above.

Finally since submit_one_bio() will always return 0, the final patch
will make it return void, which greatly makes our code cleaner.

But that patch is introducing quite some modifications, not a candidate
for backport, unlike the first 3 patches.

Special thanks to Josef, as my initial bisection points to his patch and
I have no clue why it can cause problems at all.
His hints immediately solved all my questions, and lead to this
patchset.


Qu Wenruo (3):
  btrfs: avoid double clean up when submit_one_bio() failed
  btrfs: fix the error handling for submit_extent_page() for
    btrfs_do_readpage()
  btrfs: return correct error number for __extent_writepage_io()

 fs/btrfs/extent_io.c | 135 ++++++++++++++++++-------------------------
 fs/btrfs/extent_io.h |   3 +-
 fs/btrfs/inode.c     |  13 ++---
 3 files changed, 62 insertions(+), 89 deletions(-)

-- 
2.35.1


             reply	other threads:[~2022-04-12 12:57 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 12:30 Qu Wenruo [this message]
2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
2022-04-12 20:41   ` David Sterba
2022-04-12 23:32     ` Qu Wenruo
2022-04-13 13:46       ` David Sterba
2022-04-13 23:23         ` Qu Wenruo
2022-04-15  6:28   ` Christoph Hellwig
2022-04-15  7:02     ` Qu Wenruo
2022-04-15  7:12       ` Christoph Hellwig
2022-04-15  7:14         ` Qu Wenruo
2022-04-24 23:26           ` Qu Wenruo
2022-04-12 12:30 ` [PATCH v2 2/3] btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage() Qu Wenruo
2022-04-12 12:30 ` [PATCH v2 3/3] btrfs: return correct error number for __extent_writepage_io() Qu Wenruo
2022-04-12 20:42 ` [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases David Sterba
2022-04-14 19:38   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1649766550.git.wqu@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox