From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 00/10] btrfs: error handling fixes
Date: Fri, 10 Jan 2025 14:01:31 +1030
Message-ID: <cover.1736479224.git.wqu@suse.com>
[CHANGELOG]
v3:
- Add a new patch to move the ordered extent cleanup into
cow_file_range() and run_delalloc_nocow()
- Update the comment of writepage_delalloc()
To give a more detailed view of what should be done for each of the 3
return value patterns
- Rename the variable @last_finished to @last_finished_delalloc_end
And enhance its comment.
- Add a comment on why we want submit_one_bio() after
submit_one_sector() failed
- Add a comment explaining what cleanup_dirty_folios() does
- Update the ASCII graph to use @cur_offset rather than @cur_start
v2:
- Fix the btrfs_cleanup_ordered_extents() call inside
btrfs_run_delalloc_range()
Since we no longer call btrfs_mark_ordered_io_finished() if
btrfs_run_delalloc_range() failed, the existing
btrfs_cleanup_ordered_extents() call with @locked_folio will cause the
subpage range not to be properly cleaned up.
This can lead to hanging ordered extents for subpage cases.
- Update the commit message of the first patch
With a more detailed analysis of how the double accounting happens.
It's pretty complex and very lengthy, but is easier to understand (at
least I hope so).
The root cause is the btrfs_cleanup_ordered_extents()'s range split
behavior, which is not subpage compatible and is cursed in the first
place.
So the fix is still the same, by removing the split OE handling
completely.
- A new patch to clean up the @locked_folio parameter of
btrfs_cleanup_ordered_extents()
I believe there is a regression in the last 2 or 3 releases where the
metadata/data space reservation code is no longer working properly,
causing us to hit -ENOSPC during btrfs_run_delalloc_range().
One of the most common ways to hit this problem is running
generic/750, along with other long running generic tests.
Although I should start bisecting the space reservation bug, I cannot
help but fix the exposed bugs first.
This exposed quite a few long-standing bugs, all in the error handling
paths, that can lead to the following crashes:
- Double ordered extent accounting
Triggers WARN_ON_ONCE() inside can_finish_ordered_extent(), then crashes.
This bug is fixed by the first 3 patches.
The first patch is the most important one, since the bug is pretty easy
to trigger in the real world and has existed for a very long time.
The second patch is just a precautionary fix, as the bug is unlikely to
happen in the real world.
The third one can also happen in the real world, but only with the
recently enabled subpage compression write support.
- Subpage ASSERT() triggered, where the subpage folio bitmap differs from
the folio status
This most likely happens in submit_uncompressed_range(), where it
unlocks the folio without updating the subpage bitmaps.
This bug is fixed by the 3rd patch.
- WARN_ON() if the out-of-tree patch "btrfs: reject out-of-band dirty folios
during writeback" is applied
This is a more complex case, where error handling leaves some folios
dirty, but with the EXTENT_DELALLOC flag cleared from the extent io tree.
Such dirty folios can still be written back later, but since there is
no EXTENT_DELALLOC flag, they will be treated as out-of-band dirty
folios and trigger COW fixup.
This bug is fixed by the 4th and 5th patches.
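As a rough illustration of the double ordered extent accounting listed above, here is a user-space toy model. It is not kernel code: every toy_* name is made up for this sketch, and it assumes the warning fires when a range is reported finished that is larger than what the ordered extent still has outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered extent; not the kernel structure. */
struct toy_ordered_extent {
	uint64_t bytes_left;	/* bytes not yet reported as finished */
	bool over_accounted;	/* set once a range is finished twice */
};

/*
 * Report @len bytes of the extent as finished.
 * Returns true when nothing is left outstanding.
 * Over-accounting is recorded in a flag instead of warning.
 */
static bool toy_finish_range(struct toy_ordered_extent *oe, uint64_t len)
{
	assert(len > 0);
	if (len > oe->bytes_left) {
		/* More bytes finished than were outstanding: double accounting. */
		oe->over_accounted = true;
		oe->bytes_left = 0;
	} else {
		oe->bytes_left -= len;
	}
	return oe->bytes_left == 0;
}

/* Two cleanup paths cover the same bytes; returns true if that is detected. */
static bool toy_demo_double_accounting(void)
{
	struct toy_ordered_extent oe = { .bytes_left = 8192, .over_accounted = false };

	toy_finish_range(&oe, 8192);	/* error path finishes the whole range */
	toy_finish_range(&oe, 4096);	/* a second path finishes 4K of it again */
	return oe.over_accounted;
}
```

In this model the second call is the bug: the range was already fully accounted for, so any further "finished" report for it exceeds what is outstanding. Having exactly one path responsible for finishing each range avoids the overlap.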
With so many existing bugs exposed, there is more than enough motivation
to make btrfs_run_delalloc_range() (and its delalloc range functions)
output extra error messages so that at least we know something is wrong.
And those error messages have already helped a lot during my
development.
Patches 6~8 are here to enhance the error messages.
Patch 9 cleans up the unnecessary @locked_folio parameter of
btrfs_cleanup_ordered_extents(), and the final patch moves the ordered
extent cleanup to where the ordered extents are allocated.
With all these patches applied, at least fstests can finish reliably.
Without them, it crashes in generic tests so frequently that I have been
unable to finish even one full run since the space reservation regression.
Qu Wenruo (10):
btrfs: fix double accounting race when btrfs_run_delalloc_range()
failed
btrfs: fix double accounting race when extent_writepage_io() failed
btrfs: fix the error handling of submit_uncompressed_range()
btrfs: do proper folio cleanup when cow_file_range() failed
btrfs: do proper folio cleanup when run_delalloc_nocow() failed
btrfs: subpage: fix the bitmap dump for the locked flags
btrfs: subpage: dump the involved bitmap when ASSERT() failed
btrfs: add extra error messages for delalloc range related errors
btrfs: remove the unused @locked_folio parameter from
btrfs_cleanup_ordered_extents()
btrfs: move ordered extent cleanup to where they are allocated
fs/btrfs/extent_io.c | 104 +++++++++++++----
fs/btrfs/inode.c | 268 +++++++++++++++++++++++++------------------
fs/btrfs/subpage.c | 48 +++++---
fs/btrfs/subpage.h | 13 +++
4 files changed, 290 insertions(+), 143 deletions(-)
--
2.47.1