public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] btrfs: error handling fixes for extent_writepage()
@ 2024-11-27  8:15 Qu Wenruo
  2024-11-27  8:15 ` [PATCH v2 1/2] btrfs: fix double accounting race in extent_writepage() Qu Wenruo
  2024-11-27  8:15 ` [PATCH v2 2/2] btrfs: handle submit_one_sector() error inside extent_writepage_io() Qu Wenruo
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2024-11-27  8:15 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Update the commit message for the first patch
  It turns out the root cause is the race between
  btrfs_finish_one_ordered() and btrfs_mark_ordered_io_finished()

  And that race is possible no matter if the sector size is smaller than
  page size.

  So update the commit message to reflect that.

- Remove comments update in the patch
  To make backport easier.

- Update the commit message for the second patch
  Mostly to down play the possible problem, as the extent map is already
  pinned, thus no way to failed to grab the extent map.

It's more and more common to hit crash in my aarch64 testing VM.

The main symptom is ordered extent double accounting, causing various
problems mostly kernel warning and crashes (for debug builds).

The direct cause the failure from writepage_delalloc() with -ENOSPC,
which is another rabbit hole, but here we need to focus on the error
handling.

All the call traces points to the btrfs_mark_ordered_io_finished()
inside extent_writepage() for error handling.

It turns out that btrfs_mark_ordered_io_finished() inside
extent_writepage() is racing with the same cleanup inside
btrfs_run_delalloc_range().

And if the one inside extent_writepage() is called before the ordered
extent removed from the ordered tree (the removal is queued in a
workqueue), then we hit the double accounting.


There is also a theoretical failure path from submit_one_sector(),  but
I have never hit a case caused by that failure, the fix is only for the
sake of consistency.

Both fixes are similar, by moving the btrfs_mark_ordered_io_finished()
calls for error handling into each function, so that we can avoid
touching ranges that is already covered.

Although these fixes are mostly for backports, the proper fix in the end
would be reworking how we handle dirty folio writeback.

The current way is map-map-map, then submit-submit-submit (run delalloc
for every dirty sector of the folio, then submit all dirty sectors).

The planned new fix would be more like iomap to go
map-submit-map-submit-map-submit (run one delalloc, then immeidately submit
it).


Qu Wenruo (2):
  btrfs: fix double accounting race in extent_writepage()
  btrfs: handle submit_one_sector() error inside extent_writepage_io()

 fs/btrfs/extent_io.c | 63 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 15 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-11-27  8:15 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-27  8:15 [PATCH v2 0/2] btrfs: error handling fixes for extent_writepage() Qu Wenruo
2024-11-27  8:15 ` [PATCH v2 1/2] btrfs: fix double accounting race in extent_writepage() Qu Wenruo
2024-11-27  8:15 ` [PATCH v2 2/2] btrfs: handle submit_one_sector() error inside extent_writepage_io() Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox