* [PATCH 0/3] btrfs: extra debug output for sector size < page size cases
@ 2024-11-27 4:06 Qu Wenruo
2024-11-27 4:06 ` [PATCH 1/3] btrfs: subpage: fix the bitmap dump for the locked flags Qu Wenruo
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-11-27 4:06 UTC (permalink / raw)
To: linux-btrfs
The first patch is the long existing bug that full subpage bitmap dump
is not working for checked bitmap.
Thankfully even myself is not affected by the bug.
The second one is for a crash I hit where ASSERT() got triggered in
btrfs_folio_set_locked() after a btrfs_run_delalloc_range() failure.
The last one is for the btrfs_run_delalloc_range() failure, which is
not that rare in my environment, I guess the unsafe cache mode for my
aarch64 VM makes it too easy to hit ENOSPC.
But ENOSPC from btrfs_run_delalloc_range() itself is already a problem
for our data/metadata space reservation code, thus it should be
outputted even for non-debug build.
Qu Wenruo (3):
btrfs: subpage: fix the bitmap dump for the locked flags
btrfs: subpage: dump the involved bitmap when ASSERT() failed
btrfs: add extra error messages for extent_writepage() failure
fs/btrfs/extent_io.c | 16 +++++++++++++++
fs/btrfs/subpage.c | 47 ++++++++++++++++++++++++++++++++------------
2 files changed, 50 insertions(+), 13 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] btrfs: subpage: fix the bitmap dump for the locked flags
2024-11-27 4:06 [PATCH 0/3] btrfs: extra debug output for sector size < page size cases Qu Wenruo
@ 2024-11-27 4:06 ` Qu Wenruo
2024-11-27 4:06 ` [PATCH 2/3] btrfs: subpage: dump the involved bitmap when ASSERT() failed Qu Wenruo
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-11-27 4:06 UTC (permalink / raw)
To: linux-btrfs
We're dumping the locked bitmap into the @checked_bitmap variable,
causing incorrect values during debug.
Thankfuklly even during my development I haven't hit a case where I need
to dump the locked bitmap.
But for the sake of consistency, fix it by dumpping the locked bitmap
into @locked_bitmap variable for output.
Fixes: 75258f20fb70 ("btrfs: subpage: dump extra subpage bitmaps for debug")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/subpage.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 8c68059ac1b0..03d7bfc042e2 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -716,6 +716,7 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
unsigned long writeback_bitmap;
unsigned long ordered_bitmap;
unsigned long checked_bitmap;
+ unsigned long locked_bitmap;
unsigned long flags;
ASSERT(folio_test_private(folio) && folio_get_private(folio));
@@ -728,15 +729,16 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
GET_SUBPAGE_BITMAP(subpage, fs_info, writeback, &writeback_bitmap);
GET_SUBPAGE_BITMAP(subpage, fs_info, ordered, &ordered_bitmap);
GET_SUBPAGE_BITMAP(subpage, fs_info, checked, &checked_bitmap);
- GET_SUBPAGE_BITMAP(subpage, fs_info, locked, &checked_bitmap);
+ GET_SUBPAGE_BITMAP(subpage, fs_info, locked, &locked_bitmap);
spin_unlock_irqrestore(&subpage->lock, flags);
dump_page(folio_page(folio, 0), "btrfs subpage dump");
btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl checked=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl locked=%*pbl writeback=%*pbl ordered=%*pbl checked=%*pbl",
start, len, folio_pos(folio),
sectors_per_page, &uptodate_bitmap,
sectors_per_page, &dirty_bitmap,
+ sectors_per_page, &locked_bitmap,
sectors_per_page, &writeback_bitmap,
sectors_per_page, &ordered_bitmap,
sectors_per_page, &checked_bitmap);
--
2.47.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] btrfs: subpage: dump the involved bitmap when ASSERT() failed
2024-11-27 4:06 [PATCH 0/3] btrfs: extra debug output for sector size < page size cases Qu Wenruo
2024-11-27 4:06 ` [PATCH 1/3] btrfs: subpage: fix the bitmap dump for the locked flags Qu Wenruo
@ 2024-11-27 4:06 ` Qu Wenruo
2024-11-27 15:32 ` David Sterba
2024-11-27 4:06 ` [PATCH 3/3] btrfs: add extra error messages for extent_writepage() failure Qu Wenruo
2024-11-27 15:30 ` [PATCH 0/3] btrfs: extra debug output for sector size < page size cases David Sterba
3 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-11-27 4:06 UTC (permalink / raw)
To: linux-btrfs
For btrfs_folio_assert_not_dirty() and btrfs_folio_set_lock(), we call
bitmap_test_range_all_zero() to ensure the involved range has not bit
set.
However with my recent enhanced delalloc range error handling, I'm
hitting the ASSERT() inside btrfs_folio_set_lock(), and is wondering if
it's some error handling not properly cleanup the locked bitmap but
directly unlock the page.
So add some extra dumpping for the ASSERTs to dump the involved bitmap
to help debug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/subpage.c | 41 ++++++++++++++++++++++++++++++-----------
1 file changed, 30 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 03d7bfc042e2..d692bc34a3af 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -635,6 +635,28 @@ IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered,
IMPLEMENT_BTRFS_PAGE_OPS(checked, folio_set_checked, folio_clear_checked,
folio_test_checked);
+#define GET_SUBPAGE_BITMAP(subpage, fs_info, name, dst) \
+{ \
+ const int sectors_per_page = fs_info->sectors_per_page; \
+ \
+ ASSERT(sectors_per_page < BITS_PER_LONG); \
+ *dst = bitmap_read(subpage->bitmaps, \
+ sectors_per_page * btrfs_bitmap_nr_##name, \
+ sectors_per_page); \
+}
+
+#define subpage_dump_bitmap(fs_info, folio, name, start, len) \
+{ \
+ struct btrfs_subpage *subpage = folio_get_private(folio); \
+ unsigned long bitmap; \
+ \
+ GET_SUBPAGE_BITMAP(subpage, fs_info, name, &bitmap); \
+ btrfs_warn(fs_info, \
+ "dumpping bitmap start=%llu len=%u folio=%llu" #name "_bitmap=%*pbl", \
+ start, len, folio_pos(folio), \
+ fs_info->sectors_per_page, &bitmap); \
+}
+
/*
* Make sure not only the page dirty bit is cleared, but also subpage dirty bit
* is cleared.
@@ -660,6 +682,10 @@ void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
subpage = folio_get_private(folio);
ASSERT(subpage);
spin_lock_irqsave(&subpage->lock, flags);
+ if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) {
+ subpage_dump_bitmap(fs_info, folio, dirty, start, len);
+ ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+ }
ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
spin_unlock_irqrestore(&subpage->lock, flags);
}
@@ -689,23 +715,16 @@ void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info,
nbits = len >> fs_info->sectorsize_bits;
spin_lock_irqsave(&subpage->lock, flags);
/* Target range should not yet be locked. */
- ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+ if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) {
+ subpage_dump_bitmap(fs_info, folio, locked, start, len);
+ ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+ }
bitmap_set(subpage->bitmaps, start_bit, nbits);
ret = atomic_add_return(nbits, &subpage->nr_locked);
ASSERT(ret <= fs_info->sectors_per_page);
spin_unlock_irqrestore(&subpage->lock, flags);
}
-#define GET_SUBPAGE_BITMAP(subpage, fs_info, name, dst) \
-{ \
- const int sectors_per_page = fs_info->sectors_per_page; \
- \
- ASSERT(sectors_per_page < BITS_PER_LONG); \
- *dst = bitmap_read(subpage->bitmaps, \
- sectors_per_page * btrfs_bitmap_nr_##name, \
- sectors_per_page); \
-}
-
void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len)
{
--
2.47.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] btrfs: add extra error messages for extent_writepage() failure
2024-11-27 4:06 [PATCH 0/3] btrfs: extra debug output for sector size < page size cases Qu Wenruo
2024-11-27 4:06 ` [PATCH 1/3] btrfs: subpage: fix the bitmap dump for the locked flags Qu Wenruo
2024-11-27 4:06 ` [PATCH 2/3] btrfs: subpage: dump the involved bitmap when ASSERT() failed Qu Wenruo
@ 2024-11-27 4:06 ` Qu Wenruo
2024-11-27 15:30 ` [PATCH 0/3] btrfs: extra debug output for sector size < page size cases David Sterba
3 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2024-11-27 4:06 UTC (permalink / raw)
To: linux-btrfs
During my development on aarch64 with 64K page size and 4K sector size,
I'm hitting several problems related to error handling of
extent_writepage(), most of them are caused by
btrfs_run_delalloc_range() failure with -ENOSPC error.
This error itself is already a problem for our data/metadata space
reservation code, but we also need extra info like the submit_bitmap to
calculate if the error handling is doing its job correctly.
So add two btrfs_err_rl()s to indicate the errors that is affecting the
error handling of extent_writepage(), which has already saved me several
times during debugging.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 438974d4def4..e33f843c403c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1240,6 +1240,15 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
found_start,
found_start + found_len - 1,
wbc);
+ if (unlikely(ret < 0))
+ btrfs_err_rl(fs_info,
+"failed to run delalloc range, root %lld ino %llu folio %llu submit_bitmap %*pbl start %llu len %u: %d",
+ inode->root->root_key.objectid,
+ btrfs_ino(inode),
+ folio_pos(folio),
+ fs_info->sectors_per_page,
+ &bio_ctrl->submit_bitmap,
+ found_start, found_len, ret);
} else {
/*
* We've hit an error during previous delalloc range,
@@ -1506,6 +1515,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
PAGE_SIZE, bio_ctrl, i_size);
if (ret == 1)
return 0;
+ if (ret < 0)
+ btrfs_err_rl(fs_info,
+"failed to submit blocks, root %lld ino %llu folio %llu submit_bitmap %*pbl: %d",
+ BTRFS_I(inode)->root->root_key.objectid,
+ btrfs_ino(BTRFS_I(inode)),
+ folio_pos(folio), fs_info->sectors_per_page,
+ &bio_ctrl->submit_bitmap, ret);
bio_ctrl->wbc->nr_to_write--;
--
2.47.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] btrfs: extra debug output for sector size < page size cases
2024-11-27 4:06 [PATCH 0/3] btrfs: extra debug output for sector size < page size cases Qu Wenruo
` (2 preceding siblings ...)
2024-11-27 4:06 ` [PATCH 3/3] btrfs: add extra error messages for extent_writepage() failure Qu Wenruo
@ 2024-11-27 15:30 ` David Sterba
2024-11-27 20:30 ` Qu Wenruo
3 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2024-11-27 15:30 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Nov 27, 2024 at 02:36:35PM +1030, Qu Wenruo wrote:
> The first patch is the long existing bug that full subpage bitmap dump
> is not working for checked bitmap.
> Thankfully even myself is not affected by the bug.
>
> The second one is for a crash I hit where ASSERT() got triggered in
> btrfs_folio_set_locked() after a btrfs_run_delalloc_range() failure.
>
> The last one is for the btrfs_run_delalloc_range() failure, which is
> not that rare in my environment, I guess the unsafe cache mode for my
> aarch64 VM makes it too easy to hit ENOSPC.
>
> But ENOSPC from btrfs_run_delalloc_range() itself is already a problem
> for our data/metadata space reservation code, thus it should be
> outputted even for non-debug build.
>
> Qu Wenruo (3):
> btrfs: subpage: fix the bitmap dump for the locked flags
> btrfs: subpage: dump the involved bitmap when ASSERT() failed
> btrfs: add extra error messages for extent_writepage() failure
Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] btrfs: subpage: dump the involved bitmap when ASSERT() failed
2024-11-27 4:06 ` [PATCH 2/3] btrfs: subpage: dump the involved bitmap when ASSERT() failed Qu Wenruo
@ 2024-11-27 15:32 ` David Sterba
0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2024-11-27 15:32 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Nov 27, 2024 at 02:36:37PM +1030, Qu Wenruo wrote:
> +#define subpage_dump_bitmap(fs_info, folio, name, start, len) \
> +{ \
> + struct btrfs_subpage *subpage = folio_get_private(folio); \
> + unsigned long bitmap; \
> + \
> + GET_SUBPAGE_BITMAP(subpage, fs_info, name, &bitmap); \
> + btrfs_warn(fs_info, \
> + "dumpping bitmap start=%llu len=%u folio=%llu" #name "_bitmap=%*pbl", \
The stringified #name would be next to the folio=number, please add one " ".
> + start, len, folio_pos(folio), \
> + fs_info->sectors_per_page, &bitmap); \
> +}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] btrfs: extra debug output for sector size < page size cases
2024-11-27 15:30 ` [PATCH 0/3] btrfs: extra debug output for sector size < page size cases David Sterba
@ 2024-11-27 20:30 ` Qu Wenruo
2024-11-28 13:10 ` David Sterba
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-11-27 20:30 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs
在 2024/11/28 02:00, David Sterba 写道:
> On Wed, Nov 27, 2024 at 02:36:35PM +1030, Qu Wenruo wrote:
>> The first patch is the long existing bug that full subpage bitmap dump
>> is not working for checked bitmap.
>> Thankfully even myself is not affected by the bug.
>>
>> The second one is for a crash I hit where ASSERT() got triggered in
>> btrfs_folio_set_locked() after a btrfs_run_delalloc_range() failure.
>>
>> The last one is for the btrfs_run_delalloc_range() failure, which is
>> not that rare in my environment, I guess the unsafe cache mode for my
>> aarch64 VM makes it too easy to hit ENOSPC.
>>
>> But ENOSPC from btrfs_run_delalloc_range() itself is already a problem
>> for our data/metadata space reservation code, thus it should be
>> outputted even for non-debug build.
>>
>> Qu Wenruo (3):
>> btrfs: subpage: fix the bitmap dump for the locked flags
>> btrfs: subpage: dump the involved bitmap when ASSERT() failed
>> btrfs: add extra error messages for extent_writepage() failure
>
> Reviewed-by: David Sterba <dsterba@suse.com>
Thanks for the review.
Although I'd prefer this series to be merged after the double accounting
fix (which also affects non-subpage cases)
The problem is in the 3rd patch, which is touching the code just after
btrfs_run_delalloc_range() returned.
That area is also where the double accounting fix is touching.
To make backport a little easier, I'd prefer make the fix first, then
the extra debug patches.
Thanks,
Qu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] btrfs: extra debug output for sector size < page size cases
2024-11-27 20:30 ` Qu Wenruo
@ 2024-11-28 13:10 ` David Sterba
0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2024-11-28 13:10 UTC (permalink / raw)
To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs
On Thu, Nov 28, 2024 at 07:00:58AM +1030, Qu Wenruo wrote:
>
>
> 在 2024/11/28 02:00, David Sterba 写道:
> > On Wed, Nov 27, 2024 at 02:36:35PM +1030, Qu Wenruo wrote:
> >> The first patch is the long existing bug that full subpage bitmap dump
> >> is not working for checked bitmap.
> >> Thankfully even myself is not affected by the bug.
> >>
> >> The second one is for a crash I hit where ASSERT() got triggered in
> >> btrfs_folio_set_locked() after a btrfs_run_delalloc_range() failure.
> >>
> >> The last one is for the btrfs_run_delalloc_range() failure, which is
> >> not that rare in my environment, I guess the unsafe cache mode for my
> >> aarch64 VM makes it too easy to hit ENOSPC.
> >>
> >> But ENOSPC from btrfs_run_delalloc_range() itself is already a problem
> >> for our data/metadata space reservation code, thus it should be
> >> outputted even for non-debug build.
> >>
> >> Qu Wenruo (3):
> >> btrfs: subpage: fix the bitmap dump for the locked flags
> >> btrfs: subpage: dump the involved bitmap when ASSERT() failed
> >> btrfs: add extra error messages for extent_writepage() failure
> >
> > Reviewed-by: David Sterba <dsterba@suse.com>
>
> Thanks for the review.
>
> Although I'd prefer this series to be merged after the double accounting
> fix (which also affects non-subpage cases)
>
> The problem is in the 3rd patch, which is touching the code just after
> btrfs_run_delalloc_range() returned.
>
> That area is also where the double accounting fix is touching.
>
> To make backport a little easier, I'd prefer make the fix first, then
> the extra debug patches.
Right, no objections here. As you've been finding the problems any
debugging and reporting improvement is good.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-11-28 13:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-27 4:06 [PATCH 0/3] btrfs: extra debug output for sector size < page size cases Qu Wenruo
2024-11-27 4:06 ` [PATCH 1/3] btrfs: subpage: fix the bitmap dump for the locked flags Qu Wenruo
2024-11-27 4:06 ` [PATCH 2/3] btrfs: subpage: dump the involved bitmap when ASSERT() failed Qu Wenruo
2024-11-27 15:32 ` David Sterba
2024-11-27 4:06 ` [PATCH 3/3] btrfs: add extra error messages for extent_writepage() failure Qu Wenruo
2024-11-27 15:30 ` [PATCH 0/3] btrfs: extra debug output for sector size < page size cases David Sterba
2024-11-27 20:30 ` Qu Wenruo
2024-11-28 13:10 ` David Sterba
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.