* [PATCH] btrfs: remove an ASSERT() which can cause false alert
@ 2022-02-15 4:46 Qu Wenruo
2022-02-17 5:42 ` Qu Wenruo
0 siblings, 1 reply; 2+ messages in thread
From: Qu Wenruo @ 2022-02-15 4:46 UTC (permalink / raw)
To: linux-btrfs; +Cc: Su Yue
[BUG]
Su reported that with his VM running on an M1 mac, it has a high chance
to trigger the following ASSERT() with scrub and balance test cases:
ASSERT(cache->start == chunk_offset);
ret = scrub_chunk(sctx, cache, scrub_dev, found_key.offset,
dev_extent_len);
[CAUSE]
The ASSERT() is making sure that the block group cache (@cache) and the
chunk offset from the device extent match.
But the truth is, block group caches and dev extent items are deleted at
different timing.
For block group caches and items, after btrfs_remove_block_group() they
will be marked deleted, but the device extent and the chunk item is
still in dev tree and chunk tree respectively.
The device extents and chunk items are only deleted in
btrfs_remove_chunk(), which happens either a btrfs_delete_unused_bgs()
or btrfs_relocate_chunk().
This timing difference leaves a window where we can have a mismatch
between block group cache and device extent tree, and triggering the
ASSERT().
[FIX]
Just remove the ASSERT().
In fact there are several checks in scrub_chunk() to skip such device
extents.
In scrub_chunk() we do an chunk mapping search, and handling (!em) or
(em->start != bg->start) case by simply exit with ret = 0.
So the ASSERT() is really not necessary.
Please fold this fix into the patch "btrfs: scrub: cleanup the argument
list of scrub_chunk()".
Reported-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/scrub.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 11089568b287..9c6cf1149f08 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3822,7 +3822,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
dev_replace->item_needs_writeback = 1;
up_write(&dev_replace->rwsem);
- ASSERT(cache->start == chunk_offset);
ret = scrub_chunk(sctx, cache, scrub_dev, found_key.offset,
dev_extent_len);
--
2.35.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] btrfs: remove an ASSERT() which can cause false alert
2022-02-15 4:46 [PATCH] btrfs: remove an ASSERT() which can cause false alert Qu Wenruo
@ 2022-02-17 5:42 ` Qu Wenruo
0 siblings, 0 replies; 2+ messages in thread
From: Qu Wenruo @ 2022-02-17 5:42 UTC (permalink / raw)
To: linux-btrfs; +Cc: Su Yue
On 2022/2/15 12:46, Qu Wenruo wrote:
> [BUG]
> Su reported that with his VM running on an M1 mac, it has a high chance
> to trigger the following ASSERT() with scrub and balance test cases:
>
> ASSERT(cache->start == chunk_offset);
> ret = scrub_chunk(sctx, cache, scrub_dev, found_key.offset,
> dev_extent_len);
>
> [CAUSE]
> The ASSERT() is making sure that the block group cache (@cache) and the
> chunk offset from the device extent match.
>
> But the truth is, block group caches and dev extent items are deleted at
> different timing.
>
> For block group caches and items, after btrfs_remove_block_group() they
> will be marked deleted, but the device extent and the chunk item is
> still in dev tree and chunk tree respectively.
>
> The device extents and chunk items are only deleted in
> btrfs_remove_chunk(), which happens either a btrfs_delete_unused_bgs()
> or btrfs_relocate_chunk().
>
> This timing difference leaves a window where we can have a mismatch
> between block group cache and device extent tree, and triggering the
> ASSERT().
>
> [FIX]
> Just remove the ASSERT().
>
> In fact there are several checks in scrub_chunk() to skip such device
> extents.
> In scrub_chunk() we do an chunk mapping search, and handling (!em) or
> (em->start != bg->start) case by simply exit with ret = 0.
>
> So the ASSERT() is really not necessary.
>
> Please fold this fix into the patch "btrfs: scrub: cleanup the argument
> list of scrub_chunk()".
>
> Reported-by: Su Yue <l@damenly.su>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/scrub.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 11089568b287..9c6cf1149f08 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3822,7 +3822,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
> dev_replace->item_needs_writeback = 1;
> up_write(&dev_replace->rwsem);
>
> - ASSERT(cache->start == chunk_offset);
This check still has some sense, as we expect the bg cache to match with
the dev extent.
> ret = scrub_chunk(sctx, cache, scrub_dev, found_key.offset,
> dev_extent_len);
And since we have removed chunk_offset passed to scrub_chunk(), we have
an extra check before we call scrub_stripe() against found_key.offset.
So removing the ASSERT() is still OK.
But it would be better to replace the ASSERT() with a full check between
cache->start and chunk_offset.
I'd better find a way to reproduce the case, which I can not reproduce
at all under x86_64 vms.
Thanks,
Qu
>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2022-02-17 5:42 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-02-15 4:46 [PATCH] btrfs: remove an ASSERT() which can cause false alert Qu Wenruo
2022-02-17 5:42 ` Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox