public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Ben Millwood <thebenmachine@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
Date: Sun, 15 Dec 2024 15:16:05 +1030
Message-ID: <2809427a-41f5-4b59-9d03-2c2012e16f76@gmx.com>
In-Reply-To: <d5372478-70f4-4a3c-bf9d-26366f955e5e@gmx.com>



On 2024/12/15 07:30, Qu Wenruo wrote:
>
>
> On 2024/12/15 04:09, Ben Millwood wrote:
>> On Sat, 14 Dec 2024 at 02:51, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> Both the kernel and btrfs-progs use metadata COW with transaction
>>> protection, so even if something went wrong (power loss or Ctrl-C) we
>>> should only see the previous transaction, and thus everything should
>>> be fine.
>>
>> Thanks for the reassurance, that is what I'd hoped would be true :)
>>
>>> On 2024/12/14 12:47, Ben Millwood wrote:
>>>> While I'm waiting for the lowmem check to progress, are there any
>>>> other useful recovery / diagnosis steps I could try?
>>>
>>> If you do not want to spend too much time on btrfs check, please dump
>>> the device tree and chunk tree:
>>>
>>> # btrfs ins dump-tree -t chunk <device>
>>> # btrfs ins dump-tree -t dev <device>
>>>
>>> That's all the info we need to cross-check the result.
>>>
>>> Although `btrfs check --readonly --mode=lowmem` would be the best, as it
>>> will save me a lot of time to either manually verify the output or craft
>>> a script to do that.
>>
>> Well, the check is still going:
>>
>> root@vigilance:~# btrfs check --progress --mode lowmem /dev/masterchef-vg/btrfs
>> Opening filesystem to check...
>> Checking filesystem on /dev/masterchef-vg/btrfs
>> UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
>> [1/7] checking root items                      (0:46:43 elapsed, 68928137 items checked)
>> [2/7] checking extents                         (14:31:49 elapsed, 239591 items checked)
>>
>> I'll let it continue. In the meantime I'll e-mail you the trees you
>> asked for off-thread: they don't obviously look like they contain
>> private information, but I'd like to minimise the exposure anyway.
>> (Feel free to send them to other btrfs devs.)
>
> Those trees are completely anonymous; the only information they contain
> is:
>
> - How large your fs is
> - How many bytes and their ranges are allocated
> - The type of the allocated chunks
>
> So it should be very safe to share, unless you have some very
> confidential info hidden in the device size :)
>
> [...]
>>> My current assumption is a bitflip at runtime, but no proof yet.
>
> Unfortunately it doesn't look like that.
>
> I scanned the last several dev-extents and chunks, and it turns out to
> be quite possible that `btrfs clear-space-cache` caused the problem:
>
> - The offending dev-extent has chunk_tree_uuid set
>    This is not kernel behavior, but a progs-specific one.
>    This means there are two chunks allocated during
>    `btrfs-progs clear-space-cache`, but one is missing.
>
> - One of the chunks allocated by btrfs-progs is totally fine
>    And it's still in the chunk tree
>
> - The other (the offending one) points to a chunk that's beyond
>    the last known chunk
>
> So I guess either:
>
> - btrfs-progs has a bug in the chunk creation code
>    So that a chunk and its dev-extent are not created in the same
>    transaction, and Ctrl-C breaks it, causing an orphan dev-extent
>
> - btrfs-progs has a bug in the chunk deletion code
>    Similar, but in the empty chunk cleanup code.
>
> Anyway I'll need to dig deeper to fix the bug.

Unfortunately I failed to find out why the chunk was removed but its dev
extent was left behind.

Both the kernel and progs have, from the very beginning, done chunk
removal properly: the chunk item and the dev-extent item are both
removed inside one transaction.

The same goes for chunk allocation.

So unless something is totally wrong, I don't see how progs or the
kernel could cause such a mismatch.
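
For reference, the cross-check between the two dumped trees (chunk tree
vs. device tree) can be sketched roughly as below. This is only a
minimal sketch, not what btrfs check does internally: it assumes the
text output of `btrfs ins dump-tree` has item lines and `chunk_offset`
lines laid out like the sample strings, and all function and variable
names here are hypothetical. The sample values are made up for
illustration and are not taken from the reported filesystem.

```python
import re

# Assumed layout of dump-tree text output (see the samples below).
ITEM_KEY = re.compile(r"item \d+ key \((\S+) (\S+) (\d+)\)")
CHUNK_REF = re.compile(r"chunk_offset (\d+) length (\d+)")


def chunk_offsets(chunk_dump):
    """Return the logical offsets of every CHUNK_ITEM in a chunk tree dump."""
    return {
        int(m.group(3))
        for m in ITEM_KEY.finditer(chunk_dump)
        if m.group(2) == "CHUNK_ITEM"
    }


def dev_extents(dev_dump):
    """Yield (devid, physical_offset, chunk_offset, length) per DEV_EXTENT."""
    current = None  # (devid, physical offset) of the item being parsed
    for line in dev_dump.splitlines():
        key = ITEM_KEY.search(line)
        if key:
            if key.group(2) == "DEV_EXTENT":
                current = (int(key.group(1)), int(key.group(3)))
            else:
                current = None
            continue
        ref = CHUNK_REF.search(line)
        if ref and current is not None:
            yield current + (int(ref.group(1)), int(ref.group(2)))
            current = None


def orphan_dev_extents(chunk_dump, dev_dump):
    """Return dev extents whose chunk_offset has no matching CHUNK_ITEM."""
    chunks = chunk_offsets(chunk_dump)
    return [ext for ext in dev_extents(dev_dump) if ext[2] not in chunks]


# Made-up sample dumps in the assumed format.
SAMPLE_CHUNK_DUMP = """
    item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80
        length 8388608 owner 2 stripe_len 65536 type DATA
    item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 16025 itemsize 80
        length 8388608 owner 2 stripe_len 65536 type METADATA
"""

SAMPLE_DEV_DUMP = """
    item 0 key (DEV_STATS PERSISTENT_ITEM 1) itemoff 16243 itemsize 40
    item 1 key (1 DEV_EXTENT 13631488) itemoff 16195 itemsize 48
        dev extent chunk_tree 3
        chunk_objectid 256 chunk_offset 13631488 length 8388608
    item 2 key (1 DEV_EXTENT 22020096) itemoff 16147 itemsize 48
        dev extent chunk_tree 3
        chunk_objectid 256 chunk_offset 22020096 length 8388608
    item 3 key (1 DEV_EXTENT 30408704) itemoff 16099 itemsize 48
        dev extent chunk_tree 3
        chunk_objectid 256 chunk_offset 30408704 length 8388608
"""

ORPHANS = orphan_dev_extents(SAMPLE_CHUNK_DUMP, SAMPLE_DEV_DUMP)
```

With the two real dumps saved to files, the same function could be
called on their contents; any tuple it returns is a dev extent pointing
at a chunk that is not in the chunk tree.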

Do you remember whether there was any error message when
clear-space-cache was interrupted, or at the next RW mount of the
filesystem?

Thanks,
Qu
>
> Meanwhile I have created a branch for you to manually fix the bug:
> https://github.com/adam900710/btrfs-progs/tree/orphan_dev_extent_cleanup
>
> Since the lowmem check is still running, you can prepare an environment
> to build btrfs-progs, so that after the check finishes you can use that
> branch to delete the offending item with:
>
> # ./btrfs-corrupt-block -X <device>
>
> Thanks,
> Qu
>
>>>
>>> Thanks,
>>> Qu
>>>
>>>> smartctl appears
>>>> not to work with this disk, so I can't easily say whether the disk is
>>>> or is not healthy.
>>>>
>>>
>
>

