* dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
@ 2024-12-14 2:17 Ben Millwood
2024-12-14 2:51 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: Ben Millwood @ 2024-12-14 2:17 UTC (permalink / raw)
To: linux-btrfs
Hi folks,
I encountered this error recently, and I can't find it anywhere on
Google except in the patches that first added the check, so I come to
you for guidance.
This is one of my removable USB drives, formatted btrfs and primarily
for the purpose of receiving snapshots from my laptop's root drive.
I'm running:
$ mount /dev/masterchef-vg/btrfs /mnt/masterchef/btrfs -o compress
mount: /mnt/masterchef/btrfs: mount(2) system call failed: Structure
needs cleaning.
dmesg(1) may have more information after failed mount system call.
Here's what dmesg says:
[13570.361767] BTRFS info (device dm-4): first mount of filesystem
a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
[13570.361779] BTRFS info (device dm-4): using crc32c (crc32c-intel)
checksum algorithm
[13570.361783] BTRFS info (device dm-4): use zlib compression, level 3
[13570.361785] BTRFS info (device dm-4): disk space caching is enabled
[13570.374442] BTRFS error (device dm-4): dev extent physical offset
1997265698816 on devid 1 doesn't have corresponding chunk
[13570.374448] BTRFS error (device dm-4): failed to verify dev extents
against chunks: -117
[13570.375329] BTRFS error (device dm-4): open_ctree failed
This issue emerged around the time I was trying to mount this
filesystem from my Raspberry Pi for the first time, but now occurs on
both my own laptop and my rpi.
Here's my laptop's details:
$ uname -a
Linux noether 6.6.63 #1-NixOS SMP PREEMPT_DYNAMIC Fri Nov 22 14:38:37
UTC 2024 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v6.11
-EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin
$ btrfs fi show
Label: 'noether-root' uuid: b7ad9a05-8f7b-44af-8952-a7f717e897e0
Total devices 1 FS bytes used 319.96GiB
devid 1 size 390.62GiB used 390.62GiB path /dev/mapper/noether-lv
Label: 'masterchef-btrfs' uuid: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
Total devices 1 FS bytes used 1.62TiB
devid 1 size 1.82TiB used 1.82TiB path /dev/mapper/masterchef--vg-btrfs
and the rpi:
$ uname -a
Linux vigilance 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian
1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
$ btrfs --version
btrfs-progs v6.2
(btrfs fi show is the same for masterchef-btrfs)
In terms of possible events that could have caused this:
1. I had some issues with the raspberry pi not being able to supply
enough power for 2 external disks, and for this and related reasons
it's possible the disk got disconnected without being unmounted
properly / the pi was uncleanly shut down a few times (though, I
expect I usually didn't actually write to the disk any of these
times...)
2. When I try to mount on the raspberry pi, I see this in dmesg:
[ 5658.798634] BTRFS info (device dm-2): first mount of filesystem
a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
[ 5658.798653] BTRFS info (device dm-2): using crc32c (crc32c-generic)
checksum algorithm
[ 5658.798663] BTRFS info (device dm-2): use zlib compression, level 3
[ 5658.798666] BTRFS info (device dm-2): disk space caching is enabled
[ 5658.798669] BTRFS warning (device dm-2): v1 space cache is not
supported for page size 16384 with sectorsize 4096
[ 5658.798706] BTRFS error (device dm-2): open_ctree failed
so I went and looked up what the "v1 space cache" was, and ran this:
$ btrfs check --clear-space-cache v1 <device>
and then read some more -- oh, nowadays it's a btrfs rescue command
instead, so I ctrl-C'd the above and ran:
$ btrfs rescue clear-space-cache v1 <device>
which appeared to complete successfully.
(I suppose despite seeing this message on the pi, I must have run
these commands on my laptop, since my pi's btrfs-progs doesn't have
the rescue clear-space-cache command.)
Anyway, maybe ctrl-C-ing the btrfs check --clear-space-cache was wrong?
It's noticeable that the dmesg output, at least on the raspberry pi,
still mentions the v1 space cache message when trying to mount, unless
I pass the nospace_cache mount option, in which case I get the "failed
to verify dev extents" message. (I think I get the latter message in
either case on my laptop with the newer kernel + btrfs-progs).
A natural thing to do at this stage would be to run btrfs check, but
the non-lowmem version is always OOM-killed (on either device) while
checking extents, and the lowmem version has so far not had time to
complete (and I'm not convinced it will in a reasonable duration). I
could try to borrow a machine with more RAM, though I have no idea
whether I need 20% more RAM or 20x more. (The pi is 8G, the laptop is
16G, the btrfs partition I'm checking is ~2T.)
While I'm waiting for the lowmem check to progress, are there any
other useful recovery / diagnosis steps I could try? smartctl appears
not to work with this disk, so I can't easily say whether the disk is
or is not healthy.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-14 2:17 dev extent physical offset [...] on devid 1 doesn't have corresponding chunk Ben Millwood
@ 2024-12-14 2:51 ` Qu Wenruo
2024-12-14 17:39 ` Ben Millwood
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-12-14 2:51 UTC (permalink / raw)
To: Ben Millwood, linux-btrfs
On 2024/12/14 12:47, Ben Millwood wrote:
> Hi folks,
>
> I encountered this error recently, and I can't find it anywhere on
> Google except in the patches that first added the check, so I come to
> you for guidance.
>
> This is one of my removable USB drives, formatted btrfs and primarily
> for the purpose of receiving snapshots from my laptop's root drive.
> I'm running:
>
> $ mount /dev/masterchef-vg/btrfs /mnt/masterchef/btrfs -o compress
> mount: /mnt/masterchef/btrfs: mount(2) system call failed: Structure
> needs cleaning.
> dmesg(1) may have more information after failed mount system call.
>
> Here's what dmesg says:
>
> [13570.361767] BTRFS info (device dm-4): first mount of filesystem
> a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> [13570.361779] BTRFS info (device dm-4): using crc32c (crc32c-intel)
> checksum algorithm
> [13570.361783] BTRFS info (device dm-4): use zlib compression, level 3
> [13570.361785] BTRFS info (device dm-4): disk space caching is enabled
> [13570.374442] BTRFS error (device dm-4): dev extent physical offset
> 1997265698816 on devid 1 doesn't have corresponding chunk
> [13570.374448] BTRFS error (device dm-4): failed to verify dev extents
> against chunks: -117
> [13570.375329] BTRFS error (device dm-4): open_ctree failed
The problem is exactly what it says: there is a dev extent but no chunk
item for it.
I'm wondering if there is also a chunk without its dev extent.
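(To picture the check that fails at mount: chunk items record, per logical chunk, the stripes of (devid, physical offset) backing it, while dev extent items record the same allocations from each device's point of view, and the kernel cross-references the two. A toy Python sketch of that cross-check; the structures and names here are invented and greatly simplified, not the real fs/btrfs/volumes.c logic:)

```python
# Toy model of btrfs's mount-time dev-extent verification (the real
# logic lives in btrfs_verify_dev_extents(); structures here are
# invented and greatly simplified).

def verify_dev_extents(chunks, dev_extents):
    """chunks: {chunk_logical: [(devid, physical, length), ...]} (stripes)
    dev_extents: [(devid, physical, length, chunk_logical), ...]
    Returns mismatch messages, in the spirit of the kernel's errors."""
    errors = []
    stripes = {s for stripe_list in chunks.values() for s in stripe_list}
    for devid, physical, length, chunk_logical in dev_extents:
        if chunk_logical not in chunks:
            errors.append(f"dev extent physical offset {physical} on devid "
                          f"{devid} doesn't have corresponding chunk")
        elif (devid, physical, length) not in stripes:
            errors.append(f"dev extent {physical} on devid {devid} "
                          f"mismatches its chunk stripe")
    return errors

chunks = {22020096: [(1, 1048576, 8388608)]}
dev_extents = [(1, 1048576, 8388608, 22020096),          # healthy
               (1, 1997265698816, 576716800, 99999999)]  # orphan
print(verify_dev_extents(chunks, dev_extents))
# prints one "doesn't have corresponding chunk" error for the orphan
```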
>
> This issue emerged around the time I was trying to mount this
> filesystem from my Raspberry Pi for the first time, but now occurs on
> both my own laptop and my rpi.
>
> Here's my laptop's details:
>
> $ uname -a
> Linux noether 6.6.63 #1-NixOS SMP PREEMPT_DYNAMIC Fri Nov 22 14:38:37
> UTC 2024 x86_64 GNU/Linux
>
> $ btrfs --version
> btrfs-progs v6.11
> -EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin
>
> $ btrfs fi show
> Label: 'noether-root' uuid: b7ad9a05-8f7b-44af-8952-a7f717e897e0
> Total devices 1 FS bytes used 319.96GiB
> devid 1 size 390.62GiB used 390.62GiB path /dev/mapper/noether-lv
>
> Label: 'masterchef-btrfs' uuid: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> Total devices 1 FS bytes used 1.62TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/mapper/masterchef--vg-btrfs
>
> and the rpi:
>
> $ uname -a
> Linux vigilance 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian
> 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
>
> $ btrfs --version
> btrfs-progs v6.2
>
> (btrfs fi show is the same for masterchef-btrfs)
>
> In terms of possible events that could have caused this:
> 1. I had some issues with the raspberry pi not being able to supply
> enough power for 2 external disks, and for this and related reasons
> it's possible the disk got disconnected without being unmounted
> properly / the pi was uncleanly shut down a few times (though, I
> expect I usually didn't actually write to the disk any of these
> times...)
> 2. When I try to mount on the raspberry pi, I see this in dmesg:
>
> [ 5658.798634] BTRFS info (device dm-2): first mount of filesystem
> a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> [ 5658.798653] BTRFS info (device dm-2): using crc32c (crc32c-generic)
> checksum algorithm
> [ 5658.798663] BTRFS info (device dm-2): use zlib compression, level 3
> [ 5658.798666] BTRFS info (device dm-2): disk space caching is enabled
> [ 5658.798669] BTRFS warning (device dm-2): v1 space cache is not
> supported for page size 16384 with sectorsize 4096
> [ 5658.798706] BTRFS error (device dm-2): open_ctree failed
>
> so I went and looked up what the "v1 space cache" was, and ran this:
>
> $ btrfs check --clear-space-cache v1 <device>
>
> and then read some more -- oh, nowadays it's a btrfs rescue command
> instead, so I ctrl-C'd the above and ran:
>
> $ btrfs rescue clear-space-cache v1 <device>
>
> which appeared to complete successfully.
>
> (I suppose despite seeing this message on the pi, I must have run
> these commands on my laptop, since my pi's btrfs-progs doesn't have
> the rescue clear-space-cache command.)
>
> Anyway, maybe ctrl-C-ing the btrfs check --clear-space-cache was wrong?
It should not be; if it is, then it's a bug in the code.
Both the kernel and btrfs-progs use metadata COW with transaction
protection, so even if something went wrong (power loss or Ctrl-C) we
should only see the previous transaction, and everything should be fine.
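(As a toy illustration of that guarantee, not actual btrfs code: under COW, a transaction mutates a private copy and only a final atomic "superblock pointer" swap publishes it, so an interruption simply discards the pending copy.)

```python
# Toy sketch of COW + transaction protection (invented names; not the
# actual btrfs implementation). Updates go to a private copy; only the
# final atomic pointer swap makes them visible to a later mount.

class CowStore:
    def __init__(self, items):
        self.committed = dict(items)      # what a later mount would see

    def run_transaction(self, updates, interrupted=False):
        pending = dict(self.committed)    # COW: never modify in place
        pending.update(updates)
        if interrupted:                   # power loss / Ctrl-C mid-run
            return                        # pending copy is simply dropped
        self.committed = pending          # atomic commit

store = CowStore({"chunk@22020096": "SYSTEM"})
store.run_transaction({"chunk@2T": "DATA"}, interrupted=True)
assert store.committed == {"chunk@22020096": "SYSTEM"}  # old state intact
store.run_transaction({"chunk@2T": "DATA"})
assert "chunk@2T" in store.committed                    # commit succeeded
```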
>
> It's noticeable that the dmesg output, at least on the raspberry pi,
> still mentions the v1 space cache message when trying to mount, unless
> I pass the nospace_cache mount option, in which case I get the "failed
> to verify dev extents" message. (I think I get the latter message in
> either case on my laptop with the newer kernel + btrfs-progs).
>
> A natural thing to do at this stage would be to run btrfs check, but
> the non-lowmem version is always OOM-killed (on either device) while
> checking extents, and the lowmem version has so far not had time to
> complete (and I'm not convinced it will in a reasonable duration). I
> could try to borrow a machine with more RAM, though I have no idea
> whether I need 20% more RAM or 20x more. (The pi is 8G, the laptop is
> 16G, the btrfs partition I'm checking is ~2T.)
Then I'd say 32G may be enough, but lowmem should always work.
>
> While I'm waiting for the lowmem check to progress, are there any
> other useful recovery / diagnosis steps I could try?
If you do not want to spend too long on btrfs check, please dump
the device tree and the chunk tree:
# btrfs ins dump-tree -t chunk <device>
# btrfs ins dump-tree -t dev <device>
That's all the info we need to cross-check the result.
Although `btrfs check --readonly --mode=lowmem` output would be best, as it
would save me a lot of time manually verifying the output or crafting
a script to do that.
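(For reference, a rough sketch of such a cross-check script over the two dumps. The regexes assume the usual dump-tree line shapes, e.g. `key (1 DEV_EXTENT 1048576)` followed by a `chunk_offset 22020096` field, and may need adjusting against real output from a given btrfs-progs version:)

```python
import re

def chunk_offsets(chunk_dump):
    # CHUNK_ITEM keys look like: key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096)
    return {int(off) for off in re.findall(r"CHUNK_ITEM (\d+)\)", chunk_dump)}

def dev_extents(dev_dump):
    # A DEV_EXTENT item: key (1 DEV_EXTENT 1048576), with the backref to
    # its chunk on a following line as: chunk_offset 22020096
    result, key = [], None
    for line in dev_dump.splitlines():
        m = re.search(r"key \((\d+) DEV_EXTENT (\d+)\)", line)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            continue
        m = re.search(r"chunk_offset (\d+)", line)
        if m and key:
            result.append((key[0], key[1], int(m.group(1))))
            key = None
    return result

def orphans(chunk_dump, dev_dump):
    # Dev extents whose chunk_offset has no CHUNK_ITEM in the chunk tree
    chunks = chunk_offsets(chunk_dump)
    return [(devid, phys) for devid, phys, logical in dev_extents(dev_dump)
            if logical not in chunks]

chunk_dump = "item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemsize 80"
dev_dump = """item 0 key (1 DEV_EXTENT 1048576) itemsize 48
    dev extent chunk_tree 3 chunk_objectid 256 chunk_offset 22020096
item 1 key (1 DEV_EXTENT 1997265698816) itemsize 48
    dev extent chunk_tree 3 chunk_objectid 256 chunk_offset 3000000000000"""
print(orphans(chunk_dump, dev_dump))  # → [(1, 1997265698816)]
```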
My current assumption is a bitflip at runtime, but no proof yet.
Thanks,
Qu
> smartctl appears
> not to work with this disk, so I can't easily say whether the disk is
> or is not healthy.
>
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-14 2:51 ` Qu Wenruo
@ 2024-12-14 17:39 ` Ben Millwood
2024-12-14 21:00 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: Ben Millwood @ 2024-12-14 17:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Sat, 14 Dec 2024 at 02:51, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Both the kernel and btrfs-progs use metadata COW with transaction
> protection, so even if something went wrong (power loss or Ctrl-C) we
> should only see the previous transaction, and everything should be fine.
Thanks for the reassurance, that is what I'd hoped would be true :)
> On 2024/12/14 12:47, Ben Millwood wrote:
> > While I'm waiting for the lowmem check to progress, are there any
> > other useful recovery / diagnosis steps I could try?
>
> If you do not want to spend too long on btrfs check, please dump
> the device tree and the chunk tree:
>
> # btrfs ins dump-tree -t chunk <device>
> # btrfs ins dump-tree -t dev <device>
>
> That's all the info we need to cross-check the result.
>
> Although `btrfs check --readonly --mode=lowmem` output would be best, as it
> would save me a lot of time manually verifying the output or crafting
> a script to do that.
Well, the check is still going:
root@vigilance:~# btrfs check --progress --mode lowmem /dev/masterchef-vg/btrfs
Opening filesystem to check...
Checking filesystem on /dev/masterchef-vg/btrfs
UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
[1/7] checking root items (0:46:43 elapsed,
68928137 items checked)
[2/7] checking extents (14:31:49 elapsed,
239591 items checked)
I'll let it continue. In the meantime I'll e-mail you the trees you
asked for off-thread: they don't obviously look like they contain
private information, but I'd like to minimise the exposure anyway.
(Feel free to send them to other btrfs devs.)
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-14 17:39 ` Ben Millwood
@ 2024-12-14 21:00 ` Qu Wenruo
2024-12-15 4:46 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-12-14 21:00 UTC (permalink / raw)
To: Ben Millwood; +Cc: linux-btrfs
On 2024/12/15 04:09, Ben Millwood wrote:
> On Sat, 14 Dec 2024 at 02:51, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> Both the kernel and btrfs-progs use metadata COW with transaction
>> protection, so even if something went wrong (power loss or Ctrl-C) we
>> should only see the previous transaction, and everything should be fine.
>
> Thanks for the reassurance, that is what I'd hoped would be true :)
>
>> On 2024/12/14 12:47, Ben Millwood wrote:
>>> While I'm waiting for the lowmem check to progress, are there any
>>> other useful recovery / diagnosis steps I could try?
>>
>> If you do not want to spend too long on btrfs check, please dump
>> the device tree and the chunk tree:
>>
>> # btrfs ins dump-tree -t chunk <device>
>> # btrfs ins dump-tree -t dev <device>
>>
>> That's all the info we need to cross-check the result.
>>
>> Although `btrfs check --readonly --mode=lowmem` output would be best, as it
>> would save me a lot of time manually verifying the output or crafting
>> a script to do that.
>
> Well, the check is still going:
>
> root@vigilance:~# btrfs check --progress --mode lowmem /dev/masterchef-vg/btrfs
> Opening filesystem to check...
> Checking filesystem on /dev/masterchef-vg/btrfs
> UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> [1/7] checking root items (0:46:43 elapsed,
> 68928137 items checked)
> [2/7] checking extents (14:31:49 elapsed,
> 239591 items checked)
>
> I'll let it continue. In the meantime I'll e-mail you the trees you
> asked for off-thread: they don't obviously look like they contain
> private information, but I'd like to minimise the exposure anyway.
> (Feel free to send them to other btrfs devs.)
Those trees are completely anonymous; the only information they
contain is:
- How large your fs is
- How many bytes are allocated, and their ranges
- The type of the allocated chunks
So they should be very safe to share, unless you have some very
confidential info hidden in the device size :)
[...]
>>
>> That's all the info we need to cross-check the result.
>>
>> Although `btrfs check --readonly --mode=lowmem` output would be best, as it
>> would save me a lot of time manually verifying the output or crafting
>> a script to do that.
>>
>> My current assumption is a bitflip at runtime, but no proof yet.
Unfortunately it doesn't look like a bitflip.
I scanned the last several dev extents and chunks, and it turns out it's
very possible `btrfs clear-space-cache` caused something wrong:
- The offending dev extent has chunk_tree_uuid set.
  This is not the kernel's behavior, but a progs-specific one.
  This means two chunks were allocated during
  `btrfs-progs clear-space-cache`, but one is now missing.
- One of the chunks allocated by btrfs-progs is totally fine,
  and it's still in the chunk tree.
- The other (the offending one) points to a chunk that's beyond
  the last known chunk.
So I guess either:
- btrfs-progs has a bug in its chunk creation code,
  so that a chunk and its dev extent are not created in the same
  transaction, and a Ctrl-C breaks it, leaving an orphan dev extent, or
- btrfs-progs has a bug in its chunk deletion code;
  similar, but in the empty-chunk cleanup code.
Either way I'll need to dig deeper to fix the bug.
Meanwhile I have created a branch for you to manually fix the bug:
https://github.com/adam900710/btrfs-progs/tree/orphan_dev_extent_cleanup
Since the lowmem check is still running, you can prepare an environment
to build btrfs-progs, so that once the check finishes you can use that
branch to delete the offending item with:
# ./btrfs-corrupt-block -X <device>
Thanks,
Qu
>>
>> Thanks,
>> Qu
>>
>>> smartctl appears
>>> not to work with this disk, so I can't easily say whether the disk is
>>> or is not healthy.
>>>
>>
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-14 21:00 ` Qu Wenruo
@ 2024-12-15 4:46 ` Qu Wenruo
2024-12-20 23:11 ` Ben Millwood
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-12-15 4:46 UTC (permalink / raw)
To: Ben Millwood; +Cc: linux-btrfs
On 2024/12/15 07:30, Qu Wenruo wrote:
>
>
> On 2024/12/15 04:09, Ben Millwood wrote:
>> On Sat, 14 Dec 2024 at 02:51, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> Both the kernel and btrfs-progs use metadata COW with transaction
>>> protection, so even if something went wrong (power loss or Ctrl-C) we
>>> should only see the previous transaction, and everything should be
>>> fine.
>>
>> Thanks for the reassurance, that is what I'd hoped would be true :)
>>
>>> On 2024/12/14 12:47, Ben Millwood wrote:
>>>> While I'm waiting for the lowmem check to progress, are there any
>>>> other useful recovery / diagnosis steps I could try?
>>>
>>> If you do not want to spend too long on btrfs check, please dump
>>> the device tree and the chunk tree:
>>>
>>> # btrfs ins dump-tree -t chunk <device>
>>> # btrfs ins dump-tree -t dev <device>
>>>
>>> That's all the info we need to cross-check the result.
>>>
>>> Although `btrfs check --readonly --mode=lowmem` output would be best, as it
>>> would save me a lot of time manually verifying the output or crafting
>>> a script to do that.
>>
>> Well, the check is still going:
>>
>> root@vigilance:~# btrfs check --progress --mode lowmem /dev/
>> masterchef-vg/btrfs
>> Opening filesystem to check...
>> Checking filesystem on /dev/masterchef-vg/btrfs
>> UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
>> [1/7] checking root items (0:46:43 elapsed,
>> 68928137 items checked)
>> [2/7] checking extents (14:31:49 elapsed,
>> 239591 items checked)
>>
>> I'll let it continue. In the meantime I'll e-mail you the trees you
>> asked for off-thread: they don't obviously look like they contain
>> private information, but I'd like to minimise the exposure anyway.
>> (Feel free to send them to other btrfs devs.)
>
> Those trees are completely anonymous; the only information they
> contain is:
>
> - How large your fs is
> - How many bytes are allocated, and their ranges
> - The type of the allocated chunks
>
> So they should be very safe to share, unless you have some very
> confidential info hidden in the device size :)
>
> [...]
>>>
>>> That's all the info we need to cross-check the result.
>>>
>>> Although `btrfs check --readonly --mode=lowmem` output would be best, as it
>>> would save me a lot of time manually verifying the output or crafting
>>> a script to do that.
>>>
>>> My current assumption is a bitflip at runtime, but no proof yet.
>
> Unfortunately it doesn't look like a bitflip.
>
> I scanned the last several dev extents and chunks, and it turns out it's
> very possible `btrfs clear-space-cache` caused something wrong:
>
> - The offending dev extent has chunk_tree_uuid set.
>   This is not the kernel's behavior, but a progs-specific one.
>   This means two chunks were allocated during
>   `btrfs-progs clear-space-cache`, but one is now missing.
>
> - One of the chunks allocated by btrfs-progs is totally fine,
>   and it's still in the chunk tree.
>
> - The other (the offending one) points to a chunk that's beyond
>   the last known chunk.
>
> So I guess either:
>
> - btrfs-progs has a bug in its chunk creation code,
>   so that a chunk and its dev extent are not created in the same
>   transaction, and a Ctrl-C breaks it, leaving an orphan dev extent, or
>
> - btrfs-progs has a bug in its chunk deletion code;
>   similar, but in the empty-chunk cleanup code.
>
> Either way I'll need to dig deeper to fix the bug.
Unfortunately I have failed to find out why the chunk was removed but
its dev extent left behind.
Both the kernel and progs have, from the very beginning, done chunk
removal properly inside one transaction, removing both the chunk item
and the dev extent item.
The same goes for the chunk allocation path.
So unless something is totally wrong, I don't see how progs or the
kernel could cause such a mismatch.
Do you still remember whether there was any error message when the
clear-space-cache run was interrupted, or on the next RW mount after it?
Thanks,
Qu
>
> Meanwhile I have created a branch for you to manually fix the bug:
> https://github.com/adam900710/btrfs-progs/tree/orphan_dev_extent_cleanup
>
> Since the lowmem check is still running, you can prepare an environment
> to build btrfs-progs, so that once the check finishes you can use that
> branch to delete the offending item with:
>
> # ./btrfs-corrupt-block -X <device>
>
> Thanks,
> Qu
>
>>>
>>> Thanks,
>>> Qu
>>>
>>>> smartctl appears
>>>> not to work with this disk, so I can't easily say whether the disk is
>>>> or is not healthy.
>>>>
>>>
>
>
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-15 4:46 ` Qu Wenruo
@ 2024-12-20 23:11 ` Ben Millwood
2024-12-20 23:51 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: Ben Millwood @ 2024-12-20 23:11 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
Sorry for the delayed reply. I left this for a few days to see how the
check would get along.
I think probably the terminal I was doing the check in was resized a
bit, so the output got a little shuffled around, but it now looks like
this:
root@vigilance:~# btrfs check --progress --mode lowmem
/dev/masterchef-vg/btrfs
Opening filesystem to check...
Checking filesystem on /dev/masterchef-vg/btrfs
UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
[1/7] checking root items (0:46:43 elapsed,
68928137 items checked)
[2/7] checking extents (14:49:23 elapsed,
2419[2/7] checking extents (14:49:24 elapsed,
2419[
2/7] checking extents (14:49:25 elapsed,
2419[2/7] checking extents (14:49:26 elapsed,
2419[2
ERROR: device extent[1, 1997265698816, 576716800] did not find the
related chunkhecked)
[2/7] checking extents (164:06:57 elapsed,
34215503 items checked)
so it looks like the check has noticed the same problem that the mount
has, at least.
I don't actually understand all this terminology -- is the "items
checked" number for checking extents counting towards the same total
as the "root items" number? Or is there any other way of estimating
how far it needs to count? (Obviously using that to estimate time
remaining would be highly approximate, but hopefully I could still
find out if it's measured in weeks or years).
On Sun, 15 Dec 2024 at 04:46, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Do you still remember whether there was any error message when the
> clear-space-cache run was interrupted, or on the next RW mount after it?
I can't say confidently at this stage, but I think there was no error
at clear-space-cache interruption time. I think it's highly possible I
could have missed an error at my next RW mount attempt; I was probably
trying a lot of mounts and often only paying attention to the part of
the error I thought I could debug. (There has been no subsequent
*successful* mount of this disk, RW or otherwise...)
> Thanks,
> Qu
> >
> > Meanwhile I have created a branch for you to manually fix the bug:
> > https://github.com/adam900710/btrfs-progs/tree/orphan_dev_extent_cleanup
> >
> > Since the lowmem check is still running, you can prepare an environment
> > to build btrfs-progs, so that once the check finishes you can use that
> > branch to delete the offending item with:
> >
> > # ./btrfs-corrupt-block -X <device>
(I have been able to build this but haven't run it yet, since I'm
still waiting to see if the check says anything interesting)
> > Thanks,
> > Qu
> >
> >>>
> >>> Thanks,
> >>> Qu
> >>>
> >>>> smartctl appears
> >>>> not to work with this disk, so I can't easily say whether the disk is
> >>>> or is not healthy.
> >>>>
> >>>
> >
> >
>
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-20 23:11 ` Ben Millwood
@ 2024-12-20 23:51 ` Qu Wenruo
2025-01-02 17:58 ` Ben Millwood
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2024-12-20 23:51 UTC (permalink / raw)
To: Ben Millwood; +Cc: linux-btrfs
On 2024/12/21 09:41, Ben Millwood wrote:
> Sorry for the delayed reply. I left this for a few days to see how the
> check would get along.
>
> I think probably the terminal I was doing the check in was resized a
> bit, so the output got a little shuffled around, but it now looks like
> this:
>
> root@vigilance:~# btrfs check --progress --mode lowmem
> /dev/masterchef-vg/btrfs
> Opening filesystem to check...
> Checking filesystem on /dev/masterchef-vg/btrfs
> UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> [1/7] checking root items (0:46:43 elapsed,
> 68928137 items checked)
> [2/7] checking extents (14:49:23 elapsed,
> 2419[2/7] checking extents (14:49:24 elapsed,
> 2419[
> 2/7] checking extents (14:49:25 elapsed,
> 2419[2/7] checking extents (14:49:26 elapsed,
> 2419[2
> ERROR: device extent[1, 1997265698816, 576716800] did not find the
> related chunkhecked)
> [2/7] checking extents (164:06:57 elapsed,
> 34215503 items checked)
>
> so it looks like the check has noticed the same problem that the mount
> has, at least.
>
> I don't actually understand all this terminology -- is the "items
> checked" number for checking extents counting towards the same total
> as the "root items" number? Or is there any other way of estimating
> how far it needs to count? (Obviously using that to estimate time
> remaining would be highly approximate, but hopefully I could still
> find out if it's measured in weeks or years).
>
> On Sun, 15 Dec 2024 at 04:46, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> Do you still remember if there is any error message for the
>> clear-space-cache interruption and the next RW mount of it?
>
> I can't say confidently at this stage, but I think there was no error
> at clear-space-cache interruption time. I think it's highly possible I
> could have missed an error at my next RW mount attempt, I was probably
> trying a lot of mounts and often only paying attention to the part of
> the error I thought I could debug. (there has been no next
> *successful* mount of this disk, RW or otherwise...)
>
>> Thanks,
>> Qu
>>>
>>> Meanwhile I have created a branch for you to manually fix the bug:
>>> https://github.com/adam900710/btrfs-progs/tree/orphan_dev_extent_cleanup
>>>
>>> Since the lowmem is still running, you can prepare an environment to
>>> build btrfs-progs, so after the lowmem check finished, you can use that
>>> branch to delete the offending item by:
>>>
>>> # ./btrfs-corrupt-block -X <device>
>
> (I have been able to build this but haven't run it yet, since I'm
> still waiting to see if the check says anything interesting)
Please go ahead. Weirdly, this error really is a single orphan dev
extent, with no other problems.
Thus that command should fix it.
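For reference, the orphan item the check reported — `device extent[1, 1997265698816, 576716800]` — is a (devid, physical start, length) triple. A quick sketch of the arithmetic (illustration only, not a btrfs-progs tool):

```python
# Decoding the orphan item btrfs check reported:
#   device extent[1, 1997265698816, 576716800]
# i.e. (devid, physical start in bytes, length in bytes).
devid, start, length = 1, 1997265698816, 576716800

MiB, TiB = 1024 ** 2, 1024 ** 4
print(f"devid {devid}: starts {start / TiB:.2f} TiB into the device")
print(f"length {length // MiB} MiB, ends at byte {start + length}")
```

So the stray extent is exactly 550 MiB long and sits about 1.82 TiB into devid 1, matching the physical offset in the mount error.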
Thanks,
Qu
>
>
>>> Thanks,
>>> Qu
>>>
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>> smartctl appears
>>>>>> not to work with this disk, so I can't easily say whether the disk is
>>>>>> or is not healthy.
>>>>>>
>>>>>
>>>
>>>
>>
* Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
2024-12-20 23:51 ` Qu Wenruo
@ 2025-01-02 17:58 ` Ben Millwood
0 siblings, 0 replies; 8+ messages in thread
From: Ben Millwood @ 2025-01-02 17:58 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
About three weeks and 97 million items into checking extents, the rpi
was accidentally shut down by something else I was doing, so I might
as well just try the fix now.
I tried to run your binary. It said:
ben@vigilance:~/btrfs-progs $ sudo ./btrfs-corrupt-block -X
/dev/masterchef-vg/btrfs
ERROR: failed to find the offending dev extent item: No such file or directory
WARNING: reserved space leaked, flag=0x4 bytes_reserved=32768
extent buffer leak: start 217866993664 len 16384
extent buffer leak: start 217441599488 len 16384
WARNING: dirty eb leak (aborted trans): start 217441599488 len 16384
extent buffer leak: start 1789991600128 len 16384
WARNING: dirty eb leak (aborted trans): start 1789991600128 len 16384
(unsurprisingly, no change in the fs mount error message)
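As a side note on the original failure mode: the kernel's -117 return maps, on Linux, to EUCLEAN — the errno behind the "Structure needs cleaning" message that mount printed. A one-liner to confirm (assumes a Linux libc):

```python
import os

# The failed mount earlier in this thread returned error -117.
# On Linux, errno 117 is EUCLEAN, and its strerror text is exactly
# the message the mount command printed.
print(os.strerror(117))  # on Linux: Structure needs cleaning
```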
On Fri, 20 Dec 2024 at 23:51, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2024/12/21 09:41, Ben Millwood wrote:
> > Sorry for the delayed reply. I left this for a few days to see how the
> > check would get along.
> >
> > I think probably the terminal I was doing the check in was resized a
> > bit, so the output got a little shuffled around, but it now looks like
> > this:
> >
> > root@vigilance:~# btrfs check --progress --mode lowmem
> > /dev/masterchef-vg/btrfs
> > Opening filesystem to check...
> > Checking filesystem on /dev/masterchef-vg/btrfs
> > UUID: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> > [1/7] checking root items (0:46:43 elapsed,
> > 68928137 items checked)
> > [2/7] checking extents (14:49:23 elapsed,
> > 2419[2/7] checking extents (14:49:24 elapsed,
> > 2419[
> > 2/7] checking extents (14:49:25 elapsed,
> > 2419[2/7] checking extents (14:49:26 elapsed,
> > 2419[2
> > ERROR: device extent[1, 1997265698816, 576716800] did not find the
> > related chunkhecked)
> > [2/7] checking extents (164:06:57 elapsed,
> > 34215503 items checked)
> >
> > so it looks like the check has noticed the same problem that the mount
> > has, at least.
> >
> > I don't actually understand all this terminology -- is the "items
> > checked" number for checking extents counting towards the same total
> > as the "root items" number? Or is there any other way of estimating
> > how far it needs to count? (Obviously using that to estimate time
> > remaining would be highly approximate, but hopefully I could still
> > find out if it's measured in weeks or years).
> >
> > On Sun, 15 Dec 2024 at 04:46, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> Do you still remember if there is any error message for the
> >> clear-space-cache interruption and the next RW mount of it?
> >
> > I can't say confidently at this stage, but I think there was no error
> > at clear-space-cache interruption time. I think it's highly possible I
> > could have missed an error at my next RW mount attempt, I was probably
> > trying a lot of mounts and often only paying attention to the part of
> > the error I thought I could debug. (there has been no next
> > *successful* mount of this disk, RW or otherwise...)
> >
> >> Thanks,
> >> Qu
> >>>
> >>> Meanwhile I have created a branch for you to manually fix the bug:
> >>> https://github.com/adam900710/btrfs-progs/tree/orphan_dev_extent_cleanup
> >>>
> >>> Since the lowmem is still running, you can prepare an environment to
> >>> build btrfs-progs, so after the lowmem check finished, you can use that
> >>> branch to delete the offending item by:
> >>>
> >>> # ./btrfs-corrupt-block -X <device>
> >
> > (I have been able to build this but haven't run it yet, since I'm
> > still waiting to see if the check says anything interesting)
>
> Please go ahead. Weirdly this error is really a single orphan dev
> extent, without any extra other problem.
>
> Thus that command should fix it.
>
> Thanks,
> Qu
> >
> >
> >>> Thanks,
> >>> Qu
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Qu
> >>>>>
> >>>>>> smartctl appears
> >>>>>> not to work with this disk, so I can't easily say whether the disk is
> >>>>>> or is not healthy.
> >>>>>>
> >>>>>
> >>>
> >>>
> >>
>
end of thread
Thread overview: 8+ messages
2024-12-14 2:17 dev extent physical offset [...] on devid 1 doesn't have corresponding chunk Ben Millwood
2024-12-14 2:51 ` Qu Wenruo
2024-12-14 17:39 ` Ben Millwood
2024-12-14 21:00 ` Qu Wenruo
2024-12-15 4:46 ` Qu Wenruo
2024-12-20 23:11 ` Ben Millwood
2024-12-20 23:51 ` Qu Wenruo
2025-01-02 17:58 ` Ben Millwood