From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Ben Millwood <thebenmachine@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: dev extent physical offset [...] on devid 1 doesn't have corresponding chunk
Date: Sat, 14 Dec 2024 13:21:56 +1030
Message-ID: <56d3885e-5651-4fd4-af6d-89897f8bd240@gmx.com>
In-Reply-To: <CAJhrHS2b5fv7wmchdqkCy-jEWZ7hD_3YUgCO_oUCNaf9ossq6w@mail.gmail.com>



On 2024/12/14 12:47, Ben Millwood wrote:
> Hi folks,
>
> I encountered this error recently, and I can't find it anywhere on
> Google except in the patches that first added the check, so I come to
> you for guidance.
>
> This is one of my removable USB drives, formatted btrfs and primarily
> for the purpose of receiving snapshots from my laptop's root drive.
> I'm running:
>
> $ mount /dev/masterchef-vg/btrfs /mnt/masterchef/btrfs -o compress
> mount: /mnt/masterchef/btrfs: mount(2) system call failed: Structure
> needs cleaning.
>         dmesg(1) may have more information after failed mount system call.
>
> Here's what dmesg says:
>
> [13570.361767] BTRFS info (device dm-4): first mount of filesystem
> a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> [13570.361779] BTRFS info (device dm-4): using crc32c (crc32c-intel)
> checksum algorithm
> [13570.361783] BTRFS info (device dm-4): use zlib compression, level 3
> [13570.361785] BTRFS info (device dm-4): disk space caching is enabled
> [13570.374442] BTRFS error (device dm-4): dev extent physical offset
> 1997265698816 on devid 1 doesn't have corresponding chunk
> [13570.374448] BTRFS error (device dm-4): failed to verify dev extents
> against chunks: -117
> [13570.375329] BTRFS error (device dm-4): open_ctree failed

The problem is exactly what the message says: there is a dev extent but
no chunk item for it.

I'm wondering whether there is a chunk without its dev extent.

>
> This issue emerged around the time I was trying to mount this
> filesystem from my Raspberry Pi for the first time, but now occurs on
> both my own laptop and my rpi.
>
> Here's my laptop's details:
>
> $ uname -a
> Linux noether 6.6.63 #1-NixOS SMP PREEMPT_DYNAMIC Fri Nov 22 14:38:37
> UTC 2024 x86_64 GNU/Linux
>
> $ btrfs --version
> btrfs-progs v6.11
> -EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin
>
> $ btrfs fi show
> Label: 'noether-root'  uuid: b7ad9a05-8f7b-44af-8952-a7f717e897e0
>      Total devices 1 FS bytes used 319.96GiB
>      devid    1 size 390.62GiB used 390.62GiB path /dev/mapper/noether-lv
>
> Label: 'masterchef-btrfs'  uuid: a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
>      Total devices 1 FS bytes used 1.62TiB
>      devid    1 size 1.82TiB used 1.82TiB path /dev/mapper/masterchef--vg-btrfs
>
> and the rpi:
>
> $ uname -a
> Linux vigilance 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian
> 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
>
> $ btrfs --version
> btrfs-progs v6.2
>
> (btrfs fi show is the same for masterchef-btrfs)
>
> In terms of possible events that could have caused this:
> 1. I had some issues with the raspberry pi not being able to supply
> enough power for 2 external disks, and for this and related reasons
> it's possible the disk got disconnected without being unmounted
> properly / the pi was uncleanly shut down a few times (though, I
> expect I usually didn't actually write to the disk any of these
> times...)
> 2. When I try to mount on the raspberry pi, I see this in dmesg:
>
> [ 5658.798634] BTRFS info (device dm-2): first mount of filesystem
> a0ed3709-1490-4f2d-96b5-bb1fb22f0b45
> [ 5658.798653] BTRFS info (device dm-2): using crc32c (crc32c-generic)
> checksum algorithm
> [ 5658.798663] BTRFS info (device dm-2): use zlib compression, level 3
> [ 5658.798666] BTRFS info (device dm-2): disk space caching is enabled
> [ 5658.798669] BTRFS warning (device dm-2): v1 space cache is not
> supported for page size 16384 with sectorsize 4096
> [ 5658.798706] BTRFS error (device dm-2): open_ctree failed
>
> so I went and looked up what the "v1 space cache" was, and ran this:
>
> $ btrfs check --clear-space-cache v1 <device>
>
> and then read some more -- oh, nowadays it's a btrfs rescue command
> instead, so I ctrl-C'd the above and ran:
>
> $ btrfs rescue clear-space-cache v1 <device>
>
> which appeared to complete successfully.
>
> (I suppose despite seeing this message on the pi, I must have run
> these commands on my laptop, since my pi's btrfs-progs doesn't have
> the rescue clear-space-cache command.)
>
> Anyway, maybe ctrl-C-ing the btrfs check --clear-space-cache was wrong?

It should not be; if it is, then that's a bug in the code.

Both the kernel and btrfs-progs use metadata COW with transaction
protection, so even if something went wrong (power loss or Ctrl-C) we
should only see the previous transaction, and everything should be fine.
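
As a rough illustration of that guarantee, here is a toy Python model of
copy-on-write with transactional commit. It is only a sketch: the class
and method names are hypothetical and do not correspond to anything in
the kernel or btrfs-progs. The point it shows is that changes go to a
private copy, and the committed state only changes at commit time.

```python
class ToyCowStore:
    """Toy model: readers only ever see the last committed state."""

    def __init__(self):
        self._committed = {}   # last committed state; never modified in place
        self._pending = None   # private copy owned by the running transaction

    def begin(self):
        # Copy-on-write: changes go to a copy, not to the committed state.
        self._pending = dict(self._committed)

    def write(self, key, value):
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        self._pending[key] = value

    def commit(self):
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        # The switch to the new state is a single step.
        self._committed, self._pending = self._pending, None

    def crash(self):
        # Power loss or Ctrl-C: the uncommitted copy is simply lost.
        self._pending = None

    def read(self, key):
        return self._committed.get(key)
```

In this model, a `crash()` between `begin()` and `commit()` leaves
`read()` returning whatever the previous commit stored.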

>
> It's noticeable that the dmesg output, at least on the raspberry pi,
> still mentions the v1 space cache message when trying to mount, unless
> I pass the nospace_cache mount option, in which case I get the "failed
> to verify dev extents" message. (I think I get the latter message in
> either case on my laptop with the newer kernel + btrfs-progs).
>
> A natural thing to do at this stage would be to run btrfs check, but
> the non-lowmem version is always OOM-killed (on either device) while
> checking extents, and the lowmem version has so far not had time to
> complete (and I'm not convinced it will in a reasonable duration). I
> could try to borrow a machine with more RAM, though I have no idea
> whether I need 20% more RAM or 20x more. (The pi is 8G, the laptop is
> 16G, the btrfs partition I'm checking is ~2T.)

Then I'd say 32G may be enough, but lowmem should always work.

>
> While I'm waiting for the lowmem check to progress, are there any
> other useful recovery / diagnosis steps I could try?

If you do not want to spend too much time on btrfs check, please dump
the device tree and chunk tree:

# btrfs ins dump-tree -t chunk <device>
# btrfs ins dump-tree -t dev <device>

That's all the info we need to cross-check the result.

That said, `btrfs check --readonly --mode=lowmem` would be best, as it
would save me a lot of time otherwise spent either manually verifying
the output or crafting a script to do that.
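
For reference, such a cross-check script could look roughly like the
Python sketch below. It assumes the two dumps were saved as text and
that the relevant lines look like `key (FIRST_CHUNK_TREE CHUNK_ITEM
<logical>)`, `stripe <n> devid <id> offset <physical>`, `key (<devid>
DEV_EXTENT <physical>)` and `chunk_offset <logical>`; the exact output
format may differ between btrfs-progs versions, and the function names
are made up for this example.

```python
import re

# Assumed line shapes in the dump-tree output (may vary by btrfs-progs version).
CHUNK_KEY = re.compile(r"key \(FIRST_CHUNK_TREE CHUNK_ITEM (\d+)\)")
STRIPE = re.compile(r"stripe \d+ devid (\d+) offset (\d+)")
DEV_EXTENT_KEY = re.compile(r"key \((\d+) DEV_EXTENT (\d+)\)")
CHUNK_OFFSET = re.compile(r"chunk_offset (\d+)")
ANY_KEY = re.compile(r"\bkey \(")


def parse_chunk_stripes(text):
    """Map (devid, physical offset) -> chunk logical offset, from a chunk tree dump."""
    stripes = {}
    current_chunk = None
    for line in text.splitlines():
        m = CHUNK_KEY.search(line)
        if m:
            current_chunk = int(m.group(1))
            continue
        if ANY_KEY.search(line):
            current_chunk = None  # some other item type, e.g. DEV_ITEM
            continue
        m = STRIPE.search(line)
        if m and current_chunk is not None:
            stripes[(int(m.group(1)), int(m.group(2)))] = current_chunk
    return stripes


def parse_dev_extents(text):
    """Map (devid, physical offset) -> chunk logical offset, from a device tree dump."""
    extents = {}
    current = None
    for line in text.splitlines():
        m = DEV_EXTENT_KEY.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        if ANY_KEY.search(line):
            current = None
            continue
        m = CHUNK_OFFSET.search(line)
        if m and current is not None:
            extents[current] = int(m.group(1))
    return extents


def cross_check(chunk_dump, dev_dump):
    """Return (dev extents with no matching stripe, stripes with no matching dev extent)."""
    stripes = parse_chunk_stripes(chunk_dump)
    extents = parse_dev_extents(dev_dump)
    unmatched_extents = sorted(k for k in extents if stripes.get(k) != extents[k])
    unmatched_stripes = sorted(k for k in stripes if extents.get(k) != stripes[k])
    return unmatched_extents, unmatched_stripes
```

To use it, redirect the output of the two dump-tree commands above into
files, read them as strings, and pass them to `cross_check()`. Each
returned entry is a `(devid, physical offset)` pair.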

My current assumption is a bitflip at runtime, but I have no proof yet.

Thanks,
Qu

> smartctl appears
> not to work with this disk, so I can't easily say whether the disk is
> or is not healthy.
>


