From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Lukas Pirl <btrfs@lukas-pirl.de>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")
Date: Fri, 10 Dec 2021 21:53:45 -0500
Message-ID: <20211211025344.GR17148@hungrycats.org>
In-Reply-To: <d946882c8042f779cdccc8de59cc10166f77fb04.camel@lukas-pirl.de>
On Fri, Dec 10, 2021 at 02:28:28PM +0100, Lukas Pirl wrote:
> (friendly, humble re-post)
>
> Hello Zygo,
>
> it took me (and the disks) a while to report back; here we go:
>
> On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> > > On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > > > Dear btrfs community,
> > > >
> > > > this is another report of a probably endless balance which loops on
> > > > "found 1 extents, stage: update data pointers".
> > > >
> > > > I observe it on a btrfs RAID 1 on around 7 luks-encrypted spinning
> > > > disks (more fs details below) used for storing cold data. One disk
> > > > failed physically. Now, I try to "btrfs device delete missing". The
> > > > operation runs forever (probably, waited more than 30 days, another
> > > > time more than 50 days).
> > > >
> > > > dmesg says:
> > > > [ 22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > > > [ 22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > > > [ +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > > > [ +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> > > >
> > > > and then the last message repeats every ~ .25 seconds ("forever").
> > > > Memory and CPU usage are not excessive (most is IO wait, I assume).
> > > >
> > > > What I have tried:
> > > > * Linux 4 (multiple minor versions, don't remember which exactly)
> > > > * Linux 5.10
> > > > * Linux 5.15
> > > > * btrfs-progs v5.15
> > > > * remove subvolumes (before: ~ 200, after: ~ 90)
> > > > * free space cache v1, v2, none
> > > > * reboot, restart removal/balance (multiple times)
> >
> > Does it always happen on the same block group? If so, that points to
> > something lurking in your metadata. If a reboot fixes it for one block
> > group and then it gets stuck on some other block group, it points to
> > an issue in kernel memory state.
>
> Although I haven't paid attention to the block group number in the past,
> another run of ``btrfs dev del`` just now gave the same last block group
> number (1109204664320) before, presumably, looping.
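The "same block group every time?" question can be answered from a saved kernel log rather than by eye. The following is a minimal sketch, not a btrfs tool: the function names, the threshold of 3, and the sample text are assumptions made for illustration. The sample reuses the message wording quoted in this thread (timestamps omitted), with the last message repeated to mimic the behaviour described.

```python
import re
from collections import Counter

RELOCATING = re.compile(r"relocating block group (\d+)")
FOUND = re.compile(
    r"found (\d+) extents, stage: (move data extents|update data pointers)"
)

def summarize_relocation(log_text):
    """Count 'found N extents' messages per (block group, stage)."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    events = sorted(
        [(m.start(), "group", m) for m in RELOCATING.finditer(flat)]
        + [(m.start(), "found", m) for m in FOUND.finditer(flat)],
        key=lambda e: e[0],
    )
    counts = Counter()
    current = None  # block group announced by the latest "relocating" line
    for _, kind, m in events:
        if kind == "group":
            current = int(m.group(1))
        elif current is not None:
            counts[(current, m.group(2))] += 1
    return counts

def looping_groups(counts, threshold=3):
    """Block groups with at least `threshold` 'update data pointers' passes.

    The threshold is an arbitrary choice for this sketch.
    """
    return sorted(
        {bg for (bg, stage), n in counts.items()
         if stage == "update data pointers" and n >= threshold}
    )

# Hypothetical sample: message text as quoted in this thread, without
# timestamps; the final message is repeated to imitate the reported loop.
SAMPLE = """
BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
BTRFS info (device dm-1): found 4164 extents, stage: move data extents
BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
BTRFS info (device dm-1): found 1 extents, stage: update data pointers
BTRFS info (device dm-1): found 1 extents, stage: update data pointers
BTRFS info (device dm-1): found 1 extents, stage: update data pointers
"""

if __name__ == "__main__":
    print(looping_groups(summarize_relocation(SAMPLE)))
```

Running this over the logs of two separate attempts and comparing the reported block group numbers gives the comparison asked for above: the same number after a reboot suggests on-disk metadata, a different number suggests in-memory state.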
>
> > What do you get from 'btrfs check --readonly'?
>
> $ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03
>
> [1/7] checking root items
> Opening filesystem to check...
> warning, device 6 is missing
> Checking filesystem on /dev/disk/by-label/pool_16-03
> UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
> [2/7] checking extents
> ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276,
> owner: 1154248, offset: 100401152) wanted: 1, have: 0
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 4252313206784 bytes used, error(s) found
> total csum bytes: 4128183360
> total tree bytes: 25053184000
> total fs tree bytes: 16415014912
> total extent tree bytes: 3662594048
> btree space waste bytes: 4949241278
> file data blocks allocated: 8025128243200
> referenced 7552211206144
OK that's not too bad, just one bad reference.
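To make "just one bad reference" checkable on a longer report, here is a small sketch that pulls the "referencer count mismatch" lines out of saved check output and tests whether an extent address falls inside a given block group. The function names are made up for this example, and the 1 GiB block group length is an assumption: the thread gives only the block group's start address, not its size.

```python
import re

MISMATCH = re.compile(
    r"ERROR: extent\[(\d+), (\d+)\] referencer count mismatch "
    r"\(root: (\d+), owner: (\d+), offset: (\d+)\) wanted: (\d+), have: (\d+)"
)
FIELDS = ("bytenr", "length", "root", "owner", "offset", "wanted", "have")

def parse_mismatches(check_output):
    """Return one dict per 'referencer count mismatch' error."""
    # Collapse whitespace so an error wrapped over two lines still matches.
    flat = " ".join(check_output.split())
    return [
        dict(zip(FIELDS, map(int, m.groups())))
        for m in MISMATCH.finditer(flat)
    ]

def in_block_group(bytenr, bg_start, bg_length):
    """True if the address lies in [bg_start, bg_start + bg_length)."""
    return bg_start <= bytenr < bg_start + bg_length

# Error text as quoted in this thread.
SAMPLE = """
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276,
owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
"""

STUCK_BLOCK_GROUP = 1109204664320  # from the dmesg output in this thread
ASSUMED_BG_LENGTH = 1 << 30        # assumption: 1 GiB, not stated in the thread

if __name__ == "__main__":
    for err in parse_mismatches(SAMPLE):
        inside = in_block_group(
            err["bytenr"], STUCK_BLOCK_GROUP, ASSUMED_BG_LENGTH
        )
        print(err["bytenr"], "root", err["root"], "in stuck block group:", inside)
```

Under that length assumption, the one flagged extent starts about 362 MiB into the block group that the relocation keeps revisiting, which would fit the "something lurking in your metadata" explanation.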
> So what can be done? ``check --repair``? Or too dangerous? :)
If you have backups and you are prepared to restore them, you can try
check --repair.
> Thanks for your help
>
> Lukas
>
> > > > ======================================================================
> > > >
> > > > filesystem show
> > > > ===============
> > > >
> > > > Label: 'pool_16-03' uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > Total devices 8 FS bytes used 3.84TiB
> > > > devid 1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
> > > > devid 3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
> > > > devid 4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
> > > > devid 5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
> > > > devid 7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
> > > > devid 9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
> > > > devid 10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
> > > > *** Some devices missing
> > > >
> > > > subvolumes
> > > > ==========
> > > >
> > > > ~ 90, of which ~ 60 are read-only snapshots of the other ~ 30
> > > >
> > > > filesystem usage
> > > > ================
> > > >
> > > > Overall:
> > > > Device size: 12.74TiB
> > > > Device allocated: 8.36TiB
> > > > Device unallocated: 4.38TiB
> > > > Device missing: 0.00B
> > > > Used: 7.69TiB
> > > > Free (estimated): 2.50TiB (min: 2.50TiB)
> > > > Free (statfs, df): 1.46TiB
> > > > Data ratio: 2.00
> > > > Metadata ratio: 2.00
> > > > Global reserve: 512.00MiB (used: 48.00KiB)
> > > > Multiple profiles: no
> > > >
> > > > Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
> > > > /dev/mapper/WD-WCAU45xxxx03 584.00GiB
> > > > /dev/mapper/WD-WCAZAFxxxx78 1.35TiB
> > > > /dev/mapper/WD-WCC4J7xxxxSZ 588.00GiB
> > > > /dev/mapper/WD-WCC4M2xxxxXH 1.44TiB
> > > > missing 510.00GiB
> > > > /dev/mapper/S1xxxxJ3 579.00GiB
> > > > /dev/mapper/WD-WCC4N3xxxx17 2.26TiB
> > > > /dev/mapper/WD-WCC7K2xxxxNS 1.01TiB
> > > >
> > > > Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
> > > > /dev/mapper/WD-WCAU45xxxx03 8.00GiB
> > > > /dev/mapper/WD-WCAZAFxxxx78 17.00GiB
> > > > /dev/mapper/WD-WCC4J7xxxxSZ 5.00GiB
> > > > /dev/mapper/WD-WCC4M2xxxxXH 13.00GiB
> > > > missing 3.00GiB
> > > > /dev/mapper/S1xxxxJ3 5.00GiB
> > > > /dev/mapper/WD-WCC4N3xxxx17 16.00GiB
> > > > /dev/mapper/WD-WCC7K2xxxxNS 15.00GiB
> > > >
> > > > System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
> > > > missing 32.00MiB
> > > > /dev/mapper/WD-WCC4N3xxxx17 32.00MiB
> > > >
> > > > Unallocated:
> > > > /dev/mapper/WD-WCAU45xxxx03 339.51GiB
> > > > /dev/mapper/WD-WCAZAFxxxx78 461.01GiB
> > > > /dev/mapper/WD-WCC4J7xxxxSZ 338.51GiB
> > > > /dev/mapper/WD-WCC4M2xxxxXH 373.01GiB
> > > > missing -513.03GiB
> > > > /dev/mapper/S1xxxxJ3 347.51GiB
> > > > /dev/mapper/WD-WCC4N3xxxx17 460.47GiB
> > > > /dev/mapper/WD-WCC7K2xxxxNS 2.61TiB
> > > >
> > > > dump-super
> > > > ==========
> > > >
> > > > superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
> > > > ---------------------------------------------------------
> > > > csum_type 0 (crc32c)
> > > > csum_size 4
> > > > csum 0x51beb068 [match]
> > > > bytenr 65536
> > > > flags 0x1
> > > > ( WRITTEN )
> > > > magic _BHRfS_M [match]
> > > > fsid 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > metadata_uuid 59301fea-434a-xxxx-bb45-08fcfe8ce113
> > > > label pool_16-03
> > > > generation 113519755
> > > > root 15602414796800
> > > > sys_array_size 129
> > > > chunk_root_generation 63394299
> > > > root_level 1
> > > > chunk_root 19216820502528
> > > > chunk_root_level 1
> > > > log_root 0
> > > > log_root_transid 0
> > > > log_root_level 0
> > > > total_bytes 16003136864256
> > > > bytes_used 4227124142080
> > > > sectorsize 4096
> > > > nodesize 16384
> > > > leafsize (deprecated) 16384
> > > > stripesize 4096
> > > > root_dir 6
> > > > num_devices 8
> > > > compat_flags 0x0
> > > > compat_ro_flags 0x0
> > > > incompat_flags 0x371
> > > > ( MIXED_BACKREF |
> > > > COMPRESS_ZSTD |
> > > > BIG_METADATA |
> > > > EXTENDED_IREF |
> > > > SKINNY_METADATA |
> > > > NO_HOLES )
> > > > cache_generation 2975866
> > > > uuid_tree_generation 113519755
> > > > dev_item.uuid a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
> > > > dev_item.fsid 59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
> > > > dev_item.type 0
> > > > dev_item.total_bytes 1000201740288
> > > > dev_item.bytes_used 635655159808
> > > > dev_item.io_align 4096
> > > > dev_item.io_width 4096
> > > > dev_item.sector_size 4096
> > > > dev_item.devid 1
> > > > dev_item.dev_group 0
> > > > dev_item.seek_speed 0
> > > > dev_item.bandwidth 0
> > > > dev_item.generation 0
> > > >
> > > > device stats
> > > > ============
> > > >
> > > > [/dev/mapper/WD-WCAU45xxxx03].write_io_errs 0
> > > > [/dev/mapper/WD-WCAU45xxxx03].read_io_errs 0
> > > > [/dev/mapper/WD-WCAU45xxxx03].flush_io_errs 0
> > > > [/dev/mapper/WD-WCAU45xxxx03].corruption_errs 0
> > > > [/dev/mapper/WD-WCAU45xxxx03].generation_errs 0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].write_io_errs 0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].read_io_errs 0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs 0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].corruption_errs 0
> > > > [/dev/mapper/WD-WCAZAFxxxx78].generation_errs 0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs 0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs 1
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs 0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs 0
> > > > [/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs 0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs 0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs 0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs 0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs 0
> > > > [/dev/mapper/WD-WCC4M2xxxxXH].generation_errs 0
> > > > [devid:6].write_io_errs 0
> > > > [devid:6].read_io_errs 0
> > > > [devid:6].flush_io_errs 0
> > > > [devid:6].corruption_errs 72016
> > > > [devid:6].generation_errs 100
> > > > [/dev/mapper/S1xxxxJ3].write_io_errs 0
> > > > [/dev/mapper/S1xxxxJ3].read_io_errs 0
> > > > [/dev/mapper/S1xxxxJ3].flush_io_errs 0
> > > > [/dev/mapper/S1xxxxJ3].corruption_errs 2
> > > > [/dev/mapper/S1xxxxJ3].generation_errs 0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].write_io_errs 0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].read_io_errs 0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs 0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].corruption_errs 0
> > > > [/dev/mapper/WD-WCC4N3xxxx17].generation_errs 0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs 0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs 0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs 0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs 0
> > > > [/dev/mapper/WD-WCC7K2xxxxNS].generation_errs 0
> > >
>
>