Date: Thu, 17 Dec 2020 20:51:14 -0500
From: Zygo Blaxell
To: Ulrich Windl
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Antw: [EXT] Re: Unrecoverable filesystem (ERROR: child eb corrupted: parent bytenr=1106952192 item=75 parent level=1 child level=1)
Message-ID: <20201218015114.GE28049@hungrycats.org>
In-Reply-To: <5FDB6190020000A10003DA53@gwsmtp.uni-regensburg.de>

On Thu, Dec 17, 2020 at 02:48:00PM +0100, Ulrich Windl wrote:
> >>> Zygo Blaxell wrote on 15.12.2020 at 19:18 in
> >>> message <20201215181828.GN31381@hungrycats.org>:
> > On Fri, Dec 11, 2020 at 03:25:47PM +0100, Ulrich Windl wrote:
> >> Hi!
> >>
> >> While configuring a VM environment in a cluster I had set up an
> >> SLES15 SP2 test VM using BtrFS. Due to some problem with libvirt
> >> (or the VirtualDomain RA) the VM was active on more than one
> >> cluster node at a time, corrupting the filesystem beyond repair,
> >> it seems:
> >> hvc0:rescue:~ # btrfs check /dev/xvda2
> >> Opening filesystem to check...
> >> Checking filesystem on /dev/xvda2
> >> UUID: 1b651baa-327b-45fe-9512-e7147b24eb49
> >> [1/7] checking root items
> >> ERROR: child eb corrupted: parent bytenr=1107230720 item=75 parent level=1 child level=1
> >> ERROR: failed to repair root items: Input/output error
> >> hvc0:rescue:~ # btrfsck -b /dev/xvda2
> >> Opening filesystem to check...
> >> Checking filesystem on /dev/xvda2
> >> UUID: 1b651baa-327b-45fe-9512-e7147b24eb49
> >> [1/7] checking root items
> >> ERROR: child eb corrupted: parent bytenr=1106952192 item=75 parent level=1 child level=1
> >> ERROR: failed to repair root items: Input/output error
> >> hvc0:rescue:~ # btrfsck --repair /dev/xvda2
> >> enabling repair mode
> >> Opening filesystem to check...
> >> Checking filesystem on /dev/xvda2
> >> UUID: 1b651baa-327b-45fe-9512-e7147b24eb49
> >> [1/7] checking root items
> >> ERROR: child eb corrupted: parent bytenr=1107230720 item=75 parent level=1 child level=1
> >> ERROR: failed to repair root items: Input/output error
> >>
> >> Two questions arising:
> >> 1) Can't the kernel set some "open flag" early when opening the
> >> filesystem, and refuse to open it again (the other VM) when the
> >> flag is set? That could avoid such situations, I guess.
> >
> > If btrfs wrote "the filesystem is open" to the disk, the filesystem
> > would not be mountable after a crash.
> >
> > The kernel does set an "open flag" (it detects that it is about to
> > mount the same btrfs by uuid, and does something like a bind mount
> > instead), but that applies only to multiple btrfs mounts on the
> > _same_ kernel. In your case there are multiple kernels present (one
> > on each node) and there's no way for them to communicate with each
> > other.
> >
> > There are at least 3 different ways libvirt or other hosting
> > infrastructure software on the VM host could have avoided passing
> > the same physical device to multiple VM guests. I would suggest
> > implementing some or all of them.
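One of those is libvirt's own lock manager, virtlockd: with it
enabled, starting a second guest against a disk that already has a
lease fails instead of corrupting the filesystem. A minimal sketch,
assuming a libvirt built with the "lockd" driver (the exact config is
illustrative, not from the original mail):

	# /etc/libvirt/qemu.conf, on every cluster node
	lock_manager = "lockd"

	# /etc/libvirt/qemu-lockd.conf
	# take a lease on every disk automatically
	auto_disk_leases = 1

	# pick up the new configuration
	systemctl restart virtlockd libvirtd

Note that direct leases are flock()s, which only reach as far as the
filesystem they are taken on; for protection across hosts the
lockspace directory has to live on shared storage, or you use the
sanlock driver or cluster-level fencing instead.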
> >> 2) Can't btrfs check try somewhat harder to rescue anything, or
> >> is the fs structured in a way that everything is lost?
> >>
> >> What really puzzles me is this:
> >> There are several snapshots and subvolumes on the BtrFS device.
> >> It's hard to believe that absolutely nothing seems to be
> >> recoverable.
> >
> > The most likely outcome is that the root tree nodes and most of the
> > interior nodes of all the filesystem trees are broken. The kernel
> > relies on the trees to work--everything in btrfs except the
> > superblocks can be at any location on disk--so the filesystem will
> > be unreadable by the kernel. Only recovery tools would be able to
> > read the filesystem now.
> >
> > Recovery requires a brute force search of the disk to find as many
> > surviving leaf nodes as possible and rebuild the filesystem trees.
> > This is more or less what 'btrfs check --repair --init-extent-tree'
> > does.
>
> Hi!
>
> As I didn't have a backup (it was just a test VM to test the HA
> cluster configuration), I tried your command. It finished rather
> quickly even with little RAM, but found *many* problems:
> ...
> Deleting bad dir index [715,96,8] root 257
> Deleting bad dir index [257,96,14] root 257
> Deleting bad dir index [257,96,15] root 257
> Deleting bad dir index [259,96,21] root 257
> Deleting bad dir index [291,96,6] root 257
> Deleting bad dir index [1804,96,2] root 257
> Deleting bad dir index [1804,96,3] root 257
> Deleting bad dir index [1804,96,4] root 257
> Deleting bad dir index [1804,96,5] root 257
> Deleting bad dir index [320,96,5] root 257
> Deleting bad dir index [1805,96,2] root 257
> Deleting bad dir index [257,96,16] root 257
> Deleting bad dir index [326,96,6] root 257
> ERROR: errors found in fs roots
> found 30851072 bytes used, error(s) found
> total csum bytes: 1370452
> total tree bytes: 3211264
> total fs tree bytes: 1458176
> total extent tree bytes: 16384
> btree space waste bytes: 597304
> file data blocks allocated: 27607040
>  referenced 27607040
>
> A subsequent "btrfs check /dev/xvda2" found many problems again:
> ...
> root 257 inode 7589 errors 2001, no inode item, link count wrong
>         unresolved ref dir 1804 index 0 namelen 7 name main.cf filetype 1 errors 6, no dir index, no inode ref
> root 257 inode 7590 errors 2001, no inode item, link count wrong
>         unresolved ref dir 320 index 0 namelen 18 name postfix.configured filetype 1 errors 6, no dir index, no inode ref
> root 257 inode 7591 errors 2001, no inode item, link count wrong
>         unresolved ref dir 1806 index 0 namelen 3 name pid filetype 2 errors 6, no dir index, no inode ref
> root 257 inode 7593 errors 2001, no inode item, link count wrong
>         unresolved ref dir 1805 index 0 namelen 11 name master.lock filetype 1 errors 6, no dir index, no inode ref
> root 257 inode 7641 errors 2001, no inode item, link count wrong
>         unresolved ref dir 257 index 0 namelen 11 name snapper.log filetype 1 errors 6, no dir index, no inode ref
> root 257 inode 7644 errors 2001, no inode item, link count wrong
>         unresolved ref dir 326 index 0 namelen 16 name logrotate.status filetype 1 errors 6, no dir index, no inode ref
> ERROR: errors found in fs roots
> found 30965760 bytes used, error(s) found
> total csum bytes: 1370452
> total tree bytes: 3342336
> total fs tree bytes: 1523712
> total extent tree bytes: 81920
> btree space waste bytes: 669123
> file data blocks allocated: 27607040
>  referenced 27607040
>
> Even after iterating a "normal" check a few times, I could not mount
> the "repaired" filesystem:
> hvc0:rescue:~ # mount -r /dev/xvda2 /mnt
> mount.bin: /mnt: wrong fs type, bad option, bad superblock on /dev/xvda2, missing codepage or helper program, or other error.
> hvc0:rescue:~ # journalctl -f
> -- Logs begin at Thu 2020-12-17 13:36:57 UTC. --
> Dec 17 13:44:33 rescue kernel: BTRFS info (device xvda2): disk space caching is enabled
> Dec 17 13:44:33 rescue kernel: BTRFS info (device xvda2): has skinny extents
> Dec 17 13:44:33 rescue kernel: BTRFS error (device xvda2): chunk 1048576 has missing dev extent, have 0 expect 1
> Dec 17 13:44:33 rescue kernel: BTRFS error (device xvda2): failed to verify dev extents against chunks: -117
> Dec 17 13:44:33 rescue kernel: BTRFS error (device xvda2): open_ctree failed
> ^C
>
> I'm not hoping to recover the system to a usable state, but out of
> curiosity I'd like to get an impression of what had survived and
> what had not.

If you're missing dev extents you'll need to run chunk-recover to
brute-force scan for the chunk headers. But this is really stretching
the abilities of the current tools.
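That would be something like the following (run it against a copy of
the image if you can; chunk-recover is itself a brute-force tool with
its own failure modes, so treat this as a sketch rather than a
guaranteed fix):

	# scan the whole device for surviving chunk metadata and
	# rebuild the chunk tree from whatever it finds
	btrfs rescue chunk-recover -v /dev/xvda2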
> Regards,
> Ulrich
>
> > If you run --init-extent-tree, assuming it works (you should not
> > assume that it will work), you would then have to audit the
> > filesystem contents to see what data was not recovered. At a
> > minimum, you would lose a few hundred filesystem items, since each
> > metadata leaf node contains around 200 items and you definitely
> > will not recover them all. The data csum trees might not be in
> > sync with the rest of the filesystem, so you can't rely on scrub
> > to check data integrity. If this is successful, you will have a
> > similar result to mounting ext4 on multiple VMs simultaneously--
> > fsck runs, the filesystem is read-write again, but you don't get
> > all the data back, nor even a list of data that was lost or
> > corrupted.
> >
> > --init-extent-tree can be quite slow, especially if you don't have
> > enough RAM to hold all the filesystem's metadata. It's still under
> > development, so one possible outcome is that it crashes with an
> > assertion failure and leaves you with an even more broken
> > filesystem.
> >
> > It's usually faster and easier to mkfs and restore from backups
> > instead.
> >
> >> I have this:
> >> hvc0:rescue:~ # btrfs inspect-internal dump-super /dev/xvda2
> >> superblock: bytenr=65536, device=/dev/xvda2
> >> ---------------------------------------------------------
> >> csum_type               0 (crc32c)
> >> csum_size               4
> >> csum                    0x659898f3 [match]
> >> bytenr                  65536
> >> flags                   0x1
> >>                         ( WRITTEN )
> >> magic                   _BHRfS_M [match]
> >> fsid                    1b651baa-327b-45fe-9512-e7147b24eb49
> >> metadata_uuid           1b651baa-327b-45fe-9512-e7147b24eb49
> >> label
> >> generation              280
> >> root                    1107214336
> >> sys_array_size          97
> >> chunk_root_generation   35
> >> root_level              0
> >> chunk_root              1048576
> >> chunk_root_level        0
> >> log_root                0
> >> log_root_transid        0
> >> log_root_level          0
> >> total_bytes             10727960576
> >> bytes_used              1461825536
> >> sectorsize              4096
> >> nodesize                16384
> >> leafsize (deprecated)   16384
> >> stripesize              4096
> >> root_dir                6
> >> num_devices             1
> >> compat_flags            0x0
> >> compat_ro_flags         0x0
> >> incompat_flags          0x163
> >>                         ( MIXED_BACKREF |
> >>                           DEFAULT_SUBVOL |
> >>                           BIG_METADATA |
> >>                           EXTENDED_IREF |
> >>                           SKINNY_METADATA )
> >> cache_generation        280
> >> uuid_tree_generation    40
> >> dev_item.uuid           2abdf93e-2f2d-4eef-a1d8-9325f809ebce
> >> dev_item.fsid           1b651baa-327b-45fe-9512-e7147b24eb49 [match]
> >> dev_item.type           0
> >> dev_item.total_bytes    10727960576
> >> dev_item.bytes_used     2436890624
> >> dev_item.io_align       4096
> >> dev_item.io_width       4096
> >> dev_item.sector_size    4096
> >> dev_item.devid          1
> >> dev_item.dev_group      0
> >> dev_item.seek_speed     0
> >> dev_item.bandwidth      0
> >> dev_item.generation     0
> >>
> >> Regards,
> >> Ulrich Windl
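If the goal is just an inventory of what survived, btrfs restore can
read the trees from userspace and copy out whatever is still
reachable, without the filesystem having to mount at all. Roughly
(the destination path is only an example, and with this much tree
damage restore may also come up empty):

	mkdir -p /mnt/recovery

	# dry run: show what restore believes it can recover
	btrfs restore -D -v /dev/xvda2 /mnt/recovery

	# then copy the files out for real, ignoring errors where possible
	btrfs restore -v -i /dev/xvda2 /mnt/recovery

If the default tree roots are too damaged, a bytenr reported by
btrfs-find-root can be fed to restore with -t.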