From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: "Tomáš Metelka" <tomas.metelka@metaliza.cz>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Broken chunk tree - Was: Mount issue, mount /dev/sdc2: can't read superblock
Date: Sun, 30 Dec 2018 12:38:20 +0800 [thread overview]
Message-ID: <c9414ea9-23e7-d361-b78a-bbb4a2546b83@gmx.com> (raw)
In-Reply-To: <07b88bad-e1fa-7485-d410-ee261ace321c@metaliza.cz>
On 2018/12/30 8:48 AM, Tomáš Metelka wrote:
> Ok, I've got it:-(
>
> But just a few questions: I tried (with btrfs-progs v4.19.1) to
> recover files through btrfs restore -s -m -S -v -i ... and the
> following events occurred:
>
> 1) Just 1 "hard" error:
> ERROR: cannot map block logical 117058830336 length 1073741824: -2
> Error copying data for /mnt/...
> (a file whose absence really doesn't pain me :-))
This means one data extent can't be recovered because its chunk mapping
is missing.
Not uncommon for a heavily damaged fs, but nothing serious.
>
> 2) For 24 files I got the "too many loops" warning (I mean this: "if
> (loops >= 0 && loops++ >= 1024) { ..."). I always answered yes, but
> I'm afraid these files are corrupted (at least 2 of them seem corrupted).
>
> How bad is this?
Not sure, but I don't think btrfs restore is robust enough for such a case.
It may be a false alarm.
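For reference, the loop guard you're quoting works roughly like this (a
rough sketch modeled on the btrfs-progs restore code; function names,
prompt text and structure here are illustrative, not the exact source):

```c
/*
 * Rough sketch of the restore loop guard quoted above (modeled on
 * btrfs-progs cmds-restore.c; names and prompt text are illustrative).
 * After 1024 passes over one file's extents, restore suspects the
 * extent list is cycling and asks whether to keep going.
 */
#include <stdio.h>

static int ask_to_continue(const char *path)
{
	int c;

	printf("We seem to be looping a lot on %s, keep going? (y/N): ", path);
	c = getchar();
	return (c == 'y' || c == 'Y') ? 0 : -1;
}

static int copy_file_extents(const char *path, int nr_extents)
{
	int loops = 0;

	for (int i = 0; i < nr_extents; i++) {
		/* ... map and copy one extent here ... */
		if (loops >= 0 && loops++ >= 1024) {
			if (ask_to_continue(path) < 0)
				return -1;	/* user gave up on this file */
			loops = 0;	/* user said yes: reset the counter */
		}
	}
	return 0;
}
```

So answering "yes" just resets the counter and lets the walk continue;
the warning itself doesn't prove corruption, it only flags a suspicious
number of iterations.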
> Does the error mentioned in #1 mean that it's the
> only file which is totally lost?
Not even totally lost; it's just one file extent, so other parts of the
file may be OK.
Thanks,
Qu
> I can live without those 24 + 1 files,
> so if #1 and #2 were the only errors then I could say the recovery
> was successful ... but I'm afraid things aren't that easy :-)
>
> Thanks
> M.
>
>
> Tomáš Metelka
> Business & IT Analyst
>
> Tel: +420 728 627 252
> Email: tomas.metelka@metaliza.cz
>
>
>
> On 24. 12. 18 15:19, Qu Wenruo wrote:
>>
>>
>> On 2018/12/24 9:52 PM, Tomáš Metelka wrote:
>>> On 24. 12. 18 14:02, Qu Wenruo wrote:
>>>> btrfs check --readonly output please.
>>>>
>>>> btrfs check --readonly always gives the most reliable and detailed
>>>> output for any possible recovery.
>>>
>>> This is very weird because it prints only:
>>> ERROR: cannot open file system
>>
>> A new place to enhance ;)
>>
>>>
>>> I've tried also "btrfs check -r 75152310272" but it only says:
>>> parent transid verify failed on 75152310272 wanted 2488742 found 2488741
>>> parent transid verify failed on 75152310272 wanted 2488742 found 2488741
>>> Ignoring transid failure
>>> ERROR: cannot open file system
>>>
>>> I've tried that because:
>>> backup 3:
>>> backup_tree_root: 75152310272 gen: 2488741 level: 1
>>>
>>>> Also kernel message for the mount failure could help.
>>>
>>> Sorry, my fault, I should start from this point:
>>>
>>> Dec 23 21:59:07 tisc5 kernel: [10319.442615] BTRFS: device fsid
>>> be557007-42c9-4079-be16-568997e94cd9 devid 1 transid 2488742 /dev/loop0
>>> Dec 23 22:00:49 tisc5 kernel: [10421.167028] BTRFS info (device loop0):
>>> disk space caching is enabled
>>> Dec 23 22:00:49 tisc5 kernel: [10421.167034] BTRFS info (device loop0):
>>> has skinny extents
>>> Dec 23 22:00:50 tisc5 kernel: [10421.807564] BTRFS critical (device
>>> loop0): corrupt node: root=1 block=75150311424 slot=245, invalid NULL
>>> node pointer
>> This explains the problem.
>>
>> Your root tree has one node pointer which is not correct.
>> A node pointer should never point to 0.
>>
>> This is pretty weird, a corruption pattern I have never seen before.
>>
>> Since your tree root got corrupted, there isn't much we can do
>> except try older tree roots.
>>
>> You could try all the backup roots, starting from the newest backup
>> (the one with the highest generation), checking each backup root
>> bytenr with:
>> # btrfs check -r <backup root bytenr> <device>
>>
>> to see which one gets the fewest errors, though normally the chance
>> of success is near 0.
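That search can be scripted along these lines (a sketch only; it assumes
btrfs-progs is installed and that $DEV points at the affected device or
loop file):

```shell
# Sketch: list the backup tree roots from the superblock, newest
# generation first, and run a read-only check against each one.
DEV=/dev/loop0

btrfs inspect-internal dump-super -f "$DEV" |
  awk '/backup_tree_root:/ { print $4, $2 }' |   # -> "generation bytenr"
  sort -rn |
  while read -r gen bytenr; do
    echo "=== backup root $bytenr (gen $gen) ==="
    btrfs check --readonly -r "$bytenr" "$DEV"
  done
```

Then pick the backup root whose check reports the fewest errors for any
further recovery attempt.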
>>
>>> Dec 23 22:00:50 tisc5 kernel: [10421.807653] BTRFS error (device loop0):
>>> failed to read block groups: -5
>>> Dec 23 22:00:50 tisc5 kernel: [10421.877001] BTRFS error (device loop0):
>>> open_ctree failed
>>>
>>>
>>> So I tried to do:
>>> 1) btrfs inspect-internal dump-super (with the snippet posted above)
>>> 2) btrfs inspect-internal dump-tree -b 75150311424
>>>
>>> And it showed (header + snippet for items 243-248):
>>> node 75150311424 level 1 items 249 free 244 generation 2488741 owner 2
>>> fs uuid be557007-42c9-4079-be16-568997e94cd9
>>> chunk uuid dbe69c7e-2d50-4001-af31-148c5475b48b
>>> ...
>>> key (14799519744 EXTENT_ITEM 4096) block 233423224832 (14247023) gen 2484894
>>> key (14811271168 EXTENT_ITEM 135168) block 656310272 (40058) gen 2488049
>>
>>
>>> key (1505328190277054464 UNKNOWN.4 366981796979539968) block 0 (0) gen 0
>>> key (0 UNKNOWN.0 1419267647995904) block 6468220747776 (394788864) gen 7786775707648
>>
>> Pretty obviously, these two key pointers are garbage.
>> Something corrupted the memory at runtime, and we don't have a
>> runtime check against such corruption yet.
>>
>> So IMHO the problem is that some kernel code, either btrfs or some
>> other part, corrupted the memory.
>> Btrfs then failed to detect it and wrote it back to disk, and the
>> kernel only caught the problem later when it read the tree block
>> back from disk.
>>
>> I could add such a check for nodes, but it would normally require
>> CONFIG_BTRFS_FS_CHECK_INTEGRITY, so it makes little sense for normal
>> users.
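For the record, the kind of node check I mean would look roughly like
this (a sketch only; the struct and error message are illustrative, and
a real implementation would live in the kernel's tree-checking code):

```c
/*
 * Sketch of the node check discussed above: every key pointer in an
 * internal node must reference a nonzero block number.  Illustrative
 * only; a real check would live in the kernel (fs/btrfs).
 */
#include <stdint.h>
#include <stdio.h>

struct key_ptr {
	uint64_t blockptr;	/* logical address of the child block */
	uint64_t generation;
};

/* Return 0 if all pointers look sane, -1 on the first NULL pointer. */
static int check_node(const struct key_ptr *ptrs, int nritems)
{
	for (int slot = 0; slot < nritems; slot++) {
		if (ptrs[slot].blockptr == 0) {
			fprintf(stderr,
				"corrupt node: slot=%d, invalid NULL node pointer\n",
				slot);
			return -1;	/* the kernel would use -EUCLEAN */
		}
	}
	return 0;
}
```

Such a check at write time would have rejected the corrupted node before
it ever reached disk, which is exactly what was missing in your case.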
>>
>>> key (12884901888 EXTENT_ITEM 24576) block 816693248 (49847) gen 2484931
>>> key (14902849536 EXTENT_ITEM 131072) block 75135844352 (4585928) gen 2488739
>>>
>>>
>>> I looked at those numbers for quite a while (also in hex), trying to
>>> figure out what happened (bit flips (it was on an SSD), byte shifts
>>> (I also suspected a bad CPU, because it died 2 months after that))
>>> and tried to guess "correct" values for those items ... but no
>>> idea :-(
>>
>> I'm not so sure; unless you're super lucky (or unlucky, in this case),
>> that would normally get caught by the csum first.
>>
>>>
>>> So this is why I asked about that log_root and whether there is a
>>> chance to "log-replay things" :-)
>>
>> For your case, definitely not related to log replay.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> Thanks
>>> M.
>>
Thread overview: 16+ messages
2018-12-20 21:21 Mount issue, mount /dev/sdc2: can't read superblock Peter Chant
2018-12-21 22:25 ` Chris Murphy
2018-12-22 12:34 ` Peter Chant
2018-12-24 0:58 ` Chris Murphy
2018-12-24 2:00 ` Qu Wenruo
2018-12-24 11:36 ` Peter Chant
2018-12-24 11:31 ` Peter Chant
2018-12-24 12:02 ` Qu Wenruo
2018-12-24 12:48 ` Tomáš Metelka
2018-12-24 13:02 ` Qu Wenruo
2018-12-24 13:52 ` Tomáš Metelka
2018-12-24 14:19 ` Qu Wenruo
2018-12-30 0:48 ` Broken chunk tree - Was: " Tomáš Metelka
2018-12-30 3:59 ` Duncan
2018-12-30 4:38 ` Qu Wenruo [this message]
2018-12-24 23:20 ` Chris Murphy