Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: "Tomáš Metelka" <tomas.metelka@metaliza.cz>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Broken chunk tree - Was: Mount issue, mount /dev/sdc2: can't read superblock
Date: Sun, 30 Dec 2018 12:38:20 +0800
Message-ID: <c9414ea9-23e7-d361-b78a-bbb4a2546b83@gmx.com>
In-Reply-To: <07b88bad-e1fa-7485-d410-ee261ace321c@metaliza.cz>





On 2018/12/30 8:48 AM, Tomáš Metelka wrote:
> Ok, I've got it:-(
> 
> But just a few questions: I've tried (with btrfs-progs v4.19.1) to
> recover files through btrfs restore -s -m -S -v -i ... and following
> events occurred:
> 
> 1) Just 1 "hard" error:
> ERROR: cannot map block logical 117058830336 length 1073741824: -2
> Error copying data for /mnt/...
> (a file whose absence really doesn't pain me :-))

This means one data extent can't be recovered because its chunk mapping is missing.

That is not unexpected for a heavily damaged fs, and it is nothing serious.
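
To illustrate why a missing chunk mapping produces that message, here is a minimal Python sketch. It assumes a simplified chunk table where each entry maps one logical range to one physical offset; the names `Chunk` and `map_logical` are hypothetical and are not the btrfs-progs API. The `-2` in the error corresponds to `-ENOENT`, meaning no mapping covers the requested logical address.

```python
import errno
from typing import List, NamedTuple, Tuple


class Chunk(NamedTuple):
    """Hypothetical, simplified chunk mapping: one logical range -> one physical offset."""
    logical: int
    length: int
    physical: int


def map_logical(chunks: List[Chunk], logical: int) -> Tuple[int, int]:
    """Return (status, physical); status is 0 on success or -ENOENT if unmapped."""
    for chunk in chunks:
        if chunk.logical <= logical < chunk.logical + chunk.length:
            return 0, chunk.physical + (logical - chunk.logical)
    return -errno.ENOENT, 0


# Made-up chunk table: nothing covers the logical address from the error message.
chunks = [Chunk(logical=1 << 30, length=1 << 30, physical=5 << 30)]
status, _ = map_logical(chunks, 117058830336)
print(status)  # -2, i.e. -ENOENT
```

Only extents whose logical address falls outside every known chunk fail this way, which is why the damage is limited to the affected extent.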

> 
> 2) For 24 files I got the "too much loops" warning (I mean this: "if
> (loops >= 0 && loops++ >= 1024) { ..."). I always answered yes, but
> I'm afraid these files are corrupted (at least 2 of them seem corrupted).
> 
> How bad is this?

Not sure, but I don't think restore is robust enough for such a case.
It may be a false alarm.
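
For reference, the quoted check can be simulated as below. This is only a sketch under assumptions: the `ask` callback and its "yes"/"no"/"all" answers are hypothetical stand-ins for the interactive prompt, and the counter is not reset after a "yes", which may or may not match what restore actually does. Only the condition itself is taken from the quoted line.

```python
from typing import Callable

LOOP_LIMIT = 1024  # threshold taken from the quoted check


def copy_with_loop_guard(total_iterations: int, ask: Callable[[], str]) -> int:
    """Simulate a copy loop; return how many iterations actually ran.

    `ask` is a hypothetical prompt callback returning "yes", "no" or "all".
    """
    loops = 0
    done = 0
    while done < total_iterations:
        # Mirrors: if (loops >= 0 && loops++ >= 1024) { ... }
        if loops >= 0:
            exceeded = loops >= LOOP_LIMIT  # compare the value before incrementing
            loops += 1
            if exceeded:
                answer = ask()
                if answer == "no":
                    break
                if answer == "all":
                    loops = -1  # a negative counter disables further prompts
        done += 1
    return done


print(copy_with_loop_guard(2000, lambda: "no"))   # stops at the first prompt
print(copy_with_loop_guard(2000, lambda: "all"))  # runs to completion
```

The guard only counts iterations. It says nothing about whether the data read so far is valid, so a warning by itself does not prove the file is corrupted.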

> Does the error mentioned in #1 mean that it's the
> only file which is totally lost?

Not even totally lost: the error covers just one file extent, so other parts of the file may be OK.

Thanks,
Qu

> I can live without those 24 + 1 files,
> so if #1 and #2 are the only errors then I could say the recovery
> was successful ... but I'm afraid things aren't that easy:-)
> 
> Thanks
> M.
> 
> 
>   Tomáš Metelka
>   Business & IT Analyst
> 
>   Tel: +420 728 627 252
>   Email: tomas.metelka@metaliza.cz
> 
> 
> 
> On 24. 12. 18 15:19, Qu Wenruo wrote:
>>
>>
>> On 2018/12/24 9:52 PM, Tomáš Metelka wrote:
>>> On 24. 12. 18 14:02, Qu Wenruo wrote:
>>>> btrfs check --readonly output please.
>>>>
>>>> btrfs check --readonly is always the most reliable and detailed output
>>>> for any possible recovery.
>>>
>>> This is very weird because it prints only:
>>> ERROR: cannot open file system
>>
>> A new place to enhance ;)
>>
>>>
>>> I've tried also "btrfs check -r 75152310272" but it only says:
>>> parent transid verify failed on 75152310272 wanted 2488742 found 2488741
>>> parent transid verify failed on 75152310272 wanted 2488742 found 2488741
>>> Ignoring transid failure
>>> ERROR: cannot open file system
>>>
>>> I've tried that because:
>>>      backup 3:
>>>   backup_tree_root:    75152310272    gen: 2488741 level: 1
>>>
>>>> Also kernel message for the mount failure could help.
>>>
>>> Sorry, my fault, I should have started with this:
>>>
>>> Dec 23 21:59:07 tisc5 kernel: [10319.442615] BTRFS: device fsid be557007-42c9-4079-be16-568997e94cd9 devid 1 transid 2488742 /dev/loop0
>>> Dec 23 22:00:49 tisc5 kernel: [10421.167028] BTRFS info (device loop0): disk space caching is enabled
>>> Dec 23 22:00:49 tisc5 kernel: [10421.167034] BTRFS info (device loop0): has skinny extents
>>> Dec 23 22:00:50 tisc5 kernel: [10421.807564] BTRFS critical (device loop0): corrupt node: root=1 block=75150311424 slot=245, invalid NULL node pointer
>> This explains the problem.
>>
>> Your root tree has one node pointer that is not valid.
>> A node pointer should never point to 0.
>>
>> This is pretty weird, a corruption pattern I have never seen before.
>>
>> Since your tree root is corrupted, there isn't much we can do
>> other than try older tree roots.
>>
>> You could try all the backup roots, starting from the newest backup (the
>> one with the highest generation), and check each backup root bytenr using:
>> # btrfs check -r <backup root bytenr> <device>
>>
>> Then see which one gives the fewest errors, though normally the chance of success is near 0.
>>
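
The "newest backup first" procedure quoted above can be sketched in Python as follows. This is a hedged illustration, not a tested recovery tool: it assumes the backup roots appear in the text in the `backup_tree_root: <bytenr> gen: <generation>` form quoted earlier in this thread, the helper names are hypothetical, and it only builds the `btrfs check -r` command lines without running them. In the sample text, only the "backup 3" line comes from this thread; the other entry is made up.

```python
import re
from typing import List, Tuple

# Matches lines such as: "backup_tree_root:    75152310272    gen: 2488741 level: 1"
BACKUP_ROOT_RE = re.compile(r"backup_tree_root:\s+(\d+)\s+gen:\s+(\d+)")


def backup_roots_newest_first(dump_super_text: str) -> List[Tuple[int, int]]:
    """Return unique (generation, bytenr) pairs, highest generation first."""
    roots = {
        (int(gen), int(bytenr))
        for bytenr, gen in BACKUP_ROOT_RE.findall(dump_super_text)
    }
    return sorted(roots, reverse=True)


def check_commands(dump_super_text: str, device: str) -> List[List[str]]:
    """Build one `btrfs check -r <bytenr> <device>` command per backup root."""
    return [
        ["btrfs", "check", "-r", str(bytenr), device]
        for _gen, bytenr in backup_roots_newest_first(dump_super_text)
    ]


SAMPLE = """
    backup 2:
        backup_tree_root:    75000000000    gen: 2488740 level: 1
    backup 3:
        backup_tree_root:    75152310272    gen: 2488741 level: 1
"""

for cmd in check_commands(SAMPLE, "/dev/loop0"):
    print(" ".join(cmd))
```

Each printed command would then be run by hand, comparing how many errors each backup root produces.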
>>> Dec 23 22:00:50 tisc5 kernel: [10421.807653] BTRFS error (device loop0): failed to read block groups: -5
>>> Dec 23 22:00:50 tisc5 kernel: [10421.877001] BTRFS error (device loop0): open_ctree failed
>>>
>>>
>>> So I tried:
>>> 1) btrfs inspect-internal dump-super (with the snippet posted above)
>>> 2) btrfs inspect-internal dump-tree -b 75150311424
>>>
>>> And it showed (header + snippet for items 243-248):
>>> node 75150311424 level 1 items 249 free 244 generation 2488741 owner 2
>>> fs uuid be557007-42c9-4079-be16-568997e94cd9
>>> chunk uuid dbe69c7e-2d50-4001-af31-148c5475b48b
>>> ...
>>>    key (14799519744 EXTENT_ITEM 4096) block 233423224832 (14247023) gen 2484894
>>>    key (14811271168 EXTENT_ITEM 135168) block 656310272 (40058) gen 2488049
>>
>>
>>>    key (1505328190277054464 UNKNOWN.4 366981796979539968) block 0 (0) gen 0
>>>    key (0 UNKNOWN.0 1419267647995904) block 6468220747776 (394788864) gen 7786775707648
>>
>> Pretty obviously, these two node pointers are garbage.
>> Something corrupted the memory at runtime, and we don't have a runtime
>> check against such corruption yet.
>>
>> So IMHO the problem is that some kernel code, either btrfs or another
>> part of the kernel, corrupted the memory.
>> Btrfs then failed to detect it and wrote it back to disk, and the
>> problem was only caught when the kernel later read the tree block
>> back from disk.
>>
>> I could add such a check for nodes, but normally it would need
>> CONFIG_BTRFS_FS_CHECK_INTEGRITY, so it makes no sense for normal users.
>>
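
As a rough illustration of the kind of node check discussed above, the Python sketch below scans a node's key pointers for a NULL block pointer and for keys that are out of order. It is a simplified model under assumptions, not kernel code: `KeyPtr` and both helper names are hypothetical, the key type value 168 for EXTENT_ITEM is stated as an assumption, and the sample data are the six entries (items 243-248) from the dump quoted in this thread.

```python
from typing import List, NamedTuple, Optional

EXTENT_ITEM = 168  # assumed numeric key type for what the dump prints as EXTENT_ITEM


class KeyPtr(NamedTuple):
    """Hypothetical model of one key pointer in an internal node."""
    objectid: int
    key_type: int
    offset: int
    blockptr: int
    generation: int


def first_null_pointer(ptrs: List[KeyPtr]) -> Optional[int]:
    """Return the index of the first pointer whose block address is 0, if any."""
    for slot, ptr in enumerate(ptrs):
        if ptr.blockptr == 0:
            return slot
    return None


def first_unordered_key(ptrs: List[KeyPtr]) -> Optional[int]:
    """Return the index of the first key that is not greater than its predecessor."""
    for slot in range(1, len(ptrs)):
        if ptrs[slot][:3] <= ptrs[slot - 1][:3]:
            return slot
    return None


# Items 243-248 from the quoted dump; list index 0 corresponds to slot 243.
NODE_SNIPPET = [
    KeyPtr(14799519744, EXTENT_ITEM, 4096, 233423224832, 2484894),
    KeyPtr(14811271168, EXTENT_ITEM, 135168, 656310272, 2488049),
    KeyPtr(1505328190277054464, 4, 366981796979539968, 0, 0),
    KeyPtr(0, 0, 1419267647995904, 6468220747776, 7786775707648),
    KeyPtr(12884901888, EXTENT_ITEM, 24576, 816693248, 2484931),
    KeyPtr(14902849536, EXTENT_ITEM, 131072, 75135844352, 2488739),
]

print(first_null_pointer(NODE_SNIPPET))   # 2
print(first_unordered_key(NODE_SNIPPET))  # 3
```

Index 2 in this snippet is slot 245 of the node, which is consistent with the slot reported in the kernel message.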
>>>    key (12884901888 EXTENT_ITEM 24576) block 816693248 (49847) gen 2484931
>>>    key (14902849536 EXTENT_ITEM 131072) block 75135844352 (4585928) gen 2488739
>>>
>>>
>>> I looked at those numbers for quite a while (also in hex), trying to figure
>>> out what happened (bit flips (it was on an SSD), byte shifts (I
>>> also suspected a bad CPU ... because it died 2 months after
>>> that)), and tried to guess "correct" values for those items ... but no
>>> idea:-(
>>
>> I'm not so sure about that. Unless you're super lucky (or unlucky, in this case),
>> such corruption would normally be caught by the csum first.
>>
>>>
>>> So this is why I asked about that log_root and whether there is a
>>> chance to "log-replay things":-)
>>
>> In your case, this is definitely not related to log replay.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> Thanks
>>> M.
>>


