linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Dion Gullotta <Dion.Gullotta@faredge.com.au>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
Date: Thu, 4 Feb 2016 10:23:09 +0800
Message-ID: <56B2B60D.4000203@cn.fujitsu.com>
In-Reply-To: <CCD0821AE781994EB8BAB510676B5EF502F2987D711D@ZEUS.faredge.local>



Dion Gullotta wrote on 2016/02/04 12:53 +1100:
> Hi Qu, thanks so much for your fast reply.
>
> I'm running this right now and hoping for some good results:
>
> root@odin:/var/readynasd# btrfs rescue chunk-recover -vy /dev/md127
> All Devices:
>          Device: id = 1, name = /dev/md127
>
>
> You said " Other idea including try to use backup roots manually" how do I do this? I tried btrfs-find-root but it doesn't find anything.
>

Use btrfs-show-super -f.
You'll see things like:
------
backup_roots[4]:
	backup 0:
		backup_tree_root:	29392896	gen: 6	level: 0
		backup_chunk_root:	20987904	gen: 5	level: 0
		backup_extent_root:	29409280	gen: 6	level: 0
		backup_fs_root:		29360128	gen: 4	level: 0
		backup_dev_root:	29507584	gen: 6	level: 0
		backup_csum_root:	29425664	gen: 4	level: 0
		backup_total_bytes:	10737418240
		backup_bytes_used:	393216
		backup_num_devices:	1

	backup 1:
		backup_tree_root:	29540352	gen: 7	level: 0
		backup_chunk_root:	20987904	gen: 5	level: 0
		backup_extent_root:	29556736	gen: 7	level: 0
		backup_fs_root:		29360128	gen: 4	level: 0
		backup_dev_root:	29507584	gen: 6	level: 0
		backup_csum_root:	29573120	gen: 7	level: 0
		backup_total_bytes:	10737418240
		backup_bytes_used:	409600
		backup_num_devices:	1
------

Look for a backup_chunk_root whose gen is smaller than that of your
current chunk root. The current value is also shown by
btrfs-show-super -f (before the backup sections):
------
chunk_root_generation	5 <<< Here
root_level		0
chunk_root		20987904
chunk_root_level	0
------

In most cases, though, the chunk tree changes only rarely, so there is
not much chance of finding an older one.
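
The comparison described above can be sketched as a small script. This is
only an illustration, assuming the output of btrfs-show-super -f has been
saved as text and looks like the excerpts above; the function name
older_backup_chunk_roots is made up for this example and is not part of
btrfs-progs.

```python
import re

# Trimmed copy of the btrfs-show-super -f excerpts quoted above.
SAMPLE = """\
chunk_root_generation 5
chunk_root 20987904
backup_roots[4]:
    backup 0:
        backup_tree_root: 29392896 gen: 6 level: 0
        backup_chunk_root: 20987904 gen: 5 level: 0
    backup 1:
        backup_tree_root: 29540352 gen: 7 level: 0
        backup_chunk_root: 20987904 gen: 5 level: 0
"""


def older_backup_chunk_roots(dump):
    """Return (slot, bytenr, gen) for every backup_chunk_root whose
    generation is smaller than the current chunk_root_generation."""
    cur = re.search(r"^chunk_root_generation\s+(\d+)", dump, re.MULTILINE)
    if cur is None:
        raise ValueError("chunk_root_generation not found in dump")
    current_gen = int(cur.group(1))

    candidates = []
    slot = None
    for line in dump.splitlines():
        # "backup N:" opens a new backup slot.
        m = re.match(r"\s*backup (\d+):", line)
        if m:
            slot = int(m.group(1))
            continue
        m = re.match(r"\s*backup_chunk_root:\s+(\d+)\s+gen:\s+(\d+)", line)
        if m and slot is not None:
            bytenr, gen = int(m.group(1)), int(m.group(2))
            if gen < current_gen:
                candidates.append((slot, bytenr, gen))
    return candidates


if __name__ == "__main__":
    print(older_backup_chunk_roots(SAMPLE))
```

For the excerpts shown above this prints an empty list: both backups carry
the same chunk root (gen 5) as the current superblock, which is the usual
case.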

Another way would be to use btrfs-find-root, which in principle could
find all the old chunk roots.
The problem is that the current btrfs-find-root can't handle the chunk
tree, so no luck there either.

Thanks,
Qu

> Any further info appreciated.
>
> Cheers,
> Dion
>
>
> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Thursday, 4 February 2016 12:42 PM
> To: Dion Gullotta <Dion.Gullotta@faredge.com.au>; linux-btrfs@vger.kernel.org
> Subject: Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
>
>
>
> Dion Gullotta wrote on 2016/02/04 12:28 +1100:
>> Hi,
>>
>> We have a btrfs partition that was working fine up until last night whereupon it stopped working. The first thing I tried was rebooting the server, which got stuck on a hung mount process. I've tried every diagnostic and recovery option I can find online and nothing is working.
>>
>> We did have regular snapshots being taken, and regular scrubbing was being performed as well. If you need any information I'm more than happy to provide.
>>
>> The OS is ReadyNAS which is linux under the hood. Readynas OS version
>> 6.2.4
>>
>> Here are the relevant details:
>>
>> Broken device is /dev/md127 which is usually mounted under /data
>>
>> root@odin:/var/readynasd# uname -a
>> Linux odin 3.0.101.RN2120.3 #1 SMP Wed Apr 1 16:09:30 PDT 2015 armv7l GNU/Linux
>>
>> root@odin:/var/readynasd# btrfs fi show
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> Csum didn't match
>> Couldn't read chunk root
>
> This is one of the most deadly corruptions for current btrfs: the chunk tree root is corrupted.
>
> Normally, btrfs rescue chunk-recover should be the correct tool to fix it, but several bugs and some bad design make chunk-recover quite likely to crash without recovering the fs.
>
> But you can always try that tool.
>
> Other idea including try to use backup roots manually, but in most cases it doesn't work, as there are only up to 4 backup roots, which normally don't contain the needed chunk root.
>
> Thanks,
> Qu
>
>
>> Label: '2fe6230e:data'  uuid: 04c95625-4927-4ade-80e7-de45a7536271
>>           Total devices 1 FS bytes used 13.62TiB
>>           devid    1 size 21.82TiB used 14.24TiB path /dev/md127
>>
>> Btrfs v3.17.3
>>
>> This is the relevant part of dmesg
>>
>> udevd[862]: starting version 175
>> btrfs: device label 2fe6230e:data devid 1 transid 248531 /dev/md127
>> Adding 1047420k swap on /dev/md1. Priority:-1 extents:1 across:1047420k
>> BTRFS critical (device md127): unable to find logical 1357341392896 len 4096
>> kernel BUG at fs/btrfs/inode.c:1621!
>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>> pgd = f0260000
>> [00000000] *pgd=30015831, *pte=00000000, *ppte=00000000
>> Internal error: Oops: 817 [#1] SMP
>>
>> Note the kernel bug and kernel oops lines.
>>
>>
>> I've tried the following things, results shown:
>>
>> mount -o recovery /dev/md127 /data
>>
>> mount -o ro,recovery /dev/md127 /data
>>
>> mount -o ro /dev/md127 /data
>>
>> All of these just hang and a reboot is necessary in order to kill the process.
>>
>>
>>
>> Things that don't work:
>>
>> root@odin:/tmp# btrfs-zero-log /dev/md127
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> Csum didn't match
>> Couldn't read chunk root
>>
>>
>> root@odin:/tmp# btrfs restore -F -i -D -v /dev/md127 /dev/null
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> Csum didn't match
>> Couldn't read chunk root
>> Could not open root, trying backup super
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> Csum didn't match
>> Couldn't read chunk root
>> Could not open root, trying backup super
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> Csum didn't match
>> Couldn't read chunk root
>>
>>
>> root@odin:/tmp# btrfs-find-root /dev/md127
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
>> Csum didn't match
>> Couldn't read chunk root
>> Open ctree failed
>>
>> root@odin:/tmp# btrfsck /dev/md127
>> Couldn't open file system
>>
>> root@odin:/tmp# btrfs rescue super-recover -v /dev/md127
>> All Devices:
>>         Device: id = 1, name = /dev/md127
>>
>> Before Recovering:
>> [All good supers]:
>> device name = /dev/md127
>> superblock bytenr = 65536
>>
>> device name = /dev/md127
>> superblock bytenr = 67108864
>>
>> device name = /dev/md127
>> superblock bytenr = 274877906944
>>
>> [All bad supers]:
>>
>> All supers are valid, no need to recover
>>
>>
>> root@odin:/tmp# btrfs check /dev/md127
>> Couldn't open file system
>> root@odin:/tmp# btrfsck /dev/md127
>> Couldn't open file system
>>
>> Other info
>>
>> root@odin:/tmp# lsblk -o name,type,size,fstype,mountpoint
>> NAME        TYPE     SIZE FSTYPE            MOUNTPOINT
>> mtdblock0   disk     1.5M
>> mtdblock1   disk     128K
>> mtdblock2   disk       6M
>> mtdblock3   disk       4M
>> mtdblock4   disk     116M
>> sda         disk     7.3T
>> ├─sda1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sda2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sda3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>> sdb         disk     7.3T
>> ├─sdb1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sdb2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sdb3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>> sdc         disk     7.3T
>> ├─sdc1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sdc2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sdc3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>> sdd         disk     7.3T
>> ├─sdd1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sdd2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sdd3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>>
>>
>>    Disk health seems fine:
>> root@odin:/tmp# smartctl -a /dev/sda | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>> root@odin:/tmp# smartctl -a /dev/sdb | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>> root@odin:/tmp# smartctl -a /dev/sdc | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>> root@odin:/tmp# smartctl -a /dev/sdd | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>>
>>
>>
>>
>> Dion
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
>
>



