* btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
@ 2016-02-04 1:28 Dion Gullotta
2016-02-04 1:41 ` Qu Wenruo
2016-02-04 11:58 ` Duncan
0 siblings, 2 replies; 5+ messages in thread
From: Dion Gullotta @ 2016-02-04 1:28 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
Hi,
We have a btrfs partition that was working fine until last night, when it stopped working. The first thing I tried was rebooting the server, which got stuck on a hung mount process. I've tried every diagnostic and recovery option I can find online and nothing is working.
We did have regular snapshots being taken, and regular scrubbing was being performed as well. If you need any information I'm more than happy to provide.
The OS is ReadyNAS OS version 6.2.4, which is Linux under the hood.
Here are the relevant details:
Broken device is /dev/md127 which is usually mounted under /data
root@odin:/var/readynasd# uname -a
Linux odin 3.0.101.RN2120.3 #1 SMP Wed Apr 1 16:09:30 PDT 2015 armv7l GNU/Linux
root@odin:/var/readynasd# btrfs fi show
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Label: '2fe6230e:data' uuid: 04c95625-4927-4ade-80e7-de45a7536271
Total devices 1 FS bytes used 13.62TiB
devid 1 size 21.82TiB used 14.24TiB path /dev/md127
Btrfs v3.17.3
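The same failure line repeats four times per attempt, always for the same block. As a small sketch, the repeats can be collapsed into unique (bytenr, found, wanted) tuples. The function name `unique_csum_failures` is made up for illustration, and the field positions assume the exact message format shown above.

```shell
# Hypothetical helper: collapse repeated btrfs-progs checksum failures.
# Assumes lines of the form:
#   checksum verify failed on <bytenr> found <csum> wanted <csum>
unique_csum_failures() {
    awk '/^checksum verify failed on/ { print $5, $7, $9 }' | sort -u
}

# Demo on the four identical lines pasted above.
for i in 1 2 3 4; do
    echo "checksum verify failed on 18949527437312 found 4A677799 wanted CB641650"
done | unique_csum_failures
```

On a live system the input would presumably come from something like `btrfs fi show 2>&1 | unique_csum_failures`. A single line of output means every read attempt failed on the same block.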
This is the relevant part of dmesg
udevd[862]: starting version 175
btrfs: device label 2fe6230e:data devid 1 transid 248531 /dev/md127
Adding 1047420k swap on /dev/md1. Priority:-1 extents:1 across:1047420k
BTRFS critical (device md127): unable to find logical 1357341392896 len 4096
kernel BUG at fs/btrfs/inode.c:1621!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = f0260000
[00000000] *pgd=30015831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] SMP
Note the kernel bug and kernel oops lines.
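To pull just those lines out of a longer kernel log, a filter along these lines may help. This is only a sketch: `btrfs_kernel_errors` is a hypothetical name, and the patterns are taken from the messages quoted above rather than from any exhaustive list.

```shell
# Hypothetical helper: keep only the btrfs/BUG/Oops lines from a kernel log.
# The patterns mirror the dmesg excerpt above and are not exhaustive.
btrfs_kernel_errors() {
    grep -E 'BTRFS critical|kernel BUG at|NULL pointer dereference|Oops:'
}

# Demo on part of the excerpt above; the udevd and swap lines are dropped.
btrfs_kernel_errors <<'EOF'
udevd[862]: starting version 175
Adding 1047420k swap on /dev/md1. Priority:-1 extents:1 across:1047420k
BTRFS critical (device md127): unable to find logical 1357341392896 len 4096
kernel BUG at fs/btrfs/inode.c:1621!
Internal error: Oops: 817 [#1] SMP
EOF
```

Typical use would be `dmesg | btrfs_kernel_errors`.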
I've tried the following things, results shown:
mount -o recovery /dev/md127 /data
mount -o ro,recovery /dev/md127 /data
mount -o ro /dev/md127 /data
All of these just hang and a reboot is necessary in order to kill the process.
Things that don't work:
root@odin:/tmp# btrfs-zero-log /dev/md127
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
root@odin:/tmp# btrfs restore -F -i -D -v /dev/md127 /dev/null
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Could not open root, trying backup super
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Could not open root, trying backup super
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
root@odin:/tmp# btrfs-find-root /dev/md127
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Open ctree failed
root@odin:/tmp# btrfsck /dev/md127
Couldn't open file system
root@odin:/tmp# btrfs rescue super-recover -v /dev/md127
All Devices:
Device: id = 1, name = /dev/md127
Before Recovering:
[All good supers]:
device name = /dev/md127
superblock bytenr = 65536
device name = /dev/md127
superblock bytenr = 67108864
device name = /dev/md127
superblock bytenr = 274877906944
[All bad supers]:
All supers are valid, no need to recover
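For reference, the three bytenr values reported above match the fixed offsets at which btrfs keeps its superblock copies: 64 KiB, 64 MiB and 256 GiB. A minimal sketch that recomputes them with shell arithmetic (the function name `superblock_offsets` is only illustrative, and 64-bit shell arithmetic is assumed):

```shell
# Recompute the fixed btrfs superblock offsets in bytes.
# Assumes the shell does 64-bit integer arithmetic.
superblock_offsets() {
    echo $((64 * 1024))                 # primary copy, 64 KiB
    echo $((64 * 1024 * 1024))          # first mirror, 64 MiB
    echo $((256 * 1024 * 1024 * 1024))  # second mirror, 256 GiB
}

superblock_offsets
```

All three copies being valid only says the superblocks themselves are readable. The chunk root they point to is a separate block, and that is the one failing its checksum here.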
root@odin:/tmp# btrfs check /dev/md127
Couldn't open file system
root@odin:/tmp# btrfsck /dev/md127
Couldn't open file system
Other info
root@odin:/tmp# lsblk -o name,type,size,fstype,mountpoint
NAME TYPE SIZE FSTYPE MOUNTPOINT
mtdblock0 disk 1.5M
mtdblock1 disk 128K
mtdblock2 disk 6M
mtdblock3 disk 4M
mtdblock4 disk 116M
sda disk 7.3T
├─sda1 part 4G linux_raid_member
│ └─md0 raid1 4G ext4 /
├─sda2 part 512M linux_raid_member
│ └─md1 raid6 1022.9M swap [SWAP]
└─sda3 part 7.3T linux_raid_member
└─md127 raid5 21.8T btrfs
sdb disk 7.3T
├─sdb1 part 4G linux_raid_member
│ └─md0 raid1 4G ext4 /
├─sdb2 part 512M linux_raid_member
│ └─md1 raid6 1022.9M swap [SWAP]
└─sdb3 part 7.3T linux_raid_member
└─md127 raid5 21.8T btrfs
sdc disk 7.3T
├─sdc1 part 4G linux_raid_member
│ └─md0 raid1 4G ext4 /
├─sdc2 part 512M linux_raid_member
│ └─md1 raid6 1022.9M swap [SWAP]
└─sdc3 part 7.3T linux_raid_member
└─md127 raid5 21.8T btrfs
sdd disk 7.3T
├─sdd1 part 4G linux_raid_member
│ └─md0 raid1 4G ext4 /
├─sdd2 part 512M linux_raid_member
│ └─md1 raid6 1022.9M swap [SWAP]
└─sdd3 part 7.3T linux_raid_member
└─md127 raid5 21.8T btrfs
Disk health seems fine:
root@odin:/tmp# smartctl -a /dev/sda | grep PASSED
SMART overall-health self-assessment test result: PASSED
root@odin:/tmp# smartctl -a /dev/sdb | grep PASSED
SMART overall-health self-assessment test result: PASSED
root@odin:/tmp# smartctl -a /dev/sdc | grep PASSED
SMART overall-health self-assessment test result: PASSED
root@odin:/tmp# smartctl -a /dev/sdd | grep PASSED
SMART overall-health self-assessment test result: PASSED
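The PASSED line is only the drive's overall self-assessment. As a hedged sketch for looking a little deeper, the helper below (hypothetical name `smart_nonzero_counters`) scans `smartctl -A` attribute output for a few sector-error counters with a non-zero raw value. It assumes the usual ATA attribute table layout with the raw value in the tenth column; drives and smartmontools versions can differ.

```shell
# Hypothetical helper: report selected SMART counters whose raw value is > 0.
# Assumes the ATA attribute table printed by `smartctl -A`, where column 2 is
# the attribute name and column 10 is the raw value.
smart_nonzero_counters() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 { print $2, $10 }'
}

# Made-up sample rows for illustration only (not from the disks in this thread).
smart_nonzero_counters <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
EOF
```

On a real system this would be fed per disk, for example `smartctl -A /dev/sda | smart_nonzero_counters`. Empty output means none of the listed counters is above zero.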
Dion
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
2016-02-04 1:28 btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"? Dion Gullotta
@ 2016-02-04 1:41 ` Qu Wenruo
2016-02-04 1:53 ` Dion Gullotta
2016-02-04 11:58 ` Duncan
1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2016-02-04 1:41 UTC (permalink / raw)
To: Dion Gullotta, linux-btrfs@vger.kernel.org
Dion Gullotta wrote on 2016/02/04 12:28 +1100:
> root@odin:/var/readynasd# btrfs fi show
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> Csum didn't match
> Couldn't read chunk root
This is one of the deadliest corruptions for current btrfs: a corrupted chunk tree root.
Normally, btrfs rescue chunk-recover should be the correct tool to fix it, but several bugs and some bad design make chunk-recover quite prone to crashing without recovering the fs.
But you can always try that tool.
Another idea is to try using the backup roots manually, but in most cases that doesn't work, since there are at most 4 backup roots, which normally don't contain the needed chunk root.
Thanks,
Qu
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
2016-02-04 1:41 ` Qu Wenruo
@ 2016-02-04 1:53 ` Dion Gullotta
2016-02-04 2:23 ` Qu Wenruo
0 siblings, 1 reply; 5+ messages in thread
From: Dion Gullotta @ 2016-02-04 1:53 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs@vger.kernel.org
Hi Qu, thanks so much for your fast reply.
I'm running this right now and hoping for some good results:
root@odin:/var/readynasd# btrfs rescue chunk-recover -vy /dev/md127
All Devices:
Device: id = 1, name = /dev/md127
You mentioned trying to use the backup roots manually. How do I do this? I tried btrfs-find-root but it doesn't find anything.
Any further info appreciated.
Cheers,
Dion
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
2016-02-04 1:53 ` Dion Gullotta
@ 2016-02-04 2:23 ` Qu Wenruo
0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2016-02-04 2:23 UTC (permalink / raw)
To: Dion Gullotta, linux-btrfs@vger.kernel.org
Dion Gullotta wrote on 2016/02/04 12:53 +1100:
> You mentioned trying to use the backup roots manually. How do I do this? I tried btrfs-find-root but it doesn't find anything.
Use btrfs-show-super -f.
You'll see things like:
------
backup_roots[4]:
    backup 0:
        backup_tree_root: 29392896 gen: 6 level: 0
        backup_chunk_root: 20987904 gen: 5 level: 0
        backup_extent_root: 29409280 gen: 6 level: 0
        backup_fs_root: 29360128 gen: 4 level: 0
        backup_dev_root: 29507584 gen: 6 level: 0
        backup_csum_root: 29425664 gen: 4 level: 0
        backup_total_bytes: 10737418240
        backup_bytes_used: 393216
        backup_num_devices: 1
    backup 1:
        backup_tree_root: 29540352 gen: 7 level: 0
        backup_chunk_root: 20987904 gen: 5 level: 0
        backup_extent_root: 29556736 gen: 7 level: 0
        backup_fs_root: 29360128 gen: 4 level: 0
        backup_dev_root: 29507584 gen: 6 level: 0
        backup_csum_root: 29573120 gen: 7 level: 0
        backup_total_bytes: 10737418240
        backup_bytes_used: 409600
        backup_num_devices: 1
------
Find a backup_chunk_root whose gen is smaller than that of your current chunk_root, which is also shown in btrfs-show-super -f (before the backup sections):
------
chunk_root_generation 5 <<< Here
root_level 0
chunk_root 20987904
chunk_root_level 0
------
But in most cases the chunk tree changes quite seldom, so there is not much chance of luck.
Another way is to use btrfs-find-root, which should find all old chunk tree roots. But the problem is that the current btrfs-find-root can't handle the chunk tree, so no luck there either.
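The comparison described above can be sketched as a small filter. This is an illustration only: `older_chunk_roots` is a hypothetical name, and the parsing assumes the field layout shown in the sample output (address in the second field, generation in the fourth).

```shell
# Hypothetical helper: read `btrfs-show-super -f` style text on stdin and print
# "<bytenr> <gen>" for every backup_chunk_root older than the current
# chunk_root_generation. Field positions follow the sample output above.
older_chunk_roots() {
    awk '
        $1 == "chunk_root_generation" { cur = $2 }
        $1 == "backup_chunk_root:"    { n++; addr[n] = $2; gen[n] = $4 }
        END {
            for (i = 1; i <= n; i++)
                if (gen[i] + 0 < cur + 0)
                    print addr[i], gen[i]
        }
    '
}

# Demo with the values from the sample: both backups are at gen 5, the same as
# the current generation, so nothing is printed.
older_chunk_roots <<'EOF'
chunk_root_generation 5
backup_chunk_root: 20987904 gen: 5 level: 0
backup_chunk_root: 20987904 gen: 5 level: 0
EOF
```

Against a real device the input would be `btrfs-show-super -f /dev/md127 | older_chunk_roots`. Empty output corresponds to the "not much luck" case: no backup slot holds an older chunk root.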
Thanks,
Qu
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
2016-02-04 1:28 btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"? Dion Gullotta
2016-02-04 1:41 ` Qu Wenruo
@ 2016-02-04 11:58 ` Duncan
1 sibling, 0 replies; 5+ messages in thread
From: Duncan @ 2016-02-04 11:58 UTC (permalink / raw)
To: linux-btrfs
Dion Gullotta posted on Thu, 04 Feb 2016 12:28:09 +1100 as excerpted:
> root@odin:/var/readynasd# uname -a
> Linux odin 3.0.101.RN2120.3 #1 SMP Wed Apr 1 16:09:30 PDT 2015 armv7l GNU/Linux
Qu's helping you on the practical side, as a dev, far better than I could as a user, but here's a bit of additional higher-level information for your consideration.
That's an incredibly ancient kernel, in btrfs terms. Btrfs didn't have the experimental tag stripped until 3.12, which is already seriously ancient in btrfs terms, and at least nominally that's a 3.0 kernel. That makes it not only half a decade old now (four years of 3.x kernels plus a year of 4.x, at five releases/year), but more than two years older than the 3.12 release in which the experimental label came off!
Of course the comparatively recent April 2015 build date (still nearly a year old, though) does hint that it's likely a heavily backport-patched kernel. But which btrfs patches have been backported and which haven't, probably only the project devs are tracking. It's nothing we'd know here, targeting mainline, where, for btrfs at least, that's a very old and heavily experimental btrfs kernel indeed!
The general recommendation here, in view of the fact that btrfs, while no longer experimental, is still under heavy development, is to keep to the latest couple of release series, either current or LTS.
With 4.4 out as an LTS, that would be 4.4 and 4.1 as LTS kernel series, and 4.4 and 4.3 as current kernel series. Though with 4.4 so new in LTS terms, and btrfs stability and maturity still developing, the previous LTS, 3.18, is still somewhat supported and may continue to be, making it three LTS series. But certainly, before 3.18 LTS, while we do still try to help, development remains fast enough that it's simply old, and unlikely to be well supported at all, simply for practical reasons.
So you may want to either upgrade to at /least/ the 3.18 LTS series, and preferably at least 4.1 LTS, if you're continuing to run btrfs,
*OR* get support from your distro, as presumably, if they're still choosing to run and support then highly experimental btrfs on such nominally old kernels, they have good reasons, and they may be better positioned to provide btrfs support on something that ancient than this list is,
*OR*, if you have good reason to continue on such old kernels, I'd strongly urge you to reconsider whether btrfs, particularly at the experimental level it was back in kernel 3.0, is an appropriate choice for such long-term-stable projects as that appears to be, using half-decade-old kernels with then still highly experimental btrfs.
It's very likely in the latter case that a fully stable and mature filesystem such as ext3/4 (was ext4 even really stable yet a half decade ago? ext3 may be better on a 3.0 kernel, I'm not sure), or the reiserfs I used for years and have had very good experience with (at least since the data=ordered default was introduced a decade or so ago), is much more suitable for that use-case than something like btrfs, which is still stabilizING and maturING even in current kernels.
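The kernel-series thresholds above can be sketched as a small check. This is only an illustration under stated assumptions: the helper names (kernel_series, classify_kernel) are made up here, and the cutoffs (3.18 and 4.1) are simply the ones named in this message as of February 2016, not anything btrfs itself enforces.

```python
import re

# Cutoffs taken from the advice in this message (early 2016). They are an
# assumption for illustration and are not encoded anywhere in btrfs.
MIN_SUPPORTED = (3, 18)  # oldest LTS series still somewhat supported on-list
PREFERRED = (4, 1)       # preferred minimum LTS series


def kernel_series(release):
    """Return (major, minor) parsed from a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def classify_kernel(release):
    """Hypothetical helper: bucket a kernel release by the cutoffs above."""
    series = kernel_series(release)
    if series < MIN_SUPPORTED:
        return "too old"    # before 3.18: unlikely to be well supported
    if series < PREFERRED:
        return "minimum"    # 3.18 up to (not including) 4.1
    return "preferred"      # 4.1 or newer


if __name__ == "__main__":
    # Release string from the uname output quoted at the top of this message.
    print(classify_kernel("3.0.101.RN2120.3"))
```

Vendor suffixes such as ".RN2120.3" are ignored on purpose: only the upstream major.minor series matters for this rule, and a backport-patched vendor kernel still reports the old series.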
Now your btrfs userspace tools are rather more current at 3.17, but even that's relatively old now. The rule of thumb for userspace is to keep its version at least matching the kernel version, assuming the kernel is kept reasonably current. That would again mean the 3.18 series at the oldest, that being the oldest LTS series really supported, and again preferably newer than that: the 4.1 series or newer, up to the current 4.4. Current userspace can normally be used with older kernels without issue, except for mkfs.btrfs, where you'll want to specify options to be compatible with older kernels that didn't yet have code for newer on-device formats.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-02-04 11:58 UTC | newest]
Thread overview: 5+ messages
2016-02-04 1:28 btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"? Dion Gullotta
2016-02-04 1:41 ` Qu Wenruo
2016-02-04 1:53 ` Dion Gullotta
2016-02-04 2:23 ` Qu Wenruo
2016-02-04 11:58 ` Duncan