linux-btrfs.vger.kernel.org archive mirror
* btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
@ 2016-02-04  1:28 Dion Gullotta
  2016-02-04  1:41 ` Qu Wenruo
  2016-02-04 11:58 ` Duncan
  0 siblings, 2 replies; 5+ messages in thread
From: Dion Gullotta @ 2016-02-04  1:28 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

Hi,

We have a btrfs partition that was working fine up until last night whereupon it stopped working. The first thing I tried was rebooting the server, which got stuck on a hung mount process. I've tried every diagnostic and recovery option I can find online and nothing is working.

We did have regular snapshots being taken, and regular scrubbing was being performed as well. If you need any information I'm more than happy to provide. 

The OS is ReadyNAS OS 6.2.4, which is Linux under the hood.

Here are the relevant details:

Broken device is /dev/md127 which is usually mounted under /data

root@odin:/var/readynasd# uname -a
Linux odin 3.0.101.RN2120.3 #1 SMP Wed Apr 1 16:09:30 PDT 2015 armv7l GNU/Linux

root@odin:/var/readynasd# btrfs fi show
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Label: '2fe6230e:data'  uuid: 04c95625-4927-4ade-80e7-de45a7536271
        Total devices 1 FS bytes used 13.62TiB
        devid    1 size 21.82TiB used 14.24TiB path /dev/md127

Btrfs v3.17.3

This is the relevant part of dmesg

udevd[862]: starting version 175
btrfs: device label 2fe6230e:data devid 1 transid 248531 /dev/md127
Adding 1047420k swap on /dev/md1. Priority:-1 extents:1 across:1047420k
BTRFS critical (device md127): unable to find logical 1357341392896 len 4096
kernel BUG at fs/btrfs/inode.c:1621!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = f0260000
[00000000] *pgd=30015831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] SMP

Note the kernel bug and kernel oops lines.


I've tried the following things, results shown:

mount -o recovery /dev/md127 /data
 
mount -o ro,recovery /dev/md127 /data
 
mount -o ro /dev/md127 /data
 
All of these just hang and a reboot is necessary in order to kill the process.



Things that don't work:
 
root@odin:/tmp# btrfs-zero-log /dev/md127
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
 
 
root@odin:/tmp# btrfs restore -F -i -D -v /dev/md127 /dev/null
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Could not open root, trying backup super
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Could not open root, trying backup super
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
 
 
root@odin:/tmp# btrfs-find-root /dev/md127
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
Csum didn't match
Couldn't read chunk root
Open ctree failed
 
root@odin:/tmp# btrfsck /dev/md127
Couldn't open file system

root@odin:/tmp# btrfs rescue super-recover -v /dev/md127
All Devices:
Device: id = 1, name = /dev/md127
 
Before Recovering:
[All good supers]:
device name = /dev/md127
superblock bytenr = 65536
 
device name = /dev/md127
superblock bytenr = 67108864
 
device name = /dev/md127
superblock bytenr = 274877906944
 
[All bad supers]:
 
All supers are valid, no need to recover
 
 
root@odin:/tmp# btrfs check /dev/md127
Couldn't open file system
root@odin:/tmp# btrfsck /dev/md127
Couldn't open file system

Other info

root@odin:/tmp# lsblk -o name,type,size,fstype,mountpoint
NAME       TYPE     SIZE FSTYPE            MOUNTPOINT
mtdblock0  disk     1.5M
mtdblock1  disk     128K
mtdblock2  disk       6M
mtdblock3  disk       4M
mtdblock4  disk     116M
sda        disk     7.3T
├─sda1     part       4G linux_raid_member
│ └─md0    raid1      4G ext4              /
├─sda2     part     512M linux_raid_member
│ └─md1    raid6 1022.9M swap              [SWAP]
└─sda3     part     7.3T linux_raid_member
  └─md127  raid5   21.8T btrfs
sdb        disk     7.3T
├─sdb1     part       4G linux_raid_member
│ └─md0    raid1      4G ext4              /
├─sdb2     part     512M linux_raid_member
│ └─md1    raid6 1022.9M swap              [SWAP]
└─sdb3     part     7.3T linux_raid_member
  └─md127  raid5   21.8T btrfs
sdc        disk     7.3T
├─sdc1     part       4G linux_raid_member
│ └─md0    raid1      4G ext4              /
├─sdc2     part     512M linux_raid_member
│ └─md1    raid6 1022.9M swap              [SWAP]
└─sdc3     part     7.3T linux_raid_member
  └─md127  raid5   21.8T btrfs
sdd        disk     7.3T
├─sdd1     part       4G linux_raid_member
│ └─md0    raid1      4G ext4              /
├─sdd2     part     512M linux_raid_member
│ └─md1    raid6 1022.9M swap              [SWAP]
└─sdd3     part     7.3T linux_raid_member
  └─md127  raid5   21.8T btrfs


 Disk health seems fine:
root@odin:/tmp# smartctl -a /dev/sda | grep PASSED
SMART overall-health self-assessment test result: PASSED
root@odin:/tmp# smartctl -a /dev/sdb | grep PASSED
SMART overall-health self-assessment test result: PASSED
root@odin:/tmp# smartctl -a /dev/sdc | grep PASSED
SMART overall-health self-assessment test result: PASSED
root@odin:/tmp# smartctl -a /dev/sdd | grep PASSED
SMART overall-health self-assessment test result: PASSED




Dion 



* Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
  2016-02-04  1:28 btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"? Dion Gullotta
@ 2016-02-04  1:41 ` Qu Wenruo
  2016-02-04  1:53   ` Dion Gullotta
  2016-02-04 11:58 ` Duncan
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2016-02-04  1:41 UTC (permalink / raw)
  To: Dion Gullotta, linux-btrfs@vger.kernel.org



Dion Gullotta wrote on 2016/02/04 12:28 +1100:
> Hi,
>
> We have a btrfs partition that was working fine up until last night whereupon it stopped working. The first thing I tried was rebooting the server, which got stuck on a hung mount process. I've tried every diagnostic and recovery option I can find online and nothing is working.
>
> We did have regular snapshots being taken, and regular scrubbing was being performed as well. If you need any information I'm more than happy to provide.
>
> The OS is ReadyNAS OS 6.2.4, which is Linux under the hood.
>
> Here are the relevant details:
>
> Broken device is /dev/md127 which is usually mounted under /data
>
> root@odin:/var/readynasd# uname -a
> Linux odin 3.0.101.RN2120.3 #1 SMP Wed Apr 1 16:09:30 PDT 2015 armv7l GNU/Linux
>
> root@odin:/var/readynasd# btrfs fi show
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> checksum verify failed on 18949527437312 found 4A677799 wanted CB641650
> Csum didn't match
> Couldn't read chunk root

This is one of the deadliest corruptions for current btrfs: the chunk
tree root is corrupt.

Normally, btrfs rescue chunk-recover should be the right tool to fix
it, but several bugs and some bad design make chunk-recover quite prone
to crashing without recovering the fs.

But you can always try that tool.

Another idea is to try the backup roots manually, but in most cases
that doesn't work: the superblock keeps at most 4 backup roots, which
normally don't contain the needed chunk root.

Thanks,
Qu



* RE: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
  2016-02-04  1:41 ` Qu Wenruo
@ 2016-02-04  1:53   ` Dion Gullotta
  2016-02-04  2:23     ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Dion Gullotta @ 2016-02-04  1:53 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs@vger.kernel.org


Hi Qu, thanks so much for your fast reply. 

I'm running this right now and hoping for some good results:

root@odin:/var/readynasd# btrfs rescue chunk-recover -vy /dev/md127
All Devices:
        Device: id = 1, name = /dev/md127


You mentioned trying to use the backup roots manually. How do I do this? I tried btrfs-find-root but it doesn't find anything.

Any further info appreciated.

Cheers,
Dion



* Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
  2016-02-04  1:53   ` Dion Gullotta
@ 2016-02-04  2:23     ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2016-02-04  2:23 UTC (permalink / raw)
  To: Dion Gullotta, linux-btrfs@vger.kernel.org



Dion Gullotta wrote on 2016/02/04 12:53 +1100:
> Hi Qu, thanks so much for your fast reply.
>
> I'm running this right now and hoping for some good results:
>
> root@odin:/var/readynasd# btrfs rescue chunk-recover -vy /dev/md127
> All Devices:
>          Device: id = 1, name = /dev/md127
>
>
> You mentioned trying to use the backup roots manually. How do I do this? I tried btrfs-find-root but it doesn't find anything.
>

Use btrfs-show-super -f.
You'll see things like:
------
backup_roots[4]:
	backup 0:
		backup_tree_root:	29392896	gen: 6	level: 0
		backup_chunk_root:	20987904	gen: 5	level: 0
		backup_extent_root:	29409280	gen: 6	level: 0
		backup_fs_root:		29360128	gen: 4	level: 0
		backup_dev_root:	29507584	gen: 6	level: 0
		backup_csum_root:	29425664	gen: 4	level: 0
		backup_total_bytes:	10737418240
		backup_bytes_used:	393216
		backup_num_devices:	1

	backup 1:
		backup_tree_root:	29540352	gen: 7	level: 0
		backup_chunk_root:	20987904	gen: 5	level: 0
		backup_extent_root:	29556736	gen: 7	level: 0
		backup_fs_root:		29360128	gen: 4	level: 0
		backup_dev_root:	29507584	gen: 6	level: 0
		backup_csum_root:	29573120	gen: 7	level: 0
		backup_total_bytes:	10737418240
		backup_bytes_used:	409600
		backup_num_devices:	1
------

Find a backup_chunk_root whose gen is smaller than your current
chunk_root_generation, which is also shown by btrfs-show-super -f
(before the backup sections):
------
chunk_root_generation	5 <<< Here
root_level		0
chunk_root		20987904
chunk_root_level	0
------

In most cases, though, the chunk tree changes quite rarely, so the
backups usually all point to the same chunk root and there is not much
luck to be had.
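
A minimal sketch of that comparison, not a tested recovery procedure.
The function name older_chunk_roots is hypothetical, and the parsing
assumes the field layout of the sample btrfs-show-super -f output shown
above. It reads that output on stdin and prints the logical address and
gen of every backup_chunk_root whose gen is lower than
chunk_root_generation:

```shell
# Hypothetical helper: reads `btrfs-show-super -f` output on stdin and prints
# "<logical address> <gen>" for every backup_chunk_root that is older than
# the current chunk root. Assumes the field layout of the sample output above.
older_chunk_roots() {
    awk '
        # Generation of the chunk root the superblock currently points at.
        $1 == "chunk_root_generation" { cur = $2 + 0 }
        # One entry per backup slot: address in field 2, gen in field 4.
        $1 == "backup_chunk_root:"    { addr[++n] = $2; gen[n] = $4 + 0 }
        END {
            for (i = 1; i <= n; i++)
                if (gen[i] < cur) print addr[i], gen[i]
        }
    '
}

# Usage (device path taken from this thread):
#   btrfs-show-super -f /dev/md127 | older_chunk_roots
```

No output means every backup slot still records a chunk root of the
current generation, so there is no older candidate to try.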

Another way is to use btrfs-find-root, which should find all old chunk
tree roots. But the problem is that the current btrfs-find-root can't
handle the chunk tree, so no luck there either.

Thanks,
Qu

> Any further info appreciated.
>
> Cheers,
> Dion
>
>
> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Thursday, 4 February 2016 12:42 PM
> To: Dion Gullotta <Dion.Gullotta@faredge.com.au>; linux-btrfs@vger.kernel.org
> Subject: Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
>
>
>
> Dion Gullotta wrote on 2016/02/04 12:28 +1100:
>> Hi,
>>
>> We have a btrfs partition that was working fine up until last night whereupon it stopped working. The first thing I tried was rebooting the server, which got stuck on a hung mount process. I've tried every diagnostic and recovery option I can find online and nothing is working.
>>
>> We did have regular snapshots being taken, and regular scrubbing was being performed as well. If you need any information I'm more than happy to provide.
>>
>> The OS is ReadyNAS which is linux under the hood. Readynas OS version
>> 6.2.4
>>
>> Here are the relevant details:
>>
>> Broken device is /dev/md127 which is usually mounted under /data
>>
>> root@odin:/var/readynasd# uname -a
>> Linux odin 3.0.101.RN2120.3 #1 SMP Wed Apr 1 16:09:30 PDT 2015 armv7l
>> GNU/Linux
>>
>> root@odin:/var/readynasd# btrfs fi show checksum verify failed on
>> 18949527437312 found 4A677799 wanted CB641650 checksum verify failed
>> on 18949527437312 found 4A677799 wanted CB641650 checksum verify
>> failed on 18949527437312 found 4A677799 wanted CB641650 checksum
>> verify failed on 18949527437312 found 4A677799 wanted CB641650 Csum
>> didn't match Couldn't read chunk root
>
> One of the most deadly corruption for current btrfs, chunk tree root corrupt.
>
> Normally, btrfs rescue chunk-recovery should be the correct tool to fix it, but several bug and some bad design makes chunk-recovery quite easy to crash, and not recover the fs.
>
> But you can alwasy try that tool.
>
> Other idea including try to use backup roots manually, but under most case it doesn't work as backup root is only up to 4 backups, which normally doesn't contain the needed chunk root.
>
> Thanks,
> Qu
>
>
>> Label: '2fe6230e:data'  uuid: 04c95625-4927-4ade-80e7-de45a7536271
>>           Total devices 1 FS bytes used 13.62TiB
>>           devid    1 size 21.82TiB used 14.24TiB path /dev/md127
>>
>> Btrfs v3.17.3
>>
>> This is the relevant part of dmesg
>>
>> udevd[862]: starting version 175
>> btrfs: device label 2fe6230e:data devid 1 transid 248531 /dev/md127
>> Adding 1047420k swap on /dev/md1. Priority:-1 extents:1
>> across:1047420k BTRFS critical (device md127): unable to find logical
>> 1357341392896 len 4096 kernel BUG at fs/btrfs/inode.c:1621!
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 00000000 pgd = f0260000 [00000000] *pgd=30015831, *pte=00000000,
>> *ppte=00000000 Internal error: Oops: 817 [#1] SMP
>>
>> Note the kernel bug and kernel oops lines.
>>
>>
>> I've tried the following things, results shown:
>>
>> mount -o recovery /dev/md127 /data
>>
>> mount -o ro,recovery /dev/md127 /data
>>
>> mount -o ro /dev/md127 /data
>>
>> All of these just hang and a reboot is necessary in order to kill the process.
>>
>>
>>
>> Things that don't work:
>>
>> root@odin:/tmp# btrfs-zero-log /dev/md127 checksum verify failed on
>> 18949527437312 found 4A677799 wanted CB641650 checksum verify failed
>> on 18949527437312 found 4A677799 wanted CB641650 checksum verify
>> failed on 18949527437312 found 4A677799 wanted CB641650 checksum
>> verify failed on 18949527437312 found 4A677799 wanted CB641650 Csum
>> didn't match Couldn't read chunk root
>>
>>
>> root@odin:/tmp# btrfs restore -F -i -D -v /dev/md127 /dev/null
>> checksum verify failed on 18949527437312 found 4A677799 wanted
>> CB641650 checksum verify failed on 18949527437312 found 4A677799
>> wanted CB641650 checksum verify failed on 18949527437312 found
>> 4A677799 wanted CB641650 checksum verify failed on 18949527437312
>> found 4A677799 wanted CB641650 Csum didn't match Couldn't read chunk
>> root Could not open root, trying backup super checksum verify failed
>> on 18949527437312 found 4A677799 wanted CB641650 checksum verify
>> failed on 18949527437312 found 4A677799 wanted CB641650 checksum
>> verify failed on 18949527437312 found 4A677799 wanted CB641650
>> checksum verify failed on 18949527437312 found 4A677799 wanted
>> CB641650 Csum didn't match Couldn't read chunk root Could not open
>> root, trying backup super checksum verify failed on 18949527437312
>> found 4A677799 wanted CB641650 checksum verify failed on
>> 18949527437312 found 4A677799 wanted CB641650 checksum verify failed
>> on 18949527437312 found 4A677799 wanted CB641650 checksum verify
>> failed on 18949527437312 found 4A677799 wanted CB641650 Csum didn't
>> match Couldn't read chunk root
>>
>>
>> root@odin:/tmp# btrfs-find-root /dev/md127 checksum verify failed on
>> 18949527437312 found 4A677799 wanted CB641650 checksum verify failed
>> on 18949527437312 found 4A677799 wanted CB641650 checksum verify
>> failed on 18949527437312 found 4A677799 wanted CB641650 checksum
>> verify failed on 18949527437312 found 4A677799 wanted CB641650 Csum
>> didn't match Couldn't read chunk root Open ctree failed
>>
>> root@odin:/tmp# btrfsck /dev/md127
>> Couldn't open file system
>>
>> oot@odin:/tmp# btrfs rescue super-recover -v /dev/md127 All Devices:
>> Device: id = 1, name = /dev/md127
>>
>> Before Recovering:
>> [All good supers]:
>> device name = /dev/md127
>> superblock bytenr = 65536
>>
>> device name = /dev/md127
>> superblock bytenr = 67108864
>>
>> device name = /dev/md127
>> superblock bytenr = 274877906944
>>
>> [All bad supers]:
>>
>> All supers are valid, no need to recover
>>
>>
>> root@odin:/tmp# btrfs check /dev/md127 Couldn't open file system
>> root@odin:/tmp# btrfsck /dev/md127 Couldn't open file system
>>
>> Other info
>>
>> root@odin:/tmp# lsblk -o name,type,size,fstype,mountpoint
>> NAME        TYPE     SIZE FSTYPE            MOUNTPOINT
>> mtdblock0   disk     1.5M
>> mtdblock1   disk     128K
>> mtdblock2   disk       6M
>> mtdblock3   disk       4M
>> mtdblock4   disk     116M
>> sda         disk     7.3T
>> ├─sda1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sda2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sda3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>> sdb         disk     7.3T
>> ├─sdb1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sdb2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sdb3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>> sdc         disk     7.3T
>> ├─sdc1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sdc2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sdc3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>> sdd         disk     7.3T
>> ├─sdd1      part       4G linux_raid_member
>> │ └─md0     raid1      4G ext4              /
>> ├─sdd2      part     512M linux_raid_member
>> │ └─md1     raid6 1022.9M swap              [SWAP]
>> └─sdd3      part     7.3T linux_raid_member
>>   └─md127   raid5   21.8T btrfs
>>
>>
>>    Disk health seems fine:
>> root@odin:/tmp# smartctl -a /dev/sda | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>> root@odin:/tmp# smartctl -a /dev/sdb | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>> root@odin:/tmp# smartctl -a /dev/sdc | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>> root@odin:/tmp# smartctl -a /dev/sdd | grep PASSED
>> SMART overall-health self-assessment test result: PASSED
>>
>>
>>
>>
>> Dion
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



* Re: btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"?
  2016-02-04  1:28 btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"? Dion Gullotta
  2016-02-04  1:41 ` Qu Wenruo
@ 2016-02-04 11:58 ` Duncan
  1 sibling, 0 replies; 5+ messages in thread
From: Duncan @ 2016-02-04 11:58 UTC (permalink / raw)
  To: linux-btrfs

Dion Gullotta posted on Thu, 04 Feb 2016 12:28:09 +1100 as excerpted:

> root@odin:/var/readynasd# uname -a
> Linux odin 3.0.101.RN2120.3 #1 SMP
> Wed Apr 1 16:09:30 PDT 2015 armv7l GNU/Linux

Qu's helping you on the practical side, as a dev, far better than I could 
as a user, but here's a bit of additional higher-level information for 
your consideration.

That's an incredibly ancient kernel in btrfs terms.  Btrfs didn't have
the experimental tag stripped until 3.12, which is itself seriously old
by btrfs standards, and at least nominally that's a 3.0 kernel: not only
half a decade old now (four years of 3.x kernels plus a year of 4.x, at
five releases/year), but more than two years older than the 3.12 release
where the experimental label came off!

Of course the comparatively recent April 2015 build date (still nearly a
year old, tho) does hint that it's likely a heavily backport-patched
kernel, but which btrfs patches have and haven't been backported is
something probably only the project devs are tracking.  It's nothing
we'd know here, as this list targets mainline, where for btrfs at least,
3.0 is a very old and heavily experimental kernel indeed!

The general recommendation here, given that btrfs, while no longer
experimental, is still under heavy development, is to keep to the
latest couple of release series, either current or LTS.  With 4.4 out as
an LTS, that would be 4.4 and 4.1 as LTS kernel series, and 4.4 and 4.3
as current kernel series.  Tho with 4.4 so new in LTS terms and btrfs
stability and maturity still developing, the LTS before those, 3.18, is
still somewhat supported and may continue to be, making it three LTS
series.  But certainly, before 3.18 LTS, while we do still try to help,
development remains fast enough that such a kernel is simply old, and
unlikely to be well supported at all for practical reasons.
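
To make those thresholds concrete, here is a minimal Python sketch that
sorts a kernel release string into the tiers described above.  The
function names are made up for illustration, and the cut-offs (3.12,
3.18, 4.1) are simply the versions mentioned in this message, not any
official support policy.

```python
import re


def parse_version(release):
    """Return (major, minor) from a string such as '3.0.101.RN2120.3'."""
    match = re.match(r"v?(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised version string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def classify_kernel(release):
    """Map a kernel release onto the tiers discussed in this message."""
    version = parse_version(release)
    if version < (3, 12):
        return "pre-3.12: btrfs still carried the experimental tag"
    if version < (3, 18):
        return "older than 3.18 LTS: unlikely to be well supported"
    if version < (4, 1):
        return "3.18 LTS era: still somewhat supported"
    return "4.1 or newer: within the recommended range"


# The release string is the one reported by uname -a in this thread.
print(classify_kernel("3.0.101.RN2120.3"))
```

Tuple comparison keeps the ordering correct for two-digit minor numbers
(3.2 sorts before 3.18), which a plain string comparison would get wrong.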

So you have three options.  You may want to upgrade to at /least/ the
3.18 LTS series, and preferably at least 4.1 LTS, if you're continuing
to run btrfs.

*OR* get support from your distro.  Presumably if they're still choosing
to run and support btrfs on such nominally old kernels, from when btrfs
was still highly experimental, they have good reasons, and they may be
better positioned to provide btrfs support on something that ancient
than this list is.

*OR* if you have good reason to continue on such old kernels, I'd
strongly urge you to reconsider whether btrfs, particularly at the
experimental level it was back in kernel 3.0, is an appropriate choice
for such long-term-stable projects as that appears to be.  It's very
likely in that case that a fully stable and mature filesystem such as
ext3/4 (was ext4 even really stable yet a half decade ago? ext3 may be
better on a 3.0 kernel, I'm not sure), or the reiserfs I used for years
and have had very good experience with (at least since the data=ordered
default was introduced a decade or so ago), is much more suitable for
that use-case than btrfs, which is still stabilizING and maturING even
in current kernels.

Now your btrfs userspace tools are rather more current at 3.17, but even
that's relatively old now.  The rule of thumb for userspace is to keep
its version at least matching the kernel version, assuming the kernel is
kept reasonably current.  That would again mean the 3.18 series at the
oldest, that being the oldest LTS series really supported, and again,
preferably newer than that: 4.1 series or newer, up to the current 4.4.
Current userspace can normally be used with older kernels without issue,
except for mkfs.btrfs, where you'll want to specify options to be
compatible with older kernels that didn't have code for newer on-device
formats yet.
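
As a rough illustration of that userspace rule of thumb, the following
Python sketch compares a btrfs-progs version string against a kernel
release string.  The helper names and the OLDEST_SUPPORTED constant are
assumptions made for this example; the two input strings are the ones
reported earlier in the thread.

```python
import re

OLDEST_SUPPORTED = (3, 18)  # assumption: the 3.18 LTS floor named above


def major_minor(text):
    """Extract the first 'X.Y' pair from text such as 'Btrfs v3.17.3'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def userspace_at_least_kernel(progs_version, kernel_release):
    """True if the userspace tools are at least as new as the kernel."""
    return major_minor(progs_version) >= major_minor(kernel_release)


def meets_floor(version_text):
    """True if the version is at or above the assumed 3.18 floor."""
    return major_minor(version_text) >= OLDEST_SUPPORTED


progs = "Btrfs v3.17.3"
kernel = "3.0.101.RN2120.3"
print(userspace_at_least_kernel(progs, kernel))  # True
print(meets_floor(progs))                        # False
print(meets_floor(kernel))                       # False
```

So the tools here do satisfy "userspace at least matching the kernel",
but only because the kernel is so old; neither one reaches the 3.18
floor.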

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


end of thread, other threads:[~2016-02-04 11:58 UTC | newest]

Thread overview: 5+ messages
2016-02-04  1:28 btrfs partition spontaneously corrupted - No recovery options. Kernel oops / "Kernel Bug"? Dion Gullotta
2016-02-04  1:41 ` Qu Wenruo
2016-02-04  1:53   ` Dion Gullotta
2016-02-04  2:23     ` Qu Wenruo
2016-02-04 11:58 ` Duncan
