* Mount failing - unable to find logical
@ 2017-10-17 20:43 Cameron Kelley
  2017-10-18  1:24 ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Cameron Kelley @ 2017-10-17 20:43 UTC (permalink / raw)
  To: linux-btrfs

Hey btrfs gurus,

I have a 4 disk btrfs filesystem that has suddenly stopped mounting
after a recent reboot. The data is in an odd configuration due to
originally being in a 3 disk RAID1 before adding a 4th disk and running
a balance to convert to RAID10. There wasn't enough free space to
completely convert, so about half the data is still in RAID1 while the
other half is in RAID10. Both metadata and system are RAID10. It has
been in this configuration for 6 months or so now since adding the 4th
disk. It just holds archived media and hasn't had any data added or
modified in quite some time. I feel pretty stupid now for not
correcting that sooner though.

I have tried mounting with different mount options for recovery, ro,
degraded, etc. The log shows errors about "unable to find logical
3746892939264 length 4096".

When I do a btrfs check, it doesn't find any issues. Running
btrfs-find-root comes up with a message about a block whose generation
doesn't match. If I specify that block on the btrfs check, I get
transid verify failures.

I ran a dry run of a recovery of the entire filesystem, which runs
through every file with no errors. I would just restore the data and
start fresh, but unfortunately I don't have the free space at the
moment for the ~4.5TB of data.

I also ran full SMART self-tests on all 4 disks with no errors.

root@nas2:~# uname -a
Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
UTC 2017 i686 i686 i686 GNU/Linux

root@nas2:~# btrfs version
btrfs-progs v4.13.2

root@nas2:~# btrfs fi show
Label: none  uuid: 827029a4-8625-4a50-a22d-0fd28dbe2d36
        Total devices 4 FS bytes used 4.60TiB
        devid    1 size 2.73TiB used 2.33TiB path /dev/sdb1
        devid    2 size 2.73TiB used 2.33TiB path /dev/sdc
        devid    3 size 2.73TiB used 2.33TiB path /dev/sdd1
        devid    4 size 2.73TiB used 2.33TiB path /dev/sde1

root@nas2:~# mount /dev/sdb1 /mnt/nas2/
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
root@nas2:~# dmesg | tail
[  801.332623] BTRFS info (device sdb1): disk space caching is enabled
[  801.332627] BTRFS info (device sdb1): has skinny extents
[  801.333386] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333472] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333769] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333835] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333909] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333968] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.334028] BTRFS error (device sdb1): failed to read chunk root
[  801.365452] BTRFS error (device sdb1): open_ctree failed

root@nas2:~# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 5054297628672 bytes used, no error found
total csum bytes: 4929567064
total tree bytes: 5197856768
total fs tree bytes: 15237120
total extent tree bytes: 43433984
btree space waste bytes: 161510789
file data blocks allocated: 5050024812544
 referenced 5049610178560

root@nas2:~# btrfs-find-root /dev/sdb1
Superblock thinks the generation is 147970
Superblock thinks the level is 1
Found tree root at 21335861559296 gen 147970 level 1
Well block 21335857758208(gen: 147969 level: 1) seems good, but
generation/level doesn't match, want gen: 147970 level: 1

root@nas2:~# btrfs check -r 21335857758208 /dev/sdb1
parent transid verify failed on 21335857758208 wanted 147970 found 147969
parent transid verify failed on 21335857758208 wanted 147970 found 147969
parent transid verify failed on 21335857758208 wanted 147970 found 147969
parent transid verify failed on 21335857758208 wanted 147970 found 147969
Ignoring transid failure
Checking filesystem on /dev/sdb1
UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
ERROR: transid errors in file system
found 5054297628672 bytes used, error(s) found
total csum bytes: 4929567064
total tree bytes: 5197856768
total fs tree bytes: 15237120
total extent tree bytes: 43433984
btree space waste bytes: 161510789
file data blocks allocated: 5050024812544
 referenced 5049610178560

^ permalink raw reply [flat|nested] 6+ messages in thread
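An aside on the dry-run recovery mentioned above: the exact invocation
is not shown in the thread, but with btrfs-progs it was presumably
something along these lines, where /mnt/scratch is a hypothetical
target directory that -D/--dry-run never actually writes to:

  # btrfs restore -D -v /dev/sdb1 /mnt/scratch

With -D, restore walks every file it can reach and reports what it
would recover without writing anything out, which fits the "runs
through every file with no errors" description.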
* Re: Mount failing - unable to find logical
  2017-10-17 20:43 Mount failing - unable to find logical Cameron Kelley
@ 2017-10-18  1:24 ` Qu Wenruo
  2017-10-18  3:22   ` Cameron Kelley
  2017-10-18  5:10   ` Roman Mamedov
  0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2017-10-18 1:24 UTC (permalink / raw)
  To: Cameron Kelley, linux-btrfs

On 2017-10-18 04:43, Cameron Kelley wrote:
> Hey btrfs gurus,
>
> I have a 4 disk btrfs filesystem that has suddenly stopped mounting
> after a recent reboot. The data is in an odd configuration due to
> originally being in a 3 disk RAID1 before adding a 4th disk and running
> a balance to convert to RAID10. There wasn't enough free space to
> completely convert, so about half the data is still in RAID1 while the
> other half is in RAID10. Both metadata and system are RAID10. It has
> been in this configuration for 6 months or so now since adding the 4th
> disk. It just holds archived media and hasn't had any data added or
> modified in quite some time. I feel pretty stupid now for not
> correcting that sooner though.
>
> I have tried mounting with different mount options for recovery, ro,
> degraded, etc. The log shows errors about "unable to find logical
> 3746892939264 length 4096".
>
> When I do a btrfs check, it doesn't find any issues. Running
> btrfs-find-root comes up with a message about a block whose generation
> doesn't match. If I specify that block on the btrfs check, I get
> transid verify failures.
>
> I ran a dry run of a recovery of the entire filesystem, which runs
> through every file with no errors. I would just restore the data and
> start fresh, but unfortunately I don't have the free space at the
> moment for the ~4.5TB of data.
>
> I also ran full SMART self-tests on all 4 disks with no errors.
>
> root@nas2:~# uname -a
> Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
> UTC 2017 i686 i686 i686 GNU/Linux

I don't think the i686 kernel will cause any difference, but considering
most of us are using x86_64 to develop/test, maybe it would be a good
idea to upgrade to an x86_64 kernel?

>
> root@nas2:~# btrfs version
> btrfs-progs v4.13.2
>
> root@nas2:~# btrfs fi show
> Label: none  uuid: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>         Total devices 4 FS bytes used 4.60TiB
>         devid    1 size 2.73TiB used 2.33TiB path /dev/sdb1
>         devid    2 size 2.73TiB used 2.33TiB path /dev/sdc
>         devid    3 size 2.73TiB used 2.33TiB path /dev/sdd1
>         devid    4 size 2.73TiB used 2.33TiB path /dev/sde1
>
> root@nas2:~# mount /dev/sdb1 /mnt/nas2/
> mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
>
> root@nas2:~# dmesg | tail
> [  801.332623] BTRFS info (device sdb1): disk space caching is enabled
> [  801.332627] BTRFS info (device sdb1): has skinny extents
> [  801.333386] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
> [  801.333472] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
> [  801.333769] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
> [  801.333835] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
> [  801.333909] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
> [  801.333968] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
> [  801.334028] BTRFS error (device sdb1): failed to read chunk root
> [  801.365452] BTRFS error (device sdb1): open_ctree failed

Some of the chunk tree failed to be read out.

Either the chunk tree or the system chunk array has some problem.

Would you please dump the chunk tree and superblock with the following
commands?

# btrfs inspect-internal dump-tree -t chunk /dev/sdb1
# btrfs inspect-internal dump-super -fa /dev/sdb1

>
> root@nas2:~# btrfs check /dev/sdb1
> Checking filesystem on /dev/sdb1
> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
> checking extents
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> found 5054297628672 bytes used, no error found
> total csum bytes: 4929567064
> total tree bytes: 5197856768
> total fs tree bytes: 15237120
> total extent tree bytes: 43433984
> btree space waste bytes: 161510789
> file data blocks allocated: 5050024812544
>  referenced 5049610178560

Unless we have some bug in btrfs-progs chunk mapping, the result seems
quite good.

Just in case, would you please also run "btrfs check --mode=lowmem
/dev/sdb1" to see if it's OK?

>
> root@nas2:~# btrfs-find-root /dev/sdb1
> Superblock thinks the generation is 147970
> Superblock thinks the level is 1
> Found tree root at 21335861559296 gen 147970 level 1
> Well block 21335857758208(gen: 147969 level: 1) seems good, but
> generation/level doesn't match, want gen: 147970 level: 1

Since it's mostly related to the chunk tree, would you please try the
following commands?
# btrfs-find-root -o 3 /dev/sdb1
# btrfs check --chunk-root <the next chunk root bytenr> /dev/sdb1

Thanks,
Qu

>
> root@nas2:~# btrfs check -r 21335857758208 /dev/sdb1
> parent transid verify failed on 21335857758208 wanted 147970 found 147969
> parent transid verify failed on 21335857758208 wanted 147970 found 147969
> parent transid verify failed on 21335857758208 wanted 147970 found 147969
> parent transid verify failed on 21335857758208 wanted 147970 found 147969
> Ignoring transid failure
> Checking filesystem on /dev/sdb1
> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
> checking extents
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> ERROR: transid errors in file system
> found 5054297628672 bytes used, error(s) found
> total csum bytes: 4929567064
> total tree bytes: 5197856768
> total fs tree bytes: 15237120
> total extent tree bytes: 43433984
> btree space waste bytes: 161510789
> file data blocks allocated: 5050024812544
> referenced 5049610178560
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 6+ messages in thread
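As an aside (a hedged pointer, not a command given in the thread): the
bytenr that "btrfs check --chunk-root" expects is also recorded in the
superblock, so it can be read directly with stock btrfs-progs:

  # btrfs inspect-internal dump-super /dev/sdb1 | grep chunk_root

dump-super prints the chunk_root, chunk_root_generation and
chunk_root_level fields; the chunk_root value is the bytenr to pass to
--chunk-root.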
* Re: Mount failing - unable to find logical
  2017-10-18  1:24 ` Qu Wenruo
@ 2017-10-18  3:22   ` Cameron Kelley
  2017-10-18  4:36     ` Qu Wenruo
  2017-10-18  5:10   ` Roman Mamedov
  1 sibling, 1 reply; 6+ messages in thread
From: Cameron Kelley @ 2017-10-18 3:22 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 10-17-2017 6:24 PM, Qu Wenruo wrote:
>
>
> On 2017-10-18 04:43, Cameron Kelley wrote:
>> Hey btrfs gurus,
>>
>> I have a 4 disk btrfs filesystem that has suddenly stopped mounting
>> after a recent reboot. The data is in an odd configuration due to
>> originally being in a 3 disk RAID1 before adding a 4th disk and running
>> a balance to convert to RAID10. There wasn't enough free space to
>> completely convert, so about half the data is still in RAID1 while the
>> other half is in RAID10. Both metadata and system are RAID10. It has
>> been in this configuration for 6 months or so now since adding the 4th
>> disk. It just holds archived media and hasn't had any data added or
>> modified in quite some time. I feel pretty stupid now for not
>> correcting that sooner though.
>>
>> I have tried mounting with different mount options for recovery, ro,
>> degraded, etc. The log shows errors about "unable to find logical
>> 3746892939264 length 4096".
>>
>> When I do a btrfs check, it doesn't find any issues. Running
>> btrfs-find-root comes up with a message about a block whose generation
>> doesn't match. If I specify that block on the btrfs check, I get
>> transid verify failures.
>>
>> I ran a dry run of a recovery of the entire filesystem, which runs
>> through every file with no errors. I would just restore the data and
>> start fresh, but unfortunately I don't have the free space at the
>> moment for the ~4.5TB of data.
>>
>> I also ran full SMART self-tests on all 4 disks with no errors.
>>
>> root@nas2:~# uname -a
>> Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
>> UTC 2017 i686 i686 i686 GNU/Linux
>
> I don't think the i686 kernel will cause any difference, but considering
> most of us are using x86_64 to develop/test, maybe it would be a good
> idea to upgrade to an x86_64 kernel?
>

Thanks for the quick response.

This is an old x86 Pentium NAS I inherited, so unfortunately I'm stuck
on a 32-bit kernel. If push comes to shove, I can disassemble another
x64 machine to test with.

>>
>> root@nas2:~# btrfs version
>> btrfs-progs v4.13.2
>>
>> root@nas2:~# btrfs fi show
>> Label: none  uuid: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>>         Total devices 4 FS bytes used 4.60TiB
>>         devid    1 size 2.73TiB used 2.33TiB path /dev/sdb1
>>         devid    2 size 2.73TiB used 2.33TiB path /dev/sdc
>>         devid    3 size 2.73TiB used 2.33TiB path /dev/sdd1
>>         devid    4 size 2.73TiB used 2.33TiB path /dev/sde1
>>
>> root@nas2:~# mount /dev/sdb1 /mnt/nas2/
>> mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
>>        missing codepage or helper program, or other error
>>
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so.
>>
>> root@nas2:~# dmesg | tail
>> [  801.332623] BTRFS info (device sdb1): disk space caching is enabled
>> [  801.332627] BTRFS info (device sdb1): has skinny extents
>> [  801.333386] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333472] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333769] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333835] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333909] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333968] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.334028] BTRFS error (device sdb1): failed to read chunk root
>> [  801.365452] BTRFS error (device sdb1): open_ctree failed
>
> Some of the chunk tree failed to be read out.
>
> Either the chunk tree or the system chunk array has some problem.
>
> Would you please dump the chunk tree and superblock with the following
> commands?
>
> # btrfs inspect-internal dump-tree -t chunk /dev/sdb1
> # btrfs inspect-internal dump-super -fa /dev/sdb1
>

# btrfs inspect-internal dump-tree -t chunk /dev/sdb1
http://pastebin.ubuntu.com/25763241/

# btrfs inspect-internal dump-super -fa /dev/sdb1
http://pastebin.ubuntu.com/25763246/

>>
>> root@nas2:~# btrfs check /dev/sdb1
>> Checking filesystem on /dev/sdb1
>> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>> checking extents
>> checking free space cache
>> cache and super generation don't match, space cache will be invalidated
>> checking fs roots
>> checking csums
>> checking root refs
>> found 5054297628672 bytes used, no error found
>> total csum bytes: 4929567064
>> total tree bytes: 5197856768
>> total fs tree bytes: 15237120
>> total extent tree bytes: 43433984
>> btree space waste bytes: 161510789
>> file data blocks allocated: 5050024812544
>>  referenced 5049610178560
>
> Unless we have some bug in btrfs-progs chunk mapping, the result seems
> quite good.
>
> Just in case, would you please also run "btrfs check --mode=lowmem
> /dev/sdb1" to see if it's OK?
>

# btrfs check --mode=lowmem /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 5053072662528 bytes used, no error found
total csum bytes: 4929567064
total tree bytes: 5195988992
total fs tree bytes: 15237120
total extent tree bytes: 43368448
btree space waste bytes: 161593371
file data blocks allocated: 5048801714176
 referenced 5048387080192

>>
>> root@nas2:~# btrfs-find-root /dev/sdb1
>> Superblock thinks the generation is 147970
>> Superblock thinks the level is 1
>> Found tree root at 21335861559296 gen 147970 level 1
>> Well block 21335857758208(gen: 147969 level: 1) seems good, but
>> generation/level doesn't match, want gen: 147970 level: 1
>
> Since it's mostly related to the chunk tree, would you please try the
> following commands?
>
> # btrfs-find-root -o 3 /dev/sdb1
> # btrfs check --chunk-root <the next chunk root bytenr> /dev/sdb1
>
> Thanks,
> Qu
>

# btrfs-find-root -o 3 /dev/sdb1
Superblock thinks the generation is 147728
Superblock thinks the level is 1
Found tree root at 21339078983680 gen 147728 level 1

# btrfs check --chunk-root 21339078983680 /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 5053072662528 bytes used, no error found
total csum bytes: 4929567064
total tree bytes: 5195988992
total fs tree bytes: 15237120
total extent tree bytes: 43368448
btree space waste bytes: 161593371
file data blocks allocated: 5048801714176
 referenced 5048387080192

>>
>> root@nas2:~# btrfs check -r 21335857758208 /dev/sdb1
>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>> Ignoring transid failure
>> Checking filesystem on /dev/sdb1
>> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>> checking extents
>> checking free space cache
>> cache and super generation don't match, space cache will be invalidated
>> checking fs roots
>> checking csums
>> checking root refs
>> ERROR: transid errors in file system
>> found 5054297628672 bytes used, error(s) found
>> total csum bytes: 4929567064
>> total tree bytes: 5197856768
>> total fs tree bytes: 15237120
>> total extent tree bytes: 43433984
>> btree space waste bytes: 161510789
>> file data blocks allocated: 5050024812544
>>  referenced 5049610178560
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Mount failing - unable to find logical
  2017-10-18  3:22   ` Cameron Kelley
@ 2017-10-18  4:36     ` Qu Wenruo
  0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2017-10-18 4:36 UTC (permalink / raw)
  To: Cameron Kelley, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 9930 bytes --]

On 2017-10-18 11:22, Cameron Kelley wrote:
>
>
> On 10-17-2017 6:24 PM, Qu Wenruo wrote:
>>
>>
>> On 2017-10-18 04:43, Cameron Kelley wrote:
>>> Hey btrfs gurus,
>>>
>>> I have a 4 disk btrfs filesystem that has suddenly stopped mounting
>>> after a recent reboot. The data is in an odd configuration due to
>>> originally being in a 3 disk RAID1 before adding a 4th disk and running
>>> a balance to convert to RAID10. There wasn't enough free space to
>>> completely convert, so about half the data is still in RAID1 while the
>>> other half is in RAID10. Both metadata and system are RAID10. It has
>>> been in this configuration for 6 months or so now since adding the 4th
>>> disk. It just holds archived media and hasn't had any data added or
>>> modified in quite some time. I feel pretty stupid now for not
>>> correcting that sooner though.
>>>
>>> I have tried mounting with different mount options for recovery, ro,
>>> degraded, etc. The log shows errors about "unable to find logical
>>> 3746892939264 length 4096".
>>>
>>> When I do a btrfs check, it doesn't find any issues. Running
>>> btrfs-find-root comes up with a message about a block whose generation
>>> doesn't match. If I specify that block on the btrfs check, I get
>>> transid verify failures.
>>>
>>> I ran a dry run of a recovery of the entire filesystem, which runs
>>> through every file with no errors. I would just restore the data and
>>> start fresh, but unfortunately I don't have the free space at the
>>> moment for the ~4.5TB of data.
>>>
>>> I also ran full SMART self-tests on all 4 disks with no errors.
>>>
>>> root@nas2:~# uname -a
>>> Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
>>> UTC 2017 i686 i686 i686 GNU/Linux
>>
>> I don't think the i686 kernel will cause any difference, but considering
>> most of us are using x86_64 to develop/test, maybe it would be a good
>> idea to upgrade to an x86_64 kernel?
>>
>
> Thanks for the quick response.
>
> This is an old x86 Pentium NAS I inherited, so unfortunately I'm stuck
> on a 32-bit kernel. If push comes to shove, I can disassemble another
> x64 machine to test with.

I think it's better to try x86_64: the problem is quite weird, and I
don't even know where the strange bytenr comes from, so it's worth
ruling out any possibility.

>
>>>
>>> root@nas2:~# btrfs version
>>> btrfs-progs v4.13.2
>>>
>>> root@nas2:~# btrfs fi show
>>> Label: none  uuid: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>>>         Total devices 4 FS bytes used 4.60TiB
>>>         devid    1 size 2.73TiB used 2.33TiB path /dev/sdb1
>>>         devid    2 size 2.73TiB used 2.33TiB path /dev/sdc
>>>         devid    3 size 2.73TiB used 2.33TiB path /dev/sdd1
>>>         devid    4 size 2.73TiB used 2.33TiB path /dev/sde1
>>>
>>> root@nas2:~# mount /dev/sdb1 /mnt/nas2/
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
>>>        missing codepage or helper program, or other error
>>>
>>>        In some cases useful info is found in syslog - try
>>>        dmesg | tail or so.
>>>
>>> root@nas2:~# dmesg | tail
>>> [  801.332623] BTRFS info (device sdb1): disk space caching is enabled
>>> [  801.332627] BTRFS info (device sdb1): has skinny extents
>>> [  801.333386] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>>> [  801.333472] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>>> [  801.333769] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>>> [  801.333835] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>>> [  801.333909] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>>> [  801.333968] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>>> [  801.334028] BTRFS error (device sdb1): failed to read chunk root
>>> [  801.365452] BTRFS error (device sdb1): open_ctree failed
>>
>> Some of the chunk tree failed to be read out.
>>
>> Either the chunk tree or the system chunk array has some problem.
>>
>> Would you please dump the chunk tree and superblock with the following
>> commands?
>>
>> # btrfs inspect-internal dump-tree -t chunk /dev/sdb1
>> # btrfs inspect-internal dump-super -fa /dev/sdb1
>>
>
> # btrfs inspect-internal dump-tree -t chunk /dev/sdb1
> http://pastebin.ubuntu.com/25763241/
>
> # btrfs inspect-internal dump-super -fa /dev/sdb1
> http://pastebin.ubuntu.com/25763246/

Strange. All your chunks start from 16365829226496, which is completely
normal if you have tried or already balanced the whole fs (possibly
several times).

But this leads to the problem: there is no chunk mapping for
3746892939264, since all your chunks start far beyond that byte number.

I can't see anything related to that bytenr in either dump, chunk tree
or superblock. All your chunk tree blocks are in the valid chunk map
range, so your chunk tree is good. Both check modes also verified this.

Furthermore, you are using the 16K nodesize, which is the default
value, while the kernel is trying to read a 4K block, which would mean
it's data.

But according to your dmesg, you are stuck where the kernel failed to
read out the chunk root (not the whole chunk tree). This makes the
whole thing even weirder, as your superblock shows your chunk root is
at 21339078983680, which is completely valid.

So I recommend trying to mount the fs on an x86_64 or at least some
newer machine, to see if the problem just disappears.

Thanks,
Qu

>
>>>
>>> root@nas2:~# btrfs check /dev/sdb1
>>> Checking filesystem on /dev/sdb1
>>> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>>> checking extents
>>> checking free space cache
>>> cache and super generation don't match, space cache will be invalidated
>>> checking fs roots
>>> checking csums
>>> checking root refs
>>> found 5054297628672 bytes used, no error found
>>> total csum bytes: 4929567064
>>> total tree bytes: 5197856768
>>> total fs tree bytes: 15237120
>>> total extent tree bytes: 43433984
>>> btree space waste bytes: 161510789
>>> file data blocks allocated: 5050024812544
>>>  referenced 5049610178560
>>
>> Unless we have some bug in btrfs-progs chunk mapping, the result seems
>> quite good.
>>
>> Just in case, would you please also run "btrfs check --mode=lowmem
>> /dev/sdb1" to see if it's OK?
>>
>
> # btrfs check --mode=lowmem /dev/sdb1
> Checking filesystem on /dev/sdb1
> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 5053072662528 bytes used, no error found
> total csum bytes: 4929567064
> total tree bytes: 5195988992
> total fs tree bytes: 15237120
> total extent tree bytes: 43368448
> btree space waste bytes: 161593371
> file data blocks allocated: 5048801714176
>  referenced 5048387080192
>
>>>
>>> root@nas2:~# btrfs-find-root /dev/sdb1
>>> Superblock thinks the generation is 147970
>>> Superblock thinks the level is 1
>>> Found tree root at 21335861559296 gen 147970 level 1
>>> Well block 21335857758208(gen: 147969 level: 1) seems good, but
>>> generation/level doesn't match, want gen: 147970 level: 1
>>
>> Since it's mostly related to the chunk tree, would you please try the
>> following commands?
>>
>> # btrfs-find-root -o 3 /dev/sdb1
>> # btrfs check --chunk-root <the next chunk root bytenr> /dev/sdb1
>>
>> Thanks,
>> Qu
>>
>
> # btrfs-find-root -o 3 /dev/sdb1
> Superblock thinks the generation is 147728
> Superblock thinks the level is 1
> Found tree root at 21339078983680 gen 147728 level 1
>
> # btrfs check --chunk-root 21339078983680 /dev/sdb1
> Checking filesystem on /dev/sdb1
> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 5053072662528 bytes used, no error found
> total csum bytes: 4929567064
> total tree bytes: 5195988992
> total fs tree bytes: 15237120
> total extent tree bytes: 43368448
> btree space waste bytes: 161593371
> file data blocks allocated: 5048801714176
>  referenced 5048387080192
>
>>>
>>> root@nas2:~# btrfs check -r 21335857758208 /dev/sdb1
>>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>>> parent transid verify failed on 21335857758208 wanted 147970 found 147969
>>> Ignoring transid failure
>>> Checking filesystem on /dev/sdb1
>>> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>>> checking extents
>>> checking free space cache
>>> cache and super generation don't match, space cache will be invalidated
>>> checking fs roots
>>> checking csums
>>> checking root refs
>>> ERROR: transid errors in file system
>>> found 5054297628672 bytes used, error(s) found
>>> total csum bytes: 4929567064
>>> total tree bytes: 5197856768
>>> total fs tree bytes: 15237120
>>> total extent tree bytes: 43433984
>>> btree space waste bytes: 161510789
>>> file data blocks allocated: 5050024812544
>>>  referenced 5049610178560
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 504 bytes --]

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Mount failing - unable to find logical
  2017-10-18  1:24 ` Qu Wenruo
  2017-10-18  3:22   ` Cameron Kelley
@ 2017-10-18  5:10   ` Roman Mamedov
  2017-10-18 16:40     ` SOLVED - 32-bit kernel 4.13 bug - " Cameron Kelley
  1 sibling, 1 reply; 6+ messages in thread
From: Roman Mamedov @ 2017-10-18 5:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Cameron Kelley, linux-btrfs

On Wed, 18 Oct 2017 09:24:01 +0800
Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

>
>
> On 2017-10-18 04:43, Cameron Kelley wrote:
> > Hey btrfs gurus,
> >
> > I have a 4 disk btrfs filesystem that has suddenly stopped mounting
> > after a recent reboot. The data is in an odd configuration due to
> > originally being in a 3 disk RAID1 before adding a 4th disk and running
> > a balance to convert to RAID10. There wasn't enough free space to
> > completely convert, so about half the data is still in RAID1 while the
> > other half is in RAID10. Both metadata and system are RAID10. It has
> > been in this configuration for 6 months or so now since adding the 4th
> > disk. It just holds archived media and hasn't had any data added or
> > modified in quite some time. I feel pretty stupid now for not
> > correcting that sooner though.
> >
> > I have tried mounting with different mount options for recovery, ro,
> > degraded, etc. The log shows errors about "unable to find logical
> > 3746892939264 length 4096".
> >
> > When I do a btrfs check, it doesn't find any issues. Running
> > btrfs-find-root comes up with a message about a block whose generation
> > doesn't match. If I specify that block on the btrfs check, I get
> > transid verify failures.
> >
> > I ran a dry run of a recovery of the entire filesystem, which runs
> > through every file with no errors. I would just restore the data and
> > start fresh, but unfortunately I don't have the free space at the
> > moment for the ~4.5TB of data.
> >
> > I also ran full SMART self-tests on all 4 disks with no errors.
> >
> > root@nas2:~# uname -a
> > Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
> > UTC 2017 i686 i686 i686 GNU/Linux
>
> I don't think the i686 kernel will cause any difference, but considering
> most of us are using x86_64 to develop/test, maybe it would be a good
> idea to upgrade to an x86_64 kernel?

Indeed, a problem with mounting on 32-bit in 4.13 has been reported
recently, with the same error message:
https://www.spinics.net/lists/linux-btrfs/msg69734.html

I believe it's this patchset that is supposed to fix it:
https://www.spinics.net/lists/linux-btrfs/msg70001.html

@Cameron maybe you didn't just reboot, but also upgraded your kernel at
the same time? In any case, try a 4.9 series kernel, or a 64-bit machine
if you want to stay with 4.13.

--
With respect,
Roman

^ permalink raw reply [flat|nested] 6+ messages in thread
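A hedged back-of-the-envelope check of the 32-bit theory, not spelled
out in the thread itself: if byte offsets are at some point derived
from a 32-bit page index, they wrap at 2^32 pages x 4096 bytes/page =
2^44 bytes = 16TiB. The chunk root bytenr 21339078983680 (about
19.4TiB) reduced modulo 2^44 is exactly the logical address the kernel
complained about:

  $ echo $(( 21339078983680 & ((1 << 44) - 1) ))
  3746892939264

That would also explain why this heavily balanced filesystem trips the
bug: its trees live above the 16TiB logical boundary, which a younger
filesystem never reaches, and why 64-bit userspace tools doing the same
chunk mapping in full 64-bit arithmetic see nothing wrong.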
* Re: SOLVED - 32-bit kernel 4.13 bug - Mount failing - unable to find logical
  2017-10-18  5:10   ` Roman Mamedov
@ 2017-10-18 16:40     ` Cameron Kelley
  0 siblings, 0 replies; 6+ messages in thread
From: Cameron Kelley @ 2017-10-18 16:40 UTC (permalink / raw)
  To: Roman Mamedov, Qu Wenruo; +Cc: linux-btrfs

On 10-17-2017 10:10 PM, Roman Mamedov wrote:
> On Wed, 18 Oct 2017 09:24:01 +0800
> Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>>
>>
>> On 2017-10-18 04:43, Cameron Kelley wrote:
>>> Hey btrfs gurus,
>>>
>>> I have a 4 disk btrfs filesystem that has suddenly stopped mounting
>>> after a recent reboot. The data is in an odd configuration due to
>>> originally being in a 3 disk RAID1 before adding a 4th disk and running
>>> a balance to convert to RAID10. There wasn't enough free space to
>>> completely convert, so about half the data is still in RAID1 while the
>>> other half is in RAID10. Both metadata and system are RAID10. It has
>>> been in this configuration for 6 months or so now since adding the 4th
>>> disk. It just holds archived media and hasn't had any data added or
>>> modified in quite some time. I feel pretty stupid now for not
>>> correcting that sooner though.
>>>
>>> I have tried mounting with different mount options for recovery, ro,
>>> degraded, etc. The log shows errors about "unable to find logical
>>> 3746892939264 length 4096".
>>>
>>> When I do a btrfs check, it doesn't find any issues. Running
>>> btrfs-find-root comes up with a message about a block whose generation
>>> doesn't match. If I specify that block on the btrfs check, I get
>>> transid verify failures.
>>>
>>> I ran a dry run of a recovery of the entire filesystem, which runs
>>> through every file with no errors. I would just restore the data and
>>> start fresh, but unfortunately I don't have the free space at the
>>> moment for the ~4.5TB of data.
>>>
>>> I also ran full SMART self-tests on all 4 disks with no errors.
>>>
>>> root@nas2:~# uname -a
>>> Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06
>>> UTC 2017 i686 i686 i686 GNU/Linux
>>
>> I don't think the i686 kernel will cause any difference, but considering
>> most of us are using x86_64 to develop/test, maybe it would be a good
>> idea to upgrade to an x86_64 kernel?
>
> Indeed, a problem with mounting on 32-bit in 4.13 has been reported
> recently, with the same error message:
> https://www.spinics.net/lists/linux-btrfs/msg69734.html
>
> I believe it's this patchset that is supposed to fix it:
> https://www.spinics.net/lists/linux-btrfs/msg70001.html
>
> @Cameron maybe you didn't just reboot, but also upgraded your kernel at
> the same time? In any case, try a 4.9 series kernel, or a 64-bit machine
> if you want to stay with 4.13.
>

Just for reference to anyone else having this issue, it is indeed a bug
in the 32-bit build of the 4.13 kernel. The x64 kernel had no issues
mounting it.

An interesting thing to note is that I still had all the exact same
mount issues and errors when I booted the latest PartedMagic live image
with kernel 4.12.9 in 32-bit mode. The same PartedMagic image in 64-bit
mode had no issues, which is how I confirmed your suspicions.

Now for the part where I feel more stupid than I have in a long time.

1. Apparently I had updated the kernel on this NAS without realizing
it, since I was doing updates on multiple appliances at once a little
while ago and just hadn't rebooted it since. When I ran into issues, I
updated the kernel to the latest without looking at the kernel I was
on, just to see if that solved it.
2. And here's the real kicker: the processor in this NAS (Pentium
E5200) is actually x64 capable. I must have skimmed information too
quickly when I first built this years ago and assumed it wasn't.

I have rebuilt the NAS and I'm now running a scrub just to make sure
the steps I was taking to recover didn't cause any issues. Anything
else you would recommend to make sure there aren't any other issues
that could have been caused by my tinkering?

Thank you very much for your help; I was banging my head against a
wall. This NAS does so little that I tend to get careless with it.
Lesson learned and embarrassment felt. The only solace is that this
might help someone else who runs into this with kernel 4.13 on a
32-bit system.

^ permalink raw reply [flat|nested] 6+ messages in thread
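A hedged pointer for the follow-up question, using stock btrfs-progs
commands (the mount point is the one used earlier in the thread): the
running scrub and the per-device lifetime error counters can be
checked with:

  # btrfs scrub status /mnt/nas2
  # btrfs device stats /mnt/nas2

scrub status reports progress and any checksum/verify errors found;
device stats prints write/read/flush I/O errors plus corruption and
generation errors per device, and all zeros would suggest the recovery
tinkering did no harm.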
end of thread, other threads:[~2017-10-18 16:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-17 20:43 Mount failing - unable to find logical Cameron Kelley
2017-10-18  1:24 ` Qu Wenruo
2017-10-18  3:22   ` Cameron Kelley
2017-10-18  4:36     ` Qu Wenruo
2017-10-18  5:10   ` Roman Mamedov
2017-10-18 16:40     ` SOLVED - 32-bit kernel 4.13 bug - " Cameron Kelley