* How to recover uncorrectable errors ?
@ 2013-03-08 8:54 Frédéric COIFFIER
2013-03-13 8:10 ` Frédéric COIFFIER
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Frédéric COIFFIER @ 2013-03-08 8:54 UTC (permalink / raw)
To: linux-btrfs
Hi,
I'm running Linux 3.7.6 (Gentoo Linux) with btrfs-progs-0.20_rc1_p56, and for the past few days I have been getting uncorrectable errors:
# btrfs scrub status /
scrub status for 6b6ea99b-edee-498d-bf07-f3a3f1cba2f3
scrub started at Thu Mar 7 20:12:31 2013 and finished after 515 seconds
total bytes scrubbed: 31.02GB with 6 errors
error details: csum=6
corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
I don't know what caused these errors (maybe a hard reset or a power cut), but the disk is an old, non-SSD hard disk.
I discovered the problem through several errors in dmesg when I tried to access a file:
[ 2985.163718] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2985.169191] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2993.102810] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114213] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114527] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114795] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115097] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115349] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115585] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.115956] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.116260] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.116558] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2998.100230] csum_tree_block: 27408 callbacks suppressed
[ 2998.100233] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2998.100406] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2998.100591] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
If I start a btrfs scrub again, I get these messages:
[ 3047.835131] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
[ 3047.835134] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
[ 3047.835137] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 20, gen 0
[ 3047.953751] btrfs: unable to fixup (regular) error at logical 272228352 on dev /dev/sda2
[ 3052.349518] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 1102728: metadata leaf (level 0) in tree 5
[ 3052.349521] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 1102728: metadata leaf (level 0) in tree 5
[ 3052.349524] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 21, gen 0
[ 3055.840357] btrfs: unable to fixup (regular) error at logical 556208128 on dev /dev/sda2
[ 3061.032879] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 2645232: metadata leaf (level 0) in tree 5
[ 3061.032882] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 2645232: metadata leaf (level 0) in tree 5
[ 3061.032885] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 22, gen 0
[ 3063.014553] btrfs: unable to fixup (regular) error at logical 272228352 on dev /dev/sda2
[ 3067.758444] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 3199880: metadata leaf (level 0) in tree 5
[ 3067.758447] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 3199880: metadata leaf (level 0) in tree 5
[ 3067.758450] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 23, gen 0
[ 3067.822206] btrfs: unable to fixup (regular) error at logical 556208128 on dev /dev/sda2
I booted a LiveCD to run btrfsck [I still have to check its version], but it segfaults during the check.
Currently I can't remove the file (nor delete its directory), and updatedb runs for hours when it tries to read this file.
So, what is the best way to recover from these errors (I assume that some files are definitely lost)?
I would like to identify the corrupted files and delete them.
Regards,
Frederic
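The scrub messages quoted in this message repeat each error, which makes it hard to see how many distinct blocks are involved. A minimal Python sketch, assuming the kernel log lines have exactly the wording shown above (the regular expression and the helper name `group_scrub_errors` are illustrative, not part of any btrfs tool), groups them by logical address:

```python
import re
from collections import defaultdict

# Pattern modelled on the scrub messages quoted in this message; other
# kernel versions may word these lines differently.
SCRUB_RE = re.compile(
    r"checksum error at logical (\d+) on dev (\S+), sector (\d+): (.+)$"
)


def group_scrub_errors(lines):
    """Map each logical address to the set of (device, sector, description) hits."""
    grouped = defaultdict(set)
    for line in lines:
        match = SCRUB_RE.search(line.strip())
        if match:
            logical, dev, sector, desc = match.groups()
            grouped[int(logical)].add((dev, int(sector), desc))
    return dict(grouped)


# Four lines copied from the dmesg output above (one of them is a repeat).
sample = [
    "[ 3047.835131] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5",
    "[ 3047.835134] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5",
    "[ 3052.349518] btrfs: checksum error at logical 556208128 on dev /dev/sda2, sector 1102728: metadata leaf (level 0) in tree 5",
    "[ 3061.032879] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 2645232: metadata leaf (level 0) in tree 5",
]

for logical, hits in sorted(group_scrub_errors(sample).items()):
    for dev, sector, desc in sorted(hits):
        # Assumption: "sector" counts 512-byte units from the start of the device.
        print(f"logical {logical}: {dev} sector {sector} "
              f"(byte offset {sector * 512}): {desc}")
```

In the full log above, each of the two logical addresses is reported at two different sectors, and both locations are flagged as unfixable. That would be consistent with two stored copies of the same metadata block both being bad, though this is an inference from the log rather than something the messages state.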
* Re: How to recover uncorrectable errors ?
From: Frédéric COIFFIER @ 2013-03-13 8:10 UTC (permalink / raw)
To: linux-btrfs
To complete my previous message, here is the log of btrfsck 0.20_rc1_p56, which segfaults:
checking extents
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
Csum didn't match
checksum verify failed on 556208128 wanted D88A417C found 11
checksum verify failed on 556208128 wanted D88A417C found 11
checksum verify failed on 556208128 wanted D88A417C found 11
checksum verify failed on 556208128 wanted D88A417C found 11
Csum didn't match
owner ref check failed [272228352 4096]
repair deleting extent record: key 272228352 168 4096
adding new tree backref on start 272228352 len 4096 parent 5 root 5
Backref 402792448 parent 2 root 2 not found in extent tree
Backref 402792448 root 2 not referenced back 0xa55ce08
Incorrect global backref count on 402792448 found 1 wanted 0
backpointer mismatch on [402792448 4096]
Backref 436817920 parent 2 root 2 not found in extent tree
Backref 436817920 root 2 not referenced back 0xa4b9488
Incorrect global backref count on 436817920 found 1 wanted 0
backpointer mismatch on [436817920 4096]
Backref 518414336 parent 2 root 2 not found in extent tree
Backref 518414336 root 2 not referenced back 0xa55b0c8
Incorrect global backref count on 518414336 found 1 wanted 0
backpointer mismatch on [518414336 4096]
Backref 540577792 parent 2 root 2 not found in extent tree
Backref 540577792 root 2 not referenced back 0x8793588
Incorrect global backref count on 540577792 found 1 wanted 0
backpointer mismatch on [540577792 4096]
owner ref check failed [556208128 4096]
repair deleting extent record: key 556208128 168 4096
adding new tree backref on start 556208128 len 4096 parent 5 root 5
Backref 565420032 parent 2 root 2 not found in extent tree
Backref 565420032 root 2 not referenced back 0xa437620
Incorrect global backref count on 565420032 found 1 wanted 0
backpointer mismatch on [565420032 4096]
Backref 597049344 parent 2 root 2 not found in extent tree
Backref 597049344 root 2 not referenced back 0xa414720
Incorrect global backref count on 597049344 found 1 wanted 0
backpointer mismatch on [597049344 4096]
Backref 610033664 parent 2 root 2 not found in extent tree
Backref 610033664 root 2 not referenced back 0xa46f250
Incorrect global backref count on 610033664 found 1 wanted 0
backpointer mismatch on [610033664 4096]
Backref 636481536 parent 2 root 2 not found in extent tree
Backref 636481536 root 2 not referenced back 0xa4d53c8
Incorrect global backref count on 636481536 found 1 wanted 0
backpointer mismatch on [636481536 4096]
Backref 673796096 parent 2 root 2 not found in extent tree
Backref 673796096 root 2 not referenced back 0xa474368
Incorrect global backref count on 673796096 found 1 wanted 0
backpointer mismatch on [673796096 4096]
Backref 717684736 parent 2 root 2 not found in extent tree
Backref 717684736 root 2 not referenced back 0x8793658
Incorrect global backref count on 717684736 found 1 wanted 0
backpointer mismatch on [717684736 4096]
Backref 739885056 parent 2 root 2 not found in extent tree
Backref 739885056 root 2 not referenced back 0xa501d20
Incorrect global backref count on 739885056 found 1 wanted 0
backpointer mismatch on [739885056 4096]
Backref 745107456 parent 2 root 2 not found in extent tree
Backref 745107456 root 2 not referenced back 0xa562260
Incorrect global backref count on 745107456 found 1 wanted 0
backpointer mismatch on [745107456 4096]
Backref 770273280 parent 2 root 2 not found in extent tree
Backref 770273280 root 2 not referenced back 0xa482138
Incorrect global backref count on 770273280 found 1 wanted 0
backpointer mismatch on [770273280 4096]
Backref 771325952 parent 2 root 2 not found in extent tree
Backref 771325952 root 2 not referenced back 0x8638260
Incorrect global backref count on 771325952 found 1 wanted 0
backpointer mismatch on [771325952 4096]
Backref 775409664 parent 2 root 2 not found in extent tree
Backref 775409664 root 2 not referenced back 0x81a3068
Incorrect global backref count on 775409664 found 1 wanted 0
backpointer mismatch on [775409664 4096]
Backref 775598080 parent 2 root 2 not found in extent tree
Backref 775598080 root 2 not referenced back 0x81a7540
Incorrect global backref count on 775598080 found 1 wanted 0
backpointer mismatch on [775598080 4096]
Backref 775700480 parent 2 root 2 not found in extent tree
Backref 775700480 root 2 not referenced back 0x8639100
Incorrect global backref count on 775700480 found 1 wanted 0
backpointer mismatch on [775700480 4096]
Backref 775729152 parent 2 root 2 not found in extent tree
Backref 775729152 root 2 not referenced back 0xa428248
Incorrect global backref count on 775729152 found 1 wanted 0
backpointer mismatch on [775729152 4096]
Backref 775761920 parent 2 root 2 not found in extent tree
Backref 775761920 root 2 not referenced back 0xa428318
Incorrect global backref count on 775761920 found 1 wanted 0
backpointer mismatch on [775761920 4096]
Backref 775892992 parent 2 root 2 not found in extent tree
Backref 775892992 root 2 not referenced back 0xa4283e8
Incorrect global backref count on 775892992 found 1 wanted 0
backpointer mismatch on [775892992 4096]
Backref 775909376 parent 2 root 2 not found in extent tree
Backref 775909376 root 2 not referenced back 0xa4284b8
Incorrect global backref count on 775909376 found 1 wanted 0
backpointer mismatch on [775909376 4096]
Backref 775950336 parent 2 root 2 not found in extent tree
Backref 775950336 root 2 not referenced back 0x86391d0
Incorrect global backref count on 775950336 found 1 wanted 0
backpointer mismatch on [775950336 4096]
Backref 776458240 parent 2 root 2 not found in extent tree
Backref 776458240 root 2 not referenced back 0xa428a68
Incorrect global backref count on 776458240 found 1 wanted 0
backpointer mismatch on [776458240 4096]
Backref 776753152 parent 2 root 2 not found in extent tree
Backref 776753152 root 2 not referenced back 0xa428728
Incorrect global backref count on 776753152 found 1 wanted 0
backpointer mismatch on [776753152 4096]
Backref 776765440 parent 2 root 2 not found in extent tree
Backref 776765440 root 2 not referenced back 0xa4288c8
Incorrect global backref count on 776765440 found 1 wanted 0
backpointer mismatch on [776765440 4096]
Backref 776851456 parent 2 root 2 not found in extent tree
Backref 776851456 root 2 not referenced back 0x8637b10
Incorrect global backref count on 776851456 found 1 wanted 0
backpointer mismatch on [776851456 4096]
Backref 777175040 parent 2 root 2 not found in extent tree
Backref 777175040 root 2 not referenced back 0x86392a0
Incorrect global backref count on 777175040 found 1 wanted 0
backpointer mismatch on [777175040 4096]
Backref 777465856 parent 2 root 2 not found in extent tree
Backref 777465856 root 2 not referenced back 0xa44bee8
Incorrect global backref count on 777465856 found 1 wanted 0
backpointer mismatch on [777465856 4096]
Backref 777633792 parent 2 root 2 not found in extent tree
Backref 777633792 root 2 not referenced back 0x8638330
Incorrect global backref count on 777633792 found 1 wanted 0
backpointer mismatch on [777633792 4096]
Backref 778764288 parent 2 root 2 not found in extent tree
Backref 778764288 root 2 not referenced back 0x8637ff0
Incorrect global backref count on 778764288 found 1 wanted 0
backpointer mismatch on [778764288 4096]
Backref 778891264 parent 2 root 2 not found in extent tree
Backref 778891264 root 2 not referenced back 0x8638740
Incorrect global backref count on 778891264 found 1 wanted 0
backpointer mismatch on [778891264 4096]
Backref 779280384 parent 2 root 2 not found in extent tree
Backref 779280384 root 2 not referenced back 0x8638e90
Incorrect global backref count on 779280384 found 1 wanted 0
backpointer mismatch on [779280384 4096]
Backref 779673600 parent 2 root 2 not found in extent tree
Backref 779673600 root 2 not referenced back 0xa428cd8
Incorrect global backref count on 779673600 found 1 wanted 0
backpointer mismatch on [779673600 4096]
Backref 781275136 parent 2 root 2 not found in extent tree
Backref 781275136 root 2 not referenced back 0xa429768
Incorrect global backref count on 781275136 found 1 wanted 0
backpointer mismatch on [781275136 4096]
Backref 781406208 parent 2 root 2 not found in extent tree
Backref 781406208 root 2 not referenced back 0xa517e80
Incorrect global backref count on 781406208 found 1 wanted 0
backpointer mismatch on [781406208 4096]
Backref 782848000 parent 2 root 2 not found in extent tree
Backref 782848000 root 2 not referenced back 0xa42c098
Incorrect global backref count on 782848000 found 1 wanted 0
backpointer mismatch on [782848000 4096]
Backref 783228928 parent 2 root 2 not found in extent tree
Backref 783228928 root 2 not referenced back 0xa429e78
Incorrect global backref count on 783228928 found 1 wanted 0
backpointer mismatch on [783228928 4096]
Backref 783355904 parent 2 root 2 not found in extent tree
Backref 783355904 root 2 not referenced back 0xa42a4f8
Incorrect global backref count on 783355904 found 1 wanted 0
backpointer mismatch on [783355904 4096]
Backref 783376384 parent 2 root 2 not found in extent tree
Backref 783376384 root 2 not referenced back 0xa42a698
Incorrect global backref count on 783376384 found 1 wanted 0
backpointer mismatch on [783376384 4096]
Backref 783405056 parent 2 root 2 not found in extent tree
Backref 783405056 root 2 not referenced back 0xa42a768
Incorrect global backref count on 783405056 found 1 wanted 0
backpointer mismatch on [783405056 4096]
Backref 784048128 parent 2 root 2 not found in extent tree
Backref 784048128 root 2 not referenced back 0xa429288
Incorrect global backref count on 784048128 found 1 wanted 0
backpointer mismatch on [784048128 4096]
Backref 784093184 parent 2 root 2 not found in extent tree
Backref 784093184 root 2 not referenced back 0xa429358
Incorrect global backref count on 784093184 found 1 wanted 0
backpointer mismatch on [784093184 4096]
Backref 784101376 parent 2 root 2 not found in extent tree
Backref 784101376 root 2 not referenced back 0xa4294f8
Incorrect global backref count on 784101376 found 1 wanted 0
backpointer mismatch on [784101376 4096]
Backref 784330752 parent 2 root 2 not found in extent tree
Backref 784330752 root 2 not referenced back 0xa426ab8
Incorrect global backref count on 784330752 found 1 wanted 0
backpointer mismatch on [784330752 4096]
Backref 784388096 parent 2 root 2 not found in extent tree
Backref 784388096 root 2 not referenced back 0xa42ab78
Incorrect global backref count on 784388096 found 1 wanted 0
backpointer mismatch on [784388096 4096]
Backref 784637952 parent 2 root 2 not found in extent tree
Backref 784637952 root 2 not referenced back 0xa42ac48
Incorrect global backref count on 784637952 found 1 wanted 0
backpointer mismatch on [784637952 4096]
Backref 784650240 parent 2 root 2 not found in extent tree
Backref 784650240 root 2 not referenced back 0xa42ade8
Incorrect global backref count on 784650240 found 1 wanted 0
backpointer mismatch on [784650240 4096]
Backref 784830464 parent 2 root 2 not found in extent tree
Backref 784830464 root 2 not referenced back 0x8637150
Incorrect global backref count on 784830464 found 1 wanted 0
backpointer mismatch on [784830464 4096]
Backref 785096704 parent 2 root 2 not found in extent tree
Backref 785096704 root 2 not referenced back 0xa42aeb8
Incorrect global backref count on 785096704 found 1 wanted 0
backpointer mismatch on [785096704 4096]
Backref 785129472 parent 2 root 2 not found in extent tree
Backref 785129472 root 2 not referenced back 0xa42aaa8
Incorrect global backref count on 785129472 found 1 wanted 0
backpointer mismatch on [785129472 4096]
Backref 785215488 parent 2 root 2 not found in extent tree
Backref 785215488 root 2 not referenced back 0xa42b058
Incorrect global backref count on 785215488 found 1 wanted 0
backpointer mismatch on [785215488 4096]
Backref 786165760 parent 2 root 2 not found in extent tree
Backref 786165760 root 2 not referenced back 0x81aada8
Incorrect global backref count on 786165760 found 1 wanted 0
backpointer mismatch on [786165760 4096]
Backref 786239488 parent 2 root 2 not found in extent tree
Backref 786239488 root 2 not referenced back 0xa44b528
Incorrect global backref count on 786239488 found 1 wanted 0
backpointer mismatch on [786239488 4096]
Backref 786452480 parent 2 root 2 not found in extent tree
Backref 786452480 root 2 not referenced back 0xa44be18
Incorrect global backref count on 786452480 found 1 wanted 0
backpointer mismatch on [786452480 4096]
Backref 786620416 parent 2 root 2 not found in extent tree
Backref 786620416 root 2 not referenced back 0xa42b468
Incorrect global backref count on 786620416 found 1 wanted 0
backpointer mismatch on [786620416 4096]
Backref 786780160 parent 2 root 2 not found in extent tree
Backref 786780160 root 2 not referenced back 0xa42b608
Incorrect global backref count on 786780160 found 1 wanted 0
backpointer mismatch on [786780160 4096]
Backref 786808832 parent 2 root 2 not found in extent tree
Backref 786808832 root 2 not referenced back 0xa503d68
Incorrect global backref count on 786808832 found 1 wanted 0
backpointer mismatch on [786808832 4096]
Backref 786870272 parent 2 root 2 not found in extent tree
Backref 786870272 root 2 not referenced back 0xa42b6d8
Incorrect global backref count on 786870272 found 1 wanted 0
backpointer mismatch on [786870272 4096]
Backref 786874368 parent 2 root 2 not found in extent tree
Backref 786874368 root 2 not referenced back 0xa42b7a8
Incorrect global backref count on 786874368 found 1 wanted 0
backpointer mismatch on [786874368 4096]
Backref 787447808 parent 2 root 2 not found in extent tree
Backref 787447808 root 2 not referenced back 0xa45a820
Incorrect global backref count on 787447808 found 1 wanted 0
backpointer mismatch on [787447808 4096]
Backref 787599360 parent 2 root 2 not found in extent tree
Backref 787599360 root 2 not referenced back 0xa42ca58
Incorrect global backref count on 787599360 found 1 wanted 0
backpointer mismatch on [787599360 4096]
Backref 787660800 parent 2 root 2 not found in extent tree
Backref 787660800 root 2 not referenced back 0xa42ba18
Incorrect global backref count on 787660800 found 1 wanted 0
backpointer mismatch on [787660800 4096]
Backref 787668992 parent 2 root 2 not found in extent tree
Backref 787668992 root 2 not referenced back 0xa42bae8
Incorrect global backref count on 787668992 found 1 wanted 0
backpointer mismatch on [787668992 4096]
Backref 787922944 parent 2 root 2 not found in extent tree
Backref 787922944 root 2 not referenced back 0x8638190
Incorrect global backref count on 787922944 found 1 wanted 0
backpointer mismatch on [787922944 4096]
Backref 788348928 parent 2 root 2 not found in extent tree
Backref 788348928 root 2 not referenced back 0x86385a0
Incorrect global backref count on 788348928 found 1 wanted 0
backpointer mismatch on [788348928 4096]
Backref 788504576 parent 2 root 2 not found in extent tree
Backref 788504576 root 2 not referenced back 0xa4e3460
Incorrect global backref count on 788504576 found 1 wanted 0
backpointer mismatch on [788504576 4096]
Backref 788635648 parent 2 root 2 not found in extent tree
Backref 788635648 root 2 not referenced back 0xa42bc88
Incorrect global backref count on 788635648 found 1 wanted 0
backpointer mismatch on [788635648 4096]
Backref 788688896 parent 2 root 2 not found in extent tree
Backref 788688896 root 2 not referenced back 0xa42be28
Incorrect global backref count on 788688896 found 1 wanted 0
backpointer mismatch on [788688896 4096]
Backref 788709376 parent 2 root 2 not found in extent tree
Backref 788709376 root 2 not referenced back 0xa42bef8
Incorrect global backref count on 788709376 found 1 wanted 0
backpointer mismatch on [788709376 4096]
Backref 788717568 parent 2 root 2 not found in extent tree
Backref 788717568 root 2 not referenced back 0xa42bfc8
Incorrect global backref count on 788717568 found 1 wanted 0
backpointer mismatch on [788717568 4096]
Backref 790511616 parent 2 root 2 not found in extent tree
Backref 790511616 root 2 not referenced back 0xa44bd48
Incorrect global backref count on 790511616 found 1 wanted 0
backpointer mismatch on [790511616 4096]
Backref 790540288 parent 2 root 2 not found in extent tree
Backref 790540288 root 2 not referenced back 0xa42c988
Incorrect global backref count on 790540288 found 1 wanted 0
backpointer mismatch on [790540288 4096]
Backref 790740992 parent 2 root 2 not found in extent tree
Backref 790740992 root 2 not referenced back 0x86389b0
Incorrect global backref count on 790740992 found 1 wanted 0
backpointer mismatch on [790740992 4096]
Backref 790753280 parent 2 root 2 not found in extent tree
Backref 790753280 root 2 not referenced back 0x8638cf0
Incorrect global backref count on 790753280 found 1 wanted 0
backpointer mismatch on [790753280 4096]
Backref 792076288 parent 2 root 2 not found in extent tree
Backref 792076288 root 2 not referenced back 0x86384d0
Incorrect global backref count on 792076288 found 1 wanted 0
backpointer mismatch on [792076288 4096]
Backref 793780224 parent 2 root 2 not found in extent tree
Backref 793780224 root 2 not referenced back 0x81baea8
Incorrect global backref count on 793780224 found 1 wanted 0
backpointer mismatch on [793780224 4096]
Backref 793821184 parent 2 root 2 not found in extent tree
Backref 793821184 root 2 not referenced back 0x8637080
Incorrect global backref count on 793821184 found 1 wanted 0
backpointer mismatch on [793821184 4096]
Backref 793853952 parent 2 root 2 not found in extent tree
Backref 793853952 root 2 not referenced back 0x86378a0
Incorrect global backref count on 793853952 found 1 wanted 0
backpointer mismatch on [793853952 4096]
ref mismatch on [31078330368 20480] extent item 1, found 0
repair deleting extent record: key 31078330368 168 20480
Incorrect local backref count on 31078330368 root 5 owner 3024239 offset 0 found 0 wanted 1 back 0x8f00580
backpointer mismatch on [31078330368 20480]
owner ref check failed [31078330368 20480]
repaired damaged extent references
checking fs roots
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
Csum didn't match
Regards,
Frederic
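A log of this length is easier to reason about once it is condensed. The following is a minimal Python sketch, assuming the btrfsck output uses exactly the line formats shown above (the helper name `summarize_fsck_log` is hypothetical); it lists the distinct blocks that failed checksum verification and counts the distinct extents with a backpointer mismatch:

```python
import re

# Patterns modelled on the btrfsck output quoted above.
CSUM_RE = re.compile(r"checksum verify failed on (\d+) wanted (\w+) found (\w+)")
MISMATCH_RE = re.compile(r"backpointer mismatch on \[(\d+) (\d+)\]")


def summarize_fsck_log(text):
    """Return the distinct csum-failed blocks and the number of mismatched extents."""
    csum_blocks = sorted({int(block) for block, _w, _f in CSUM_RE.findall(text)})
    extents = {(int(start), int(length))
               for start, length in MISMATCH_RE.findall(text)}
    return {
        "csum_failed_blocks": csum_blocks,
        "backpointer_mismatch_extents": len(extents),
    }


# A short excerpt copied from the log above.
excerpt = """\
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
checksum verify failed on 272228352 wanted 119281BE found FFFFFFD3
Csum didn't match
checksum verify failed on 556208128 wanted D88A417C found 11
Csum didn't match
backpointer mismatch on [402792448 4096]
backpointer mismatch on [436817920 4096]
"""

print(summarize_fsck_log(excerpt))
```

The two blocks that fail checksum verification in the excerpt, 272228352 and 556208128, are the same logical addresses that the scrub messages in the first mail report as unfixable.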
* Re: How to recover uncorrectable errors ?
From: Martin Steigerwald @ 2013-03-16 18:16 UTC (permalink / raw)
To: linux-btrfs; +Cc: Frédéric COIFFIER
On Friday, 8 March 2013, Frédéric COIFFIER wrote:
> Hi,
Hi Frédéric,
> I'm using a Linux 3.7.6 (Gentoo Linux) with btrfs-progs-0.20_rc1_p56 and since few days, I have some uncorrectable errors :
>
> # btrfs scrub status /
> scrub status for 6b6ea99b-edee-498d-bf07-f3a3f1cba2f3
> scrub started at Thu Mar 7 20:12:31 2013 and finished after 515 seconds
> total bytes scrubbed: 31.02GB with 6 errors
> error details: csum=6
> corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>
> I don't know what has produced this error (maybe an hard reset or a power cut) but I use an old not-SSD hard-disk.
Is this disk still fine? Is smartctl -a happy with it?
> I have discovered this problem thanks to several errors in dmesg when I try to access to a file :
>
> [ 2985.163718] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
> [ 2985.169191] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[…]
> If I restart a btrfs scrub, I get these messages :
>
> [ 3047.835131] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
> [ 3047.835134] btrfs: checksum error at logical 272228352 on dev /dev/sda2, sector 548080: metadata leaf (level 0) in tree 5
> [ 3047.835137] btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 20, gen 0
> [ 3047.953751] btrfs: unable to fixup (regular) error at logical 272228352 on dev /dev/sda2
[…]
> I tried a LiveCD to make a btrfsck [I have to check its version] but it segfaults during the test.
>
> Today, I can't remove the file (and I can't delete its directory), updatedb runs during hours when it tries to read this file.
> So, what is the best way to recover these errors (as I think that some files are definitely lost) ?
> I would like to identify the corrupted files and to delete them.
I thought that with recent kernels BTRFS would report which file is
affected, but here it doesn't seem to.
I think it's also possible to find the file from the block number, but I
do not remember the direct way to do it. I only know the other way around,
with filefrag -v or hdparm --fibmap. Well, actually, thinking about it,
the reverse direction needs knowledge of the filesystem structure… Maybe it's
possible to map something in the output of btrfs-debug-tree to the output above.
But I really think BTRFS displays the affected filename by now. So
if it does not, maybe some metadata is affected? The output of btrfsck
hints at that, and so does the fact that you can't remove the file. What
happens if you try to remove the file? Do you get an input/output error or
something like that?
Maybe someone else can help with that.
Aside from that: those are called uncorrectable errors for a reason :)
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: How to recover uncorrectable errors ?
From: Martin Steigerwald @ 2013-03-16 18:19 UTC (permalink / raw)
To: linux-btrfs; +Cc: Frédéric COIFFIER
On Friday, 8 March 2013, Frédéric COIFFIER wrote:
> Today, I can't remove the file (and I can't delete its directory),
> updatedb runs during hours when it tries to read this file. So, what is
> the best way to recover these errors (as I think that some files are
> definitely lost) ? I would like to identify the corrupted files and to
> delete them.
Well, if nothing else works, you can still make a backup, diff it against an
older backup to possibly recover the corrupted files (or at least older
versions of them), and recreate the filesystem, after verifying that the
hardware works okay :)
As said, these errors are called uncorrectable for a reason. When they
happen in file data it should be possible to delete the offending file,
but then AFAIK BTRFS also reports which file is affected.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
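The "diff it against an older backup" step suggested in this reply can be sketched in Python. This is a minimal illustration under stated assumptions: both backups are plain directory trees that are mounted and readable, and the function name `compare_backups` and its return shape are made up for this example. It walks the older backup and reports files that are missing from the new one or whose contents differ byte for byte:

```python
import filecmp
import os


def compare_backups(new_root, old_root):
    """Compare every file of the older backup against the newer one.

    Returns (missing, differing): sorted lists of paths relative to old_root.
    """
    missing, differing = [], []
    for dirpath, _dirs, files in os.walk(old_root):
        for name in files:
            old_path = os.path.join(dirpath, name)
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            if not os.path.isfile(new_path):
                # Present in the older backup only: a candidate for restoring.
                missing.append(rel)
            elif not filecmp.cmp(old_path, new_path, shallow=False):
                # shallow=False forces a byte-by-byte content comparison.
                differing.append(rel)
    return sorted(missing), sorted(differing)
```

A call such as `compare_backups("/mnt/backup-new", "/mnt/backup-old")` (both paths hypothetical) returns the two lists. Files in the "differing" list may simply have been edited since the older backup, so each one still needs a manual look before anything is restored.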
* Re: How to recover uncorrectable errors ?
From: Frédéric COIFFIER @ 2013-03-20 13:33 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs
Hi Martin,
Thank you for your reply.
On Saturday, 16 March 2013, 19:16:54, Martin Steigerwald wrote:
> On Friday, 8 March 2013, Frédéric COIFFIER wrote:
> > # btrfs scrub status /
> > scrub status for 6b6ea99b-edee-498d-bf07-f3a3f1cba2f3
> > scrub started at Thu Mar 7 20:12:31 2013 and finished after 515 seconds
> > total bytes scrubbed: 31.02GB with 6 errors
> > error details: csum=6
> > corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
> >
> > I don't know what has produced this error (maybe an hard reset or a power cut) but I use an old not-SSD hard-disk.
>
> This disk is still fine? Is smartctl -a happy with it?
It is old, but it seems to be fine:
9 Power_On_Hours 0x0032 077 077 000 Old_age Always - 20238
...
195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
...
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 15811 -
# 2 Short offline Aborted by host 20% 13984 -
# 3 Short offline Completed without error 00% 13984 -
# 4 Short offline Completed without error 00% 187 -
> > Today, I can't remove the file (and I can't delete its directory), updatedb runs during hours when it tries to read this file.
> > So, what is the best way to recover these errors (as I think that some files are definitely lost) ?
> > I would like to identify the corrupted files and to delete them.
>
> I thought that with recent kernels BTRFS would report the file which is
> affected, but here it doesn´t seem so.
Yes, I read on a mailing list that a patch had been proposed, but with 3.8.1 it doesn't work.
> I think its also possibe to find out the file from the block number. But I
> do not remember the direct way to do it. I only know the other way around
> with filefrag -v or hdparm --fibmap - well actually file thinking on it,
> vice versa needs to have knowledge of filesystem structure… Maybe its
> possible to map something in the output in btrfs-debug-tree to above output.
In fact, yesterday I ran an rsync from btrfs to ext4, and rsync reported "Stale NFS handle" errors for these files.
So identifying them is no longer a problem.
The most annoying thing is that we can't delete these files. So the only way to solve the problem is to recreate the filesystem.
> But I really think BTRFS displays the filename affected meanwhile. So
> maybe if it does not, its some metadata being affected? So output of btrfsck
> hints at that and that you can´t remove the file does as well. What happens
> if you try to remove the file? Do you get an input/output error or
> something like that?
# rm -rf *
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
...
> Maybe someone else can help with that.
>
> Aside from that: Thats uncorrectable errors for a reason :)
Yes, I absolutely agree that we can't recover some files, but btrfsck should offer to repair these errors (like fsck.ext4 does) even if we lose some data.
In fact, I never got this kind of problem with ext filesystems.
Regards,
Frederic
* Re: How to recover uncorrectable errors ?
2013-03-20 13:33 ` Frédéric COIFFIER
@ 2013-03-20 18:19 ` Chris Murphy
2013-03-20 19:24 ` Roman Mamedov
2013-03-20 18:59 ` Martin Steigerwald
1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2013-03-20 18:19 UTC (permalink / raw)
To: Frédéric COIFFIER; +Cc: Martin Steigerwald, linux-btrfs
On Mar 20, 2013, at 7:33 AM, Frédéric COIFFIER <frederic.coiffier@free.fr> wrote:
>
> 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1
> 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
With such a high count of ECC-recovered events, I suspect SDC (silent data corruption). The value is within the manufacturer's tolerance, so the drive isn't failed outright, but the ECC in a consumer SATA drive isn't foolproof. It will fail to detect some errors and report bad data back to the file system. It will detect and incorrectly "correct" others. Even if most errors are detected and correctly corrected, the bottom line is that you have a file system that knows better, and it's saying something is significantly wrong.
If you're going to continue to use the drive, I would at least use hdparm to issue an ATA Enhanced Security Erase Unit command. Then I'd take a smartctl -x capture for reference. Then run an extended offline SMART test with -t long, which this drive has never had in its lifetime. Then take another smartctl -x capture, compare it to the reference, and see whether the test completed or failed and whether any of the attributes changed appreciably during the offline test. Otherwise, get a replacement.
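The before/after comparison described above could be sketched roughly as follows. This is only an illustration in Python, assuming the attribute rows have the ten-column layout quoted in this thread; the helper names `parse_smart_attributes` and `changed_attributes` are made up for the example, and the "after" capture is invented, not real output from this drive.

```python
def parse_smart_attributes(text):
    """Return {attribute_id: (name, raw_value)} from SMART attribute rows."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows have 10 columns: a numeric ID, a name, a hex flag, ...
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            attrs[int(parts[0])] = (parts[1], parts[9].strip())
    return attrs


def changed_attributes(before, after):
    """List (id, name, old_raw, new_raw) for attributes whose raw value differs."""
    old = parse_smart_attributes(before)
    new = parse_smart_attributes(after)
    return [
        (attr_id, name, old[attr_id][1], raw)
        for attr_id, (name, raw) in sorted(new.items())
        if attr_id in old and old[attr_id][1] != raw
    ]


# Rows copied from the SMART table quoted in this thread.
BEFORE = """\
195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
"""

# Hypothetical "after" capture, invented purely for illustration.
AFTER = """\
195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
"""

for attr_id, name, old_raw, new_raw in changed_attributes(BEFORE, AFTER):
    print(f"{attr_id} {name}: {old_raw} -> {new_raw}")
```

In practice the two inputs would be the saved smartctl captures taken before and after the long test; only attributes present in both captures are compared.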
The one-off UDMA error isn't a media error but a communication error between drive and controller; I wouldn't be overly concerned about that.
> The most annoying thing is that we can't delete these files. So, the only way to solve these problems is to replace the filesystem.
The storage media isn't reliable. Replacing the file system will eventually get you right back where you are now, unless you use multiple devices with a reliable second device.
Chris Murphy
* Re: How to recover uncorrectable errors ?
2013-03-20 13:33 ` Frédéric COIFFIER
2013-03-20 18:19 ` Chris Murphy
@ 2013-03-20 18:59 ` Martin Steigerwald
2013-03-20 19:06 ` cwillu
2013-03-21 8:36 ` Frédéric COIFFIER
1 sibling, 2 replies; 14+ messages in thread
From: Martin Steigerwald @ 2013-03-20 18:59 UTC (permalink / raw)
To: Frédéric COIFFIER; +Cc: linux-btrfs
On Wednesday, 20 March 2013, Frédéric COIFFIER wrote:
> > But I really think BTRFS displays the filename affected meanwhile. So
> > maybe if it does not, its some metadata being affected? So output of btrfsck
> > hints at that and that you can´t remove the file does as well. What happens
> > if you try to remove the file? Do you get an input/output error or
> > something like that?
>
> # rm -rf *
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> ...
You are trying to remove the files from an NFS client. "Stale NFS file
handle" just means that the NFS handle is no longer valid. NFS <v4
clients refer to a file by a file handle composed of filesystem ID and
inode number. Maybe something changed there?
Anyway, to find the real error message it's necessary to try to delete
the files on the server, because even if there is a real BTRFS issue, the
NFS client likely won't report helpful error messages.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: How to recover uncorrectable errors ?
2013-03-20 18:59 ` Martin Steigerwald
@ 2013-03-20 19:06 ` cwillu
2013-03-21 8:36 ` Frédéric COIFFIER
1 sibling, 0 replies; 14+ messages in thread
From: cwillu @ 2013-03-20 19:06 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Frédéric COIFFIER, linux-btrfs
>> # rm -rf *
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
>> ...
>
> You are trying to remove the files from an NFS client. Stale NFS file
> handle just means that the NFS handle is no longer valid. NFS <v4
> clients refer to file by a file handle composed of filesystem id and
> inode number. Maybe a change in there?
>
> Anyway, to find the real error message its necessary to try to delete
> the files on the server. Cause even if there is a real BTRFS issue, the
> NFS client likely won´t report helpful error messages.
Don't read too much into that "Stale NFS file handle" message; ESTALE
doesn't imply anything about NFS being involved, despite the standard
error string for that value.
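A quick way to see this for oneself is to ask the C library for the text it attaches to ESTALE. The following is a small Python sketch; the exact wording printed depends on the libc version installed, so no particular output is assumed here.

```python
import errno
import os

# ESTALE is a generic errno value. The text that tools such as rm print
# comes from the C library's message table, not from any NFS code path.
message = os.strerror(errno.ESTALE)
print(errno.ESTALE, errno.errorcode[errno.ESTALE], message)
```

Any filesystem can return ESTALE; rm simply prints whatever string the C library maps that number to.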
* Re: How to recover uncorrectable errors ?
2013-03-20 18:19 ` Chris Murphy
@ 2013-03-20 19:24 ` Roman Mamedov
2013-03-20 20:17 ` Chris Murphy
2013-03-21 8:57 ` Frédéric COIFFIER
0 siblings, 2 replies; 14+ messages in thread
From: Roman Mamedov @ 2013-03-20 19:24 UTC (permalink / raw)
To: Chris Murphy; +Cc: Frédéric COIFFIER, Martin Steigerwald, linux-btrfs
On Wed, 20 Mar 2013 12:19:18 -0600
Chris Murphy <lists@colorremedies.com> wrote:
> > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
> With such high ECC recovered events, I suspect SDC.
If it's a Seagate drive, this is absolutely normal.
All Seagate drives have a high value in SMART Hardware_ECC_Recovered.
--
With respect,
Roman
* Re: How to recover uncorrectable errors ?
2013-03-20 19:24 ` Roman Mamedov
@ 2013-03-20 20:17 ` Chris Murphy
2013-03-21 8:57 ` Frédéric COIFFIER
1 sibling, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2013-03-20 20:17 UTC (permalink / raw)
To: Roman Mamedov
Cc: Frédéric COIFFIER, Martin Steigerwald, linux-btrfs
On Mar 20, 2013, at 1:24 PM, Roman Mamedov <rm@romanrm.ru> wrote:
> On Wed, 20 Mar 2013 12:19:18 -0600
> Chris Murphy <lists@colorremedies.com> wrote:
>
>>> 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
>
>> With such high ECC recovered events, I suspect SDC.
>
> If it's a Seagate drive, this is absolutely normal.
> All Seagate drives have a high value in SMART Hardware_ECC_Recovered.
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracuda/Seagate-s-Seek-Error-Rate-Raw-Read-Error-Rate-and-Hardware-ECC/td-p/122382
http://www.silentpcreview.com/forums/viewtopic.php?t=57212
If I read this correctly, the read error rate and hardware ECC recovered are sector counts, so they should be the same.
Nevertheless, the file system isn't happy about checksums. It's not that it isn't finding the checksum data, it's finding errors with it.
Chris Murphy
* Re: How to recover uncorrectable errors ?
2013-03-20 18:59 ` Martin Steigerwald
2013-03-20 19:06 ` cwillu
@ 2013-03-21 8:36 ` Frédéric COIFFIER
2013-03-21 13:27 ` Martin Steigerwald
1 sibling, 1 reply; 14+ messages in thread
From: Frédéric COIFFIER @ 2013-03-21 8:36 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs
Hi Martin,
On Wednesday, 20 March 2013 at 19:59:54, Martin Steigerwald wrote:
> On Wednesday, 20 March 2013, Frédéric COIFFIER wrote:
> > > But I really think BTRFS displays the filename affected meanwhile. So
> > > maybe if it does not, its some metadata being affected? So output of btrfsck
> > > hints at that and that you can´t remove the file does as well. What happens
> > > if you try to remove the file? Do you get an input/output error or
> > > something like that?
> >
> > # rm -rf *
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > ...
>
> You are trying to remove the files from an NFS client. Stale NFS file
> handle just means that the NFS handle is no longer valid.
Absolutely not. I'm not using NFS, and I'm trying to remove the files locally.
It seems that btrfs returns a strange ESTALE errno...
grep -rsn ESTALE fs/btrfs/
fs/btrfs/inode.c:2412: if (ret && ret != -ESTALE)
fs/btrfs/inode.c:2415: if (ret == -ESTALE && root == root->fs_info->tree_root) {
fs/btrfs/inode.c:2451: if (ret == -ESTALE) {
fs/btrfs/inode.c:4273: inode = ERR_PTR(-ESTALE);
fs/btrfs/export.c:71: return ERR_PTR(-ESTALE);
fs/btrfs/export.c:104: return ERR_PTR(-ESTALE);
This error seems to be common (even if I can't see any recent reports) :
http://www.google.fr/search?q=btrfs+Stale+NFS+file+handle
Regards,
Frederic
* Re: How to recover uncorrectable errors ?
2013-03-20 19:24 ` Roman Mamedov
2013-03-20 20:17 ` Chris Murphy
@ 2013-03-21 8:57 ` Frédéric COIFFIER
2013-03-21 15:09 ` Chris Murphy
1 sibling, 1 reply; 14+ messages in thread
From: Frédéric COIFFIER @ 2013-03-21 8:57 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Chris Murphy, Martin Steigerwald, linux-btrfs
Hi Roman,
On Thursday, 21 March 2013 at 01:24:14, Roman Mamedov wrote:
> On Wed, 20 Mar 2013 12:19:18 -0600
> Chris Murphy <lists@colorremedies.com> wrote:
>
> > > 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
>
> > With such high ECC recovered events, I suspect SDC.
>
> If it's a Seagate drive, this is absolutely normal.
> All Seagate drives have a high value in SMART Hardware_ECC_Recovered.
You're right : it's a Seagate :
Model Family: Seagate Barracuda 7200.10
Device Model: ST3320620AS
Regards,
Frederic
* Re: How to recover uncorrectable errors ?
2013-03-21 8:36 ` Frédéric COIFFIER
@ 2013-03-21 13:27 ` Martin Steigerwald
0 siblings, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2013-03-21 13:27 UTC (permalink / raw)
To: Frédéric COIFFIER; +Cc: linux-btrfs
On Thursday, 21 March 2013, Frédéric COIFFIER wrote:
> Hi Martin,
>
> On Wednesday, 20 March 2013 at 19:59:54, Martin Steigerwald wrote:
> > On Wednesday, 20 March 2013, Frédéric COIFFIER wrote:
> > > > But I really think BTRFS displays the filename affected meanwhile. So
> > > > maybe if it does not, its some metadata being affected? So output of btrfsck
> > > > hints at that and that you can´t remove the file does as well. What happens
> > > > if you try to remove the file? Do you get an input/output error or
> > > > something like that?
> > >
> > > # rm -rf *
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > rm: cannot remove 'drivers/misc/lis3lv02d/lis3lv02d.c': Stale NFS file handle
> > > ...
> >
> > You are trying to remove the files from an NFS client. Stale NFS file
> > handle just means that the NFS handle is no longer valid.
>
> Absolutely not. I'm not using NFS and I try to remove the files locally.
> It seems that btrfs returns a strange ESTALE errno...
>
> grep -rsn ESTALE fs/btrfs/
> fs/btrfs/inode.c:2412: if (ret && ret != -ESTALE)
> fs/btrfs/inode.c:2415: if (ret == -ESTALE && root == root->fs_info->tree_root) {
> fs/btrfs/inode.c:2451: if (ret == -ESTALE) {
> fs/btrfs/inode.c:4273: inode = ERR_PTR(-ESTALE);
> fs/btrfs/export.c:71: return ERR_PTR(-ESTALE);
> fs/btrfs/export.c:104: return ERR_PTR(-ESTALE);
>
> This error seems to be common (even if I can't see any recent reports) :
> http://www.google.fr/search?q=btrfs+Stale+NFS+file+handle
Thanks for pointing that out. Well, I thought one could take the error message literally.
I have only ever seen it with NFS, and NFS is also named in the error message. I think
the error message is misleading at the very least.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: How to recover uncorrectable errors ?
2013-03-21 8:57 ` Frédéric COIFFIER
@ 2013-03-21 15:09 ` Chris Murphy
0 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2013-03-21 15:09 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org BTRFS
On Mar 21, 2013, at 2:57 AM, Frédéric COIFFIER <frederic.coiffier@free.fr> wrote:
> Hi Roman,
>
> On Thursday, 21 March 2013 at 01:24:14, Roman Mamedov wrote:
>> On Wed, 20 Mar 2013 12:19:18 -0600
>> Chris Murphy <lists@colorremedies.com> wrote:
>>
>>>> 195 Hardware_ECC_Recovered 0x001a 057 055 000 Old_age Always - 63508940
>>
>>> With such high ECC recovered events, I suspect SDC.
>>
>> If it's a Seagate drive, this is absolutely normal.
>> All Seagate drives have a high value in SMART Hardware_ECC_Recovered.
>
> You're right : it's a Seagate :
The btrfs scrub in your first post reports checksum errors in metadata: two logical values at four sector values. That tells me the metadata profile is raid1. And because the errors aren't fixable, it sounds like the mirrored metadata copies agree with each other but the data itself has changed. I don't think that's due to a reset or power loss during a write.
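One way to check how many distinct logical addresses are involved is to group the kernel messages by address. The sketch below is in Python and assumes the message format shown in the dmesg excerpt from the first post; the function name `count_failures` is made up for the example, and it looks only at kernel log lines, not at scrub or btrfsck output.

```python
import re
from collections import Counter

# Pattern for the kernel messages quoted in the first post of this thread.
CSUM_RE = re.compile(
    r"btrfs: (?P<dev>\S+) checksum verify failed on (?P<logical>\d+) "
    r"wanted (?P<wanted>[0-9A-Fa-f]+) found (?P<found>[0-9A-Fa-f]+) "
    r"level (?P<level>\d+)"
)


def count_failures(log_text):
    """Count checksum failures per (device, logical address)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CSUM_RE.search(line)
        if match:
            counts[(match["dev"], int(match["logical"]))] += 1
    return counts


# A few lines copied from the dmesg output in the first post.
SAMPLE = """\
[ 2985.163718] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2985.169191] btrfs: sda2 checksum verify failed on 26326409216 wanted 59A31CB1 found DFB0FE7F level 0
[ 2993.102810] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
[ 2993.114213] btrfs: sda2 checksum verify failed on 272228352 wanted 1A0FCFD3 found 119281BE level 0
"""

for (dev, logical), n in sorted(count_failures(SAMPLE).items()):
    print(f"{dev} logical {logical}: {n} failures")
```

Repeated lines for the same address with the same wanted/found pair indicate the same block being re-read, not new damage.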
The source of the problem sounds to me like SDC. Some parts of the drive have bad sectors, the drive is returning the wrong data, and the FS knows this.
previously:
> Yes, I absolutely agree that we can't recover some files, but btrfsck should offer to repair these errors (like fsck.ext4 does) even if we lose some data.
> In fact, I never got this kind of problem with ext filesystems.
It's not a fair comparison. ext is stable; btrfs is not. ext's fsck repairs by default; btrfs's does not. Nobody suggests that users ask the devs on a list before running an fsck repair on ext, but that is the advice for btrfs. So far no dev has suggested using the --repair flag. I don't know whether it would help get the file system to allow the deletion of corrupt files.
There have been many changes since kernel 3.7.4, so I suspect a dev would want you to try something newer, along with much newer progs.
In any case, I would still use enhanced security erase on the drive, then run a smartctl -t long (extended offline) test, and then confirm with smartctl -a or -x that it completed after the estimated time.
Chris Murphy