linux-btrfs.vger.kernel.org archive mirror
* uncorrectable errors in Raid 10
@ 2017-11-19 16:21 Steffen Sindzinski
  0 siblings, 0 replies; 9+ messages in thread
From: Steffen Sindzinski @ 2017-11-19 16:21 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I ran a scrub on my Btrfs RAID 10 and got 2 uncorrectable errors.
In fact, I cannot access 2 directories: even as root, permission is
denied, and all directory attributes in ls -la show as ????.

Previously I ran this filesystem as RAID 1 with 3 disks without any
problems for more than a year, scrubbing regularly. A month ago I added
a fourth HDD and balanced to RAID 10. I am not sure whether I ran a scrub
afterwards, but I usually do. Now it has found these errors. The SMART
status of the HDDs is healthy. The 2 directories have been read-only for
some years and were not even read in the last month.

What can I do now? Should I run btrfs rescue? On which device, given
that it is a RAID 10? My files are probably OK; it is only the
directories that I cannot access. How can I recover the files?

Thanks in advance!

Steffen


Here is my data:


Linux bigbox 4.13.0-17-generic #20-Ubuntu SMP Mon Nov 6 10:04:08 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

btrfs-progs v4.12

Label: 'Videos'  uuid: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
     Total devices 4 FS bytes used 1.44TiB
     devid    3 size 1.82TiB used 785.56GiB path /dev/sdc2
     devid    4 size 1.82TiB used 785.56GiB path /dev/sde2
     devid    5 size 1.82TiB used 785.56GiB path /dev/sdd2
     devid    6 size 1.36TiB used 785.56GiB path /dev/sdf1

Data, RAID10: total=1.51TiB, used=1.43TiB
System, RAID10: total=128.00MiB, used=192.00KiB
Metadata, RAID10: total=21.00GiB, used=17.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


  % sudo btrfs scrub start -Bd /
scrub device /dev/sdc2 (id 3) canceled
     scrub started at Sun Nov 19 08:43:21 2017 and was aborted after 01:58:27
     total bytes scrubbed: 568.86GiB with 0 errors
scrub device /dev/sde2 (id 4) canceled
     scrub started at Sun Nov 19 08:43:21 2017 and was aborted after 01:58:28
     total bytes scrubbed: 702.88GiB with 1 errors
     error details: verify=1
     corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdd2 (id 5) done
     scrub started at Sun Nov 19 08:43:21 2017 and finished after 01:48:00
     total bytes scrubbed: 737.74GiB with 1 errors
     error details: verify=1
     corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdf1 (id 6) canceled
     scrub started at Sun Nov 19 08:43:21 2017 and was aborted after 01:58:28
     total bytes scrubbed: 506.02GiB with 0 errors




[    4.985712] BTRFS: device label Videos devid 6 transid 772463 /dev/sdf1
[    4.985882] BTRFS: device label Videos devid 4 transid 772463 /dev/sde2
[    4.986541] BTRFS: device label Videos devid 5 transid 772463 /dev/sdd2
[    4.986713] BTRFS: device label Videos devid 3 transid 772463 /dev/sdc2
[    5.007986] BTRFS info (device sdc2): disk space caching is enabled
[    5.007988] BTRFS info (device sdc2): has skinny extents
[    5.149910] BTRFS info (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 3
[    5.149916] BTRFS info (device sdc2): bdev /dev/sde2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 3
[   25.482987] BTRFS info (device sdc2): use lzo compression
[   25.482990] BTRFS info (device sdc2): disk space caching is enabled
[57830.611730] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[57830.611732] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[57830.611734] BTRFS error (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 4
[57832.688689] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sdd2
[57870.488081] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[57870.488083] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[57870.488085] BTRFS error (device sdc2): bdev /dev/sde2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 4
[57870.500114] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sde2
[88979.005712] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[88979.005718] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[88979.005720] BTRFS error (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 5
[88979.036670] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sdd2
[89026.609561] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[89026.609563] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[89026.609566] BTRFS error (device sdc2): bdev /dev/sde2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 5
[89026.617423] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sde2
[97765.699866] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[97765.699869] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[97765.699871] BTRFS error (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 6
[97765.735887] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sdd2
[97789.909036] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[97789.909039] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[97789.909041] BTRFS error (device sdc2): bdev /dev/sde2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 6
[97789.918254] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sde2
[104087.660640] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[104087.660643] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0) in tree 318
[104087.660645] BTRFS error (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 7
[104087.696248] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sdd2
[104126.980483] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[104126.980486] BTRFS warning (device sdc2): checksum/header error at logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf (level 0) in tree 318
[104126.980488] BTRFS error (device sdc2): bdev /dev/sde2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 7
[104126.995705] BTRFS error (device sdc2): unable to fixup (regular) error at logical 17478699876352 on dev /dev/sde2


steffen@bigbox ~ % sudo btrfs device stats /
[/dev/sdc2].write_io_errs    0
[/dev/sdc2].read_io_errs     0
[/dev/sdc2].flush_io_errs    0
[/dev/sdc2].corruption_errs  0
[/dev/sdc2].generation_errs  0
[/dev/sde2].write_io_errs    0
[/dev/sde2].read_io_errs     0
[/dev/sde2].flush_io_errs    0
[/dev/sde2].corruption_errs  0
[/dev/sde2].generation_errs  7
[/dev/sdd2].write_io_errs    0
[/dev/sdd2].read_io_errs     0
[/dev/sdd2].flush_io_errs    0
[/dev/sdd2].corruption_errs  0
[/dev/sdd2].generation_errs  7
[/dev/sdf1].write_io_errs    0
[/dev/sdf1].read_io_errs     0
[/dev/sdf1].flush_io_errs    0
[/dev/sdf1].corruption_errs  0
[/dev/sdf1].generation_errs  0
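
With four devices and five counters each, the non-zero entries in the stats output above are easy to miss. A minimal sketch of filtering them, assuming the `[device].counter value` line format shown above; the names `STAT_LINE` and `nonzero_counters` are made up for this illustration and are not part of btrfs-progs:

```python
import re

# Assumed line format, taken from the output above:
#   [/dev/sde2].generation_errs  7
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return {device: {counter: value}} for every counter that is not zero."""
    result = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue  # skip shell prompts, blank lines, anything unexpected
        value = int(match.group("value"))
        if value:
            result.setdefault(match.group("dev"), {})[match.group("counter")] = value
    return result


# Excerpt of the stats output above, used as sample input.
SAMPLE = """\
[/dev/sdc2].generation_errs  0
[/dev/sde2].corruption_errs  0
[/dev/sde2].generation_errs  7
[/dev/sdd2].generation_errs  7
[/dev/sdf1].generation_errs  0
"""

if __name__ == "__main__":
    for dev, counters in nonzero_counters(SAMPLE).items():
        print(dev, counters)
```

On the sample, only /dev/sde2 and /dev/sdd2 are reported, each with a generation_errs count of 7, which matches the two devices named in the kernel log.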



* uncorrectable errors in Raid 10
@ 2017-11-19 19:31 Steffen Sindzinski
  2017-11-20  2:03 ` Chris Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Steffen Sindzinski @ 2017-11-19 19:31 UTC (permalink / raw)
  To: linux-btrfs



* Re: uncorrectable errors in Raid 10
  2017-11-19 19:31 uncorrectable errors in Raid 10 Steffen Sindzinski
@ 2017-11-20  2:03 ` Chris Murphy
  2017-11-20 19:28   ` Roy Sigurd Karlsbakk
       [not found]   ` <1c612777-77eb-1b83-101b-a9e0a53ee8be@gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Murphy @ 2017-11-20  2:03 UTC (permalink / raw)
  To: Steffen Sindzinski; +Cc: Btrfs BTRFS

On Sun, Nov 19, 2017 at 12:31 PM, Steffen Sindzinski <stesind@gmail.com> wrote:

> [57830.611730] BTRFS warning (device sdc2): checksum/header error at logical
> 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0)
> in tree 318
> [57830.611732] BTRFS warning (device sdc2): checksum/header error at logical
> 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0)
> in tree 318

The same leaf is corrupt in the same physical sector on two devices.
I'm guessing the checksum was computed incorrectly and written twice,
affecting both copies. I doubt it's a device problem. It might be
useful to look through the archives specifically for checksum header
error, it's kinda interesting Btrfs knows the problem is specifically
there.

What do you get for:

btrfs-debug-tree -b 17478699876352 /dev/sdd2

I think the problem is isolated but you're probably best off to
freshen up backups now while you can. Yes you can use restore to get
data off the volume if you don't already have backups. And as for
btrfs check --repair, only do that once you have backups and you're
prepared to lose the file system.
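
The observation that one leaf is bad on two devices can be checked mechanically by grouping the kernel warnings by logical address. A minimal sketch, assuming the "checksum/header error" message format quoted above; `WARNING` and `group_by_logical` are hypothetical names for this illustration, not an existing tool:

```python
import re
from collections import defaultdict

# Assumed message format, taken from the kernel log quoted in this thread.
WARNING = re.compile(
    r"checksum/header error at logical (?P<logical>\d+) "
    r"on dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def group_by_logical(log_text):
    """Map each logical address to {device: sorted list of reported sectors}."""
    # Mail clients wrap long kernel lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    found = defaultdict(lambda: defaultdict(set))
    for m in WARNING.finditer(flat):
        found[int(m.group("logical"))][m.group("dev")].add(int(m.group("sector")))
    return {
        logical: {dev: sorted(sectors) for dev, sectors in devs.items()}
        for logical, devs in found.items()
    }


# Two warnings from the log in this thread, wrapped the way the mail client wrapped them.
SAMPLE = """\
[57830.611730] BTRFS warning (device sdc2): checksum/header error at
logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf
(level 0) in tree 318
[57870.488081] BTRFS warning (device sdc2): checksum/header error at
logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf
(level 0) in tree 318
"""

if __name__ == "__main__":
    for logical, devs in group_by_logical(SAMPLE).items():
        print(logical, devs)
```

A logical address that appears under more than one device means every listed copy of that block failed verification, which is why scrub cannot repair it from a mirror.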



-- 
Chris Murphy


* Re: uncorrectable errors in Raid 10
  2017-11-20  2:03 ` Chris Murphy
@ 2017-11-20 19:28   ` Roy Sigurd Karlsbakk
       [not found]   ` <1c612777-77eb-1b83-101b-a9e0a53ee8be@gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Roy Sigurd Karlsbakk @ 2017-11-20 19:28 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Steffen Sindzinski, linux-btrfs

> On Sun, Nov 19, 2017 at 12:31 PM, Steffen Sindzinski <stesind@gmail.com> wrote:
> 
>> [57830.611730] BTRFS warning (device sdc2): checksum/header error at logical
>> 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0)
>> in tree 318
>> [57830.611732] BTRFS warning (device sdc2): checksum/header error at logical
>> 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0)
>> in tree 318
> 
> The same leaf is corrupt in the same physical sector on two devices.
> I'm guessing the checksum was computed incorrectly and written twice,
> affecting both copies. I doubt it's a device problem. It might be
> useful to look through the archives specifically for checksum header
> error, it's kinda interesting Btrfs knows the problem is specifically
> there.

Sounds reasonable. Perhaps a memory check on the machine would be a good idea.

roy


* Re: uncorrectable errors in Raid 10
       [not found]       ` <384b4a21-e74f-c52e-c6e2-f7930f187c94@gmail.com>
@ 2017-11-22 16:42         ` Chris Murphy
  2017-11-23 12:02           ` Steffen Sindzinski
  2017-11-23  0:41         ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2017-11-22 16:42 UTC (permalink / raw)
  To: Steffen Sindzinski; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

On Wed, Nov 22, 2017 at 9:38 AM, Steffen Sindzinski <stesind@gmail.com> wrote:
> Hello,
>
> I did btrfs check --readonly on both disk without finding any error. To
> reconfirm I did a scrub again which still has found 2 uncorrectable errors.

Try --mode=lowmem option with btrfs-progs 4.3.3 or better 4.14. This
is a new implementation of btrfs check and sometimes it comes up with
different results. It's strange that there's only this error found by
scrub and not by btrfs check which should be fully checking all
metadata for sanity, and in the process it would surely hit a bad
checksum.


-- 
Chris Murphy


* Re: uncorrectable errors in Raid 10
       [not found]       ` <384b4a21-e74f-c52e-c6e2-f7930f187c94@gmail.com>
  2017-11-22 16:42         ` Chris Murphy
@ 2017-11-23  0:41         ` Qu Wenruo
  1 sibling, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2017-11-23  0:41 UTC (permalink / raw)
  To: Steffen Sindzinski, Chris Murphy; +Cc: linux-btrfs





On 2017-11-23 00:38, Steffen Sindzinski wrote:
> Hello,
> 
> I did btrfs check --readonly on both disk without finding any error. To
> reconfirm I did a scrub again which still has found 2 uncorrectable

Still metadata corruption?

> errors. I had to boot into Arch Linux 4.13.12-1-ARCH
> btrfs-progs v4.13 to run btrfs check.

Well, btrfs-progs v4.13 from Arch has been out of date for a while.
Even ignoring the latest v4.14 release, there are no v4.13.x releases for
Arch.

> 
> 
> Checking filesystem on /dev/sde2
> UUID: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 1598669742080 bytes used, no error found
> total csum bytes: 1540740308
> total tree bytes: 19391266816
> total fs tree bytes: 9844703232
> total extent tree bytes: 6808731648
> btree space waste bytes: 3471512438
> file data blocks allocated: 2124703174656
>  referenced 1454353633280
> sudo btrfs check --readonly /dev/sde2  2708,20s user 594,58s system 15%
> cpu 5:45:40,36 total
> ***********************************************************************
> sudo btrfs check --readonly /dev/sdd2
> Checking filesystem on /dev/sdd2
> UUID: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 1598669742080 bytes used, no error found
> total csum bytes: 1540740308
> total tree bytes: 19391266816
> total fs tree bytes: 9844703232
> total extent tree bytes: 6808731648
> btree space waste bytes: 3471512438
> file data blocks allocated: 2124703174656
>  referenced 1454353633280
> sudo btrfs check --readonly /dev/sdd2  2655,23s user 626,48s system 15%
> cpu 5:56:16,85 total

Just as Chris mentioned, if the scrub is reporting metadata corruption,
please try "btrfs check --mode=lowmem" to see if lowmem mode can detect
such corruption.

> 
> 
> I mentioned that I cannot access 2 directories anymore. Permission is
> denied; not even as root do I have permission, nor can I change ownership.
> This happens in Ubuntu 17.10, kernel 4.13.0-17-generic. It looks like this:
> 
> ls -la The*
> The-Wire:
> ls: cannot access 'The-Wire/.': Permission denied
> ls: cannot access 'The-Wire/..': Permission denied
> <...>
> total 0
> d????????? ? ? ? ?            ? .
> d????????? ? ? ? ?            ? ..
> -????????? ? ? ? ?            ? The Wire S1E10.m4v
> -????????? ? ? ? ?            ? The Wire S1E11.m4v

Normally this means a DIR_INDEX/DIR_ITEM is missing or points to an incorrect inode.

> <...>
> 
> Oddly, in Arch Linux, 4.13.12-1-ARCH, btrfs-progs v4.13, I can access
> this same directory, and the permissions are set correctly.
>
> The debug trees for both drives are attached to this mail. They
> were taken in Ubuntu.

That doesn't really help.
I don't know how you took the dump, but it is only a single subvolume leaf,
full of EXTENT_DATA items without any useful info.

Thanks,
Qu

> 
> For reference here the scrub report on Arch linux.
> 
> 
> Steffen
> 
> 
> 
> On 20.11.2017 at 21:26, Chris Murphy wrote:
>> On Mon, Nov 20, 2017 at 1:36 AM, Steffen Sindzinski
>> <stesind@gmail.com> wrote:
>>> Hi,
>>>
>>> I did btrfs-debug-tree for this block on both devices. The result is
>>> attached to this mail.
>>>
>>> It is really weird: same block, different drives, different sector. I
>>> have no problem with bit rot. Btrfs worked perfectly fine with both of
>>> these HDDs for so long on Raid1. The drive sdf1, which I attached to
>>> form a Raid 10, was also in a different Btrfs in the same machine for
>>> years, flawlessly.
>>>
>>> I have not found any other checksum errors than the ones from this
>>> scrub.
>>>
>>> Is there no way to just safely recreate the checksum of this particular
>>> block from the disk contents?
>>
>> I'll cc Qu because I don't understand what's going on. It's the 2nd
>> case of both copies of metadata being bad in as many days, which could
>> just be coincidence.
>>
>> I also don't understand the specific error "checksum/header error"
>> which sounds to me like Btrfs knows the leaf is otherwise OK, but
>> there is some kind of problem with either the leaf csum or its header.
>> In which case I'd like to think that btrfs check --repair can fix this
>> kind of problem.
>>
>> What do you get for btrfs check without --repair?
>>
>> Curiously, your scrub complains about this "checksum/header error" but
>> btrfs-debug-tree gives no indication that leaf has any problem at all.
>>
>>
>>
>>



* Re: uncorrectable errors in Raid 10
  2017-11-22 16:42         ` Chris Murphy
@ 2017-11-23 12:02           ` Steffen Sindzinski
  2017-11-23 12:14             ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Steffen Sindzinski @ 2017-11-23 12:02 UTC (permalink / raw)
  To: Chris Murphy, quwenruo.btrfs; +Cc: linux-btrfs

Hi,

I updated to btrfs-progs v4.14-5-gf09e98a3. Unfortunately, after 10 hours
the Gnome desktop / Arch crashed with btrfs check --mode=lowmem unfinished. I
will run it again tonight.

The output up to the crash was:

Checking filesystem on /dev/sdd2
UUID: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
ERROR: extent[15923092037632, 73728] referencer count mismatch (root: 260, owner: 3631467, offset: 1019904) wanted: 4, have: 6
ERROR: extent[16078964924416, 69632] referencer count mismatch (root: 260, owner: 4086589, offset: 4296704) wanted: 5, have: 7
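
For comparing runs of btrfs check across btrfs-progs versions, the mismatch lines can be pulled into records. A minimal sketch, assuming the "referencer count mismatch" line format shown above; `MISMATCH` and `parse_mismatches` are hypothetical names for this illustration:

```python
import re

# Assumed message format, taken from the btrfs check output above.
MISMATCH = re.compile(
    r"ERROR: extent\[(?P<bytenr>\d+), (?P<length>\d+)\] referencer count mismatch "
    r"\(root: (?P<root>\d+), owner: (?P<owner>\d+), offset: (?P<offset>\d+)\) "
    r"wanted: (?P<wanted>\d+), have: (?P<have>\d+)"
)


def parse_mismatches(check_output):
    """Return one dict of integer fields per 'referencer count mismatch' line."""
    # Collapse whitespace so lines wrapped by a mail client still match.
    flat = " ".join(check_output.split())
    return [
        {name: int(value) for name, value in m.groupdict().items()}
        for m in MISMATCH.finditer(flat)
    ]


# The two errors reported above, wrapped the way the mail client wrapped them.
SAMPLE = """\
ERROR: extent[15923092037632, 73728] referencer count mismatch (root:
260, owner: 3631467, offset: 1019904) wanted: 4, have: 6
ERROR: extent[16078964924416, 69632] referencer count mismatch (root:
260, owner: 4086589, offset: 4296704) wanted: 5, have: 7
"""

if __name__ == "__main__":
    for record in parse_mismatches(SAMPLE):
        print(record)
```

Both sample records carry root 260, so the reports concern a single tree. Sorting the records by bytenr makes it straightforward to diff the output of two check runs.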

Steffen


On 22.11.2017 at 17:42, Chris Murphy wrote:
> On Wed, Nov 22, 2017 at 9:38 AM, Steffen Sindzinski <stesind@gmail.com> wrote:
>> Hello,
>>
>> I did btrfs check --readonly on both disk without finding any error. To
>> reconfirm I did a scrub again which still has found 2 uncorrectable errors.
> 
> Try --mode=lowmem option with btrfs-progs 4.3.3 or better 4.14. This
> is a new implementation of btrfs check and sometimes it comes up with
> different results. It's strange that there's only this error found by
> scrub and not by btrfs check which should be fully checking all
> metadata for sanity, and in the process it would surely hit a bad
> checksum.
> 
> 


* Re: uncorrectable errors in Raid 10
  2017-11-23 12:02           ` Steffen Sindzinski
@ 2017-11-23 12:14             ` Qu Wenruo
  2017-11-23 14:05               ` Steffen Sindzinski
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2017-11-23 12:14 UTC (permalink / raw)
  To: Steffen Sindzinski, Chris Murphy; +Cc: linux-btrfs





On 2017-11-23 20:02, Steffen Sindzinski wrote:
> Hi,
> 
> I updated to btrfs-progs v4.14-5-gf09e98a3. Unfortunately, after 10 hours
> the Gnome desktop / Arch crashed with btrfs check --mode=lowmem unfinished. I
> will run it again tonight.

Any kernel backtrace about the crash?

IIRC a user-space program like btrfs check should not trigger a kernel crash.

> 
> The result until crash was:
> 
> Checking filesystem on /dev/sdd2
> UUID: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
> ERROR: extent[15923092037632, 73728] referencer count mismatch (root:
> 260, owner: 3631467, offset: 1019904) wanted: 4, have: 6
> ERROR: extent[16078964924416, 69632] referencer count mismatch (root:
> 260, owner: 4086589, offset: 4296704) wanted: 5, have: 7

This is a known bug; please use this branch instead:
https://github.com/adam900710/btrfs-progs/tree/lowmem_fix

Thanks,
Qu

> 
> Steffen
> 
> 
> On 22.11.2017 at 17:42, Chris Murphy wrote:
>> On Wed, Nov 22, 2017 at 9:38 AM, Steffen Sindzinski
>> <stesind@gmail.com> wrote:
>>> Hello,
>>>
>>> I did btrfs check --readonly on both disk without finding any error. To
>>> reconfirm I did a scrub again which still has found 2 uncorrectable
>>> errors.
>>
>> Try --mode=lowmem option with btrfs-progs 4.3.3 or better 4.14. This
>> is a new implementation of btrfs check and sometimes it comes up with
>> different results. It's strange that there's only this error found by
>> scrub and not by btrfs check which should be fully checking all
>> metadata for sanity, and in the process it would surely hit a bad
>> checksum.
>>
>>



* Re: uncorrectable errors in Raid 10
  2017-11-23 12:14             ` Qu Wenruo
@ 2017-11-23 14:05               ` Steffen Sindzinski
  0 siblings, 0 replies; 9+ messages in thread
From: Steffen Sindzinski @ 2017-11-23 14:05 UTC (permalink / raw)
  To: Qu Wenruo, Chris Murphy; +Cc: linux-btrfs

Hi,

I think it was just an Xwayland/Nouveau bug:

Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: gr: TRAP ch 22 [007d08d000 Xwayland[781]]
Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: gr: GPC0/TPC0/TEX: 80000041
Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: gr: GPC0/TPC1/TEX: 80000041
Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: gr: GPC0/TPC2/TEX: 80000041
Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: gr: GPC0/TPC3/TEX: 80000041
Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: gr: GPC0/TPC4/TEX: 80000041
Nov 23 12:46:26 bigbox kernel: nouveau 0000:01:00.0: fifo: read fault at 0005860000 engine 00 [GR] client 15 [GPC0/PE_4] reason 02 [PTE] on channel 22 [007d08d000 Xwayland[781]]


Steffen


On 23.11.2017 at 13:14, Qu Wenruo wrote:
> 
> 
> On 2017-11-23 20:02, Steffen Sindzinski wrote:
>> Hi,
>>
>> I updated to btrfs-progs v4.14-5-gf09e98a3. Unfortunately, after 10 hours
>> the Gnome desktop / Arch crashed with btrfs check --mode=lowmem unfinished. I
>> will run it again tonight.
> 
> Any kernel backtrace about the crash?
> 
> IIRC a user-space program like btrfs check should not trigger a kernel crash.
> 
>>
>> The result until crash was:
>>
>> Checking filesystem on /dev/sdd2
>> UUID: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
>> ERROR: extent[15923092037632, 73728] referencer count mismatch (root:
>> 260, owner: 3631467, offset: 1019904) wanted: 4, have: 6
>> ERROR: extent[16078964924416, 69632] referencer count mismatch (root:
>> 260, owner: 4086589, offset: 4296704) wanted: 5, have: 7
> 
> This is a known bug; please use this branch instead:
> https://github.com/adam900710/btrfs-progs/tree/lowmem_fix
> 
> Thanks,
> Qu
> 
>>
>> Steffen
>>
>>
>> On 22.11.2017 at 17:42, Chris Murphy wrote:
>>> On Wed, Nov 22, 2017 at 9:38 AM, Steffen Sindzinski
>>> <stesind@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I did btrfs check --readonly on both disk without finding any error. To
>>>> reconfirm I did a scrub again which still has found 2 uncorrectable
>>>> errors.
>>>
>>> Try --mode=lowmem option with btrfs-progs 4.3.3 or better 4.14. This
>>> is a new implementation of btrfs check and sometimes it comes up with
>>> different results. It's strange that there's only this error found by
>>> scrub and not by btrfs check which should be fully checking all
>>> metadata for sanity, and in the process it would surely hit a bad
>>> checksum.
>>>
>>>
> 


