linux-btrfs.vger.kernel.org archive mirror
* Is `btrfsck --repair` supposed to actually repair problems?
From: Charles Cazabon @ 2013-10-01 21:12 UTC (permalink / raw)
  To: btrfs list


Greetings,

I've been using btrfs for bulk-storage purposes for a couple of years now (on
vanilla linux-stable kernels on a few machines).  I recently set up a new
filesystem and was copying data to it when I hit an unrelated kernel lockup.
As expected, after rebooting, btrfsck reported some checksum verify errors
like:

checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E

There are a few dozen of these.

Running btrfsck with the --repair option, however, does not appear to fix
these problems.  I'll attach the complete output of running with the --repair
option; running btrfsck in check-only mode afterwards reports largely the same
checksum errors as it did originally, prior to "repair".

Shouldn't `btrfsck --repair` actually repair these errors?  Am I doing
something wrong?

System details:
  - current kernel is linux-stable 3.9.11 x86_64
  - btrfs-progs built from
    git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git, which
    doesn't appear to have changed in a long time
  - filesystem is 16.4TiB btrfs on LVM on dm-crypt on an mdadm RAID-6 array.
    I know this is perhaps an odd setup, but btrfs didn't support RAID-6 when
    I started using it.

Any advice appreciated.  Thanks,

Charles

-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

[-- Attachment #2: btrfsck-repair.log --]

# btrfsck --repair /dev/extbackup/bigbackup
enabling repair mode
Checking filesystem on /dev/extbackup/bigbackup
UUID: c18dfd04-d931-4269-b999-e94df3b1918c
checking extents
checksum verify failed on 78318182400 found 66D5FD23 wanted 5FFBFE3B
checksum verify failed on 78318182400 found 66D5FD23 wanted 5FFBFE3B
checksum verify failed on 530263736320 found 71FB90AF wanted 150EBA1C
checksum verify failed on 530263736320 found 71FB90AF wanted 150EBA1C
checksum verify failed on 669649289216 found ED6068EA wanted CE438B87
checksum verify failed on 669649289216 found ED6068EA wanted CE438B87
checksum verify failed on 806659584000 found E813828C wanted ED236710
checksum verify failed on 806659584000 found E813828C wanted ED236710
checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
checksum verify failed on 854909063168 found 51E74B28 wanted D49BBFC6
checksum verify failed on 854909063168 found 51E74B28 wanted D49BBFC6
checksum verify failed on 855239913472 found 06557D7A wanted ED2CC52C
checksum verify failed on 855239913472 found 06557D7A wanted ED2CC52C
checksum verify failed on 881561030656 found 47735BC0 wanted F8957A7C
checksum verify failed on 881561030656 found 47735BC0 wanted F8957A7C
checksum verify failed on 919416066048 found B884B190 wanted 13DD9265
checksum verify failed on 919416066048 found B884B190 wanted 13DD9265
checksum verify failed on 984444981248 found 383F144D wanted 0801F812
checksum verify failed on 984444981248 found 383F144D wanted 0801F812
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checksum verify failed on 78318182400 found 66D5FD23 wanted 5FFBFE3B
checksum verify failed on 78318182400 found 66D5FD23 wanted 5FFBFE3B
checksum verify failed on 530263736320 found 71FB90AF wanted 150EBA1C
checksum verify failed on 530263736320 found 71FB90AF wanted 150EBA1C
checksum verify failed on 669649289216 found ED6068EA wanted CE438B87
checksum verify failed on 669649289216 found ED6068EA wanted CE438B87
checksum verify failed on 806659584000 found E813828C wanted ED236710
checksum verify failed on 806659584000 found E813828C wanted ED236710
checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
checksum verify failed on 854909063168 found 51E74B28 wanted D49BBFC6
checksum verify failed on 854909063168 found 51E74B28 wanted D49BBFC6
checksum verify failed on 855239913472 found 06557D7A wanted ED2CC52C
checksum verify failed on 855239913472 found 06557D7A wanted ED2CC52C
checksum verify failed on 881561030656 found 47735BC0 wanted F8957A7C
checksum verify failed on 881561030656 found 47735BC0 wanted F8957A7C
checksum verify failed on 919416066048 found B884B190 wanted 13DD9265
checksum verify failed on 919416066048 found B884B190 wanted 13DD9265
checksum verify failed on 984444981248 found 383F144D wanted 0801F812
checksum verify failed on 984444981248 found 383F144D wanted 0801F812
checking csums
checksum verify failed on 78318182400 found 66D5FD23 wanted 5FFBFE3B
checksum verify failed on 78318182400 found 66D5FD23 wanted 5FFBFE3B
checksum verify failed on 530263736320 found 71FB90AF wanted 150EBA1C
checksum verify failed on 530263736320 found 71FB90AF wanted 150EBA1C
checking root refs
found 530436962463 bytes used err is 0
total csum bytes: 4526618444
total tree bytes: 18394226688
total fs tree bytes: 11872870400
total extent tree bytes: 618696704
btree space waste bytes: 3766887803
file data blocks allocated: 4636473634816
 referenced 4636473634816
Btrfs v0.20-rc1-358-g194aa4a-dirty



* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Chris Murphy @ 2013-10-01 22:01 UTC (permalink / raw)
  To: Charles Cazabon; +Cc: btrfs list


On Oct 1, 2013, at 3:12 PM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:

> Greetings,
> 
> I've been using btrfs for bulk-storage purposes for a couple of years now (on
> vanilla linux-stable kernels on a few machines).  I recently set up a new
> filesystem and was copying data to it when I hit an unrelated kernel lockup.
> As expected, after rebooting, btrfsck reported some checksum verify errors
> like:
> 
> checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
> checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
> checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
> 
> There are a few dozen of these.
> 
> Running btrfsck with the --repair option, however, does not appear to fix
> these problems.  I'll attach the complete output of running with the --repair
> option; running btrfsck in check-only mode afterwards reports largely the same
> checksum errors as it did originally, prior to "repair".
> 
> Shouldn't `btrfsck --repair` actually repair these errors?  Am I doing
> something wrong?

It looks like the file system thinks the file has changed and no longer matches its checksum. That's not obviously fixable unless both data and metadata are raid1. More information is needed:

btrfs fi df <mountpoint>
btrfs show
dmesg | grep -i btrfs
dmesg | grep ata<port#>

I'm assuming it's a SATA drive; if so, you can find the port number by running the last command without a port number and seeing which port the drive is on. For me I get a line like:
[    1.388091] ata1.00: ATA-8: WDC WD5000BEVT-22ZAT0, 01.01A01, max UDMA/133

So I'd use dmesg | grep ata1

Do that for all drives in the btrfs volume.
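
Something like this loop would do it (a rough sketch; substitute the port
numbers your drives actually show up on):

    for p in 1 2 3 4 5 6 7 8; do
        dmesg | grep -E "ata$p(\.[0-9]+)?:"
    done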

And report the version of btrfs-progs.


Chris Murphy


* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Charles Cazabon @ 2013-10-01 23:46 UTC (permalink / raw)
  To: btrfs list


Hi, Chris,

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 3:12 PM, Charles Cazabon
> <charlesc-lists-btrfs@pyropus.ca> wrote:
> 
> > Running btrfsck with the --repair option, however, does not appear to fix
> > these [checksum verify] problems.  I'll attach the complete output of
> > running with the --repair option; running btrfsck in check-only mode
> > afterwards reports largely the same checksum errors as it did originally,
> > prior to "repair".  Shouldn't `btrfsck --repair` actually repair these
> > errors?  Am I doing something wrong?
> 
> It looks like the file system thinks the file has changed and no longer
> matches its checksum. That's not obviously fixable unless both data and
> metadata are raid1.

Perhaps this wasn't clear from my original message, but I'm not using btrfs'
RAID or lvm-like capabilities.  The filesystem is on an LVM logical volume,
with the actual underlying storage being an 8-disk RAID-6 array (mdadm array).
So the stack is:

    vanilla btrfs filesystem (not using subvolumes, btrfs' multiple device
       support or any other advanced features)

    LVM logical volume

    LVM volume group

    LVM physical volume

    dm-crypt / LUKS encrypted volume

    mdadm RAID-6 array

    8 x SATA disks
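
For reference, the whole stack can be seen in tree form with lsblk (assuming
a reasonably recent util-linux):

    lsblk -o NAME,TYPE,FSTYPE,SIZE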

> More information is needed:

Okay:

  # btrfs fi df /media/bigbackup/
  Data: total=4.53TB, used=4.22TB
  System, DUP: total=8.00MB, used=508.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=18.00GB, used=17.13GB
  Metadata: total=8.00MB, used=0.00

> btrfs show

This fails with `btrfs: unknown token 'show'`.

> dmesg | grep -i btrfs

After mounting the filesystem read-only, the following ends up in the syslog:

  [13333.117462] Btrfs loaded
  [13333.157078] device label bigbackup devid 1 transid 5249
      /dev/mapper/extbackup-bigbackup
  [13333.158445] btrfs: disk space caching is enabled

That's the only btrfs-related info that gets logged.

> dmesg | grep ata<port#>
> 
> I'm assuming it's a SATA drive,

As I say, it's 8 disks (yes, SATA).  What info exactly do you want about the
disks and ports?  The log is quite noisy because these are behind SATA port
multipliers, and there are a bunch of other SATA drives in the system.  But
with the extra stuff filtered out, powering up the port-multiplier boxes the
disks live in produces 126 lines of log (much of it garbage from unused
multiplier ports); log attached.

The 8 disks are, as you can see, all identical Seagate units:

  ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133

> And report the version of btrfs-progs.

Btrfs v0.20-rc1-358-g194aa4a-dirty

That's what I get when I build from the git repository at
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

git insists I'm fully up to date, though the last time I pulled before today
was over a month ago.
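
The version string looks like plain git-describe output, i.e. roughly what
you'd get from running this in the btrfs-progs tree:

    git describe --dirty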

Charles

-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

[-- Attachment #2: sata.log --]

[    1.927026] ata11: SATA max UDMA/100 host m128@0xfd8ff000 port 0xfd8f8000 irq 19
[    1.927065] ata12: SATA max UDMA/100 host m128@0xfd8ff000 port 0xfd8fa000 irq 19
[    4.008746] ata11: SATA link down (SStatus 0 SControl 0)
[    6.091302] ata12: SATA link down (SStatus 0 SControl 0)
[  372.741259] ata11: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
[  372.741270] ata11: irq_stat 0x00b40090, PHY RDY changed
[  372.741284] ata11: hard resetting link
[  374.710712] ata12: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
[  374.710724] ata12: irq_stat 0x00b40090, PHY RDY changed
[  374.710738] ata12: hard resetting link
[  382.758711] ata11: softreset failed (timeout)
[  382.758724] ata11: hard resetting link
[  384.729193] ata12: softreset failed (timeout)
[  384.729206] ata12: hard resetting link
[  387.941314] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  387.941715] ata11.15: Port Multiplier 1.2, 0x197b:0x0325 r0, 15 ports, feat 0x5/0xf
[  387.946096] ata11.00: hard resetting link
[  388.314054] ata11.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  388.314105] ata11.01: hard resetting link
[  388.682496] ata11.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  388.682548] ata11.02: hard resetting link
[  389.051042] ata11.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  389.051095] ata11.03: hard resetting link
[  389.419480] ata11.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  389.419535] ata11.04: hard resetting link
[  389.927921] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  389.928310] ata12.15: Port Multiplier 1.2, 0x197b:0x0325 r0, 15 ports, feat 0x5/0xf
[  389.939731] ata12.00: hard resetting link
[  390.308622] ata12.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  390.308677] ata12.01: hard resetting link
[  390.448517] ata11.04: failed to resume link (SControl 0)
[  390.448851] ata11.04: SATA link down (SStatus 0 SControl 0)
[  390.448932] ata11.05: hard resetting link
[  390.677099] ata12.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  390.677155] ata12.02: hard resetting link
[  391.045600] ata12.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  391.045654] ata12.03: hard resetting link
[  391.414090] ata12.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  391.414143] ata12.04: hard resetting link
[  391.477925] ata11.05: failed to resume link (SControl 0)
[  391.478259] ata11.05: SATA link down (SStatus 0 SControl 0)
[  391.478339] ata11.06: hard resetting link
[  392.443117] ata12.04: failed to resume link (SControl 0)
[  392.443458] ata12.04: SATA link down (SStatus 0 SControl 0)
[  392.443540] ata12.05: hard resetting link
[  392.507226] ata11.06: failed to resume link (SControl 0)
[  392.507563] ata11.06: SATA link down (SStatus 0 SControl 0)
[  392.507644] ata11.07: hard resetting link
[  393.472419] ata12.05: failed to resume link (SControl 0)
[  393.472758] ata12.05: SATA link down (SStatus 0 SControl 0)
[  393.472842] ata12.06: hard resetting link
[  393.536548] ata11.07: failed to resume link (SControl 0)
[  393.536884] ata11.07: SATA link down (SStatus 0 SControl 0)
[  393.536964] ata11.08: hard resetting link
[  394.501715] ata12.06: failed to resume link (SControl 0)
[  394.502072] ata12.06: SATA link down (SStatus 0 SControl 0)
[  394.502154] ata12.07: hard resetting link
[  394.565850] ata11.08: failed to resume link (SControl 0)
[  394.566187] ata11.08: SATA link down (SStatus 0 SControl 0)
[  394.566319] ata11.09: hard resetting link
[  395.531029] ata12.07: failed to resume link (SControl 0)
[  395.531363] ata12.07: SATA link down (SStatus 0 SControl 0)
[  395.531446] ata12.08: hard resetting link
[  395.595131] ata11.09: failed to resume link (SControl 0)
[  395.595469] ata11.09: SATA link down (SStatus 0 SControl 0)
[  395.595550] ata11.10: hard resetting link
[  396.560399] ata12.08: failed to resume link (SControl 0)
[  396.560736] ata12.08: SATA link down (SStatus 0 SControl 0)
[  396.560818] ata12.09: hard resetting link
[  396.624462] ata11.10: failed to resume link (SControl 0)
[  396.624855] ata11.10: SATA link down (SStatus 0 SControl 0)
[  396.624963] ata11.11: hard resetting link
[  397.589718] ata12.09: failed to resume link (SControl 0)
[  397.590056] ata12.09: SATA link down (SStatus 0 SControl 0)
[  397.590137] ata12.10: hard resetting link
[  397.653780] ata11.11: failed to resume link (SControl 0)
[  397.654112] ata11.11: SATA link down (SStatus 0 SControl 0)
[  397.654193] ata11.12: hard resetting link
[  398.619001] ata12.10: failed to resume link (SControl 0)
[  398.619338] ata12.10: SATA link down (SStatus 0 SControl 0)
[  398.619420] ata12.11: hard resetting link
[  398.683119] ata11.12: failed to resume link (SControl 0)
[  398.683451] ata11.12: SATA link down (SStatus 0 SControl 0)
[  398.683530] ata11.13: hard resetting link
[  399.648291] ata12.11: failed to resume link (SControl 0)
[  399.648655] ata12.11: SATA link down (SStatus 0 SControl 0)
[  399.648744] ata12.12: hard resetting link
[  399.712480] ata11.13: failed to resume link (SControl 0)
[  399.712817] ata11.13: SATA link down (SStatus 0 SControl 0)
[  399.712897] ata11.14: hard resetting link
[  400.677675] ata12.12: failed to resume link (SControl 0)
[  400.678012] ata12.12: SATA link down (SStatus 0 SControl 0)
[  400.678097] ata12.13: hard resetting link
[  400.741762] ata11.14: failed to resume link (SControl 0)
[  400.742101] ata11.14: SATA link down (SStatus 0 SControl 0)
[  400.742911] ata11.00: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  400.742921] ata11.00: 5860533168 sectors, multi 0: LBA48 
[  400.743635] ata11.00: configured for UDMA/100
[  400.744397] ata11.01: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  400.744409] ata11.01: 5860533168 sectors, multi 0: LBA48 
[  400.765387] ata11.01: configured for UDMA/100
[  400.766129] ata11.02: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  400.766140] ata11.02: 5860533168 sectors, multi 0: LBA48 
[  400.787661] ata11.02: configured for UDMA/100
[  400.788424] ata11.03: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  400.788434] ata11.03: 5860533168 sectors, multi 0: LBA48 
[  400.808638] ata11.03: configured for UDMA/100
[  400.808738] ata11: EH complete
[  401.706984] ata12.13: failed to resume link (SControl 0)
[  401.707321] ata12.13: SATA link down (SStatus 0 SControl 0)
[  401.707405] ata12.14: hard resetting link
[  402.736244] ata12.14: failed to resume link (SControl 0)
[  402.736603] ata12.14: SATA link down (SStatus 0 SControl 0)
[  402.737449] ata12.00: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  402.737460] ata12.00: 5860533168 sectors, multi 0: LBA48 
[  402.760315] ata12.00: configured for UDMA/100
[  402.761058] ata12.01: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  402.761068] ata12.01: 5860533168 sectors, multi 0: LBA48 
[  402.761803] ata12.01: configured for UDMA/100
[  402.762551] ata12.02: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  402.762560] ata12.02: 5860533168 sectors, multi 0: LBA48 
[  402.763284] ata12.02: configured for UDMA/100
[  402.764008] ata12.03: ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133
[  402.764014] ata12.03: 5860533168 sectors, multi 0: LBA48 
[  402.764778] ata12.03: configured for UDMA/100
[  402.764876] ata12: EH complete



* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Chris Murphy @ 2013-10-02  0:42 UTC (permalink / raw)
  To: Charles Cazabon; +Cc: btrfs list


On Oct 1, 2013, at 5:46 PM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:
> 
>  # btrfs fi df /media/bigbackup/
>  Data: total=4.53TB, used=4.22TB
>  System, DUP: total=8.00MB, used=508.00KB
>  System: total=4.00MB, used=0.00
>  Metadata, DUP: total=18.00GB, used=17.13GB
>  Metadata: total=8.00MB, used=0.00

Since there's only one copy of the data, there isn't a way to repair it; it just notes that there is a checksum mismatch.
> 
>> btrfs show
> 
> This fails with `btrfs: unknown token 'show'`.

I meant 'btrfs fi show'


> As I say, it's 8 disks (yes, SATA).  What info exactly do you want about the
> disks and ports? 

Looking for problems that relate to this one.

When was the last time you did a scrub on the md device? And what was the result?

What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?

This sounds to me like it could be a bit flip, and btrfs is catching it but doesn't have a 2nd copy of the data. Just a guess.

Chris Murphy


* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Charles Cazabon @ 2013-10-02  3:13 UTC (permalink / raw)
  To: btrfs list

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 5:46 PM, Charles Cazabon wrote:
> > 
> >  # btrfs fi df /media/bigbackup/
> >  Data: total=4.53TB, used=4.22TB
> >  System, DUP: total=8.00MB, used=508.00KB
> >  System: total=4.00MB, used=0.00
> >  Metadata, DUP: total=18.00GB, used=17.13GB
> >  Metadata: total=8.00MB, used=0.00
> 
> Since there's only one copy of the data, there isn't a way to repair it; it
> just notes that there is a checksum mismatch.

Ah, I'm not looking to repair the files -- I can recopy the files easily
enough, and rsync will pick up any files whose contents have been corrupted.
I'd like to get the filesystem fixed, though.  i.e., even deleting the
affected files would be fine.  This is a new filesystem to replace my existing
(full) backups filesystem.  The existing backups one is ext4 but this new one
is too big for mkfs.ext4 to handle, so btrfs it is.  I wasn't expecting
problems as I've been running btrfs for other purposes for years.

Am I misunderstanding something here?  It seems to me like btrfsck is telling
me there are problems with the filesystem itself when it continues to report
these checksum errors even after a `btrfsck --repair`.

> I meant 'btrfs fi show'

  Label: 'bigbackup'  uuid: c18dfd04-d931-4269-b999-e94df3b1918c
  Total devices 1 FS bytes used 4.23TB
  devid    1 size 16.37TB used 4.56TB path /dev/dm-9

> > As I say, it's 8 disks (yes, SATA).  What info exactly do you want about
> > the disks and ports? 
> 
> Looking for problems that relate to this one.
> 
> When was the last time you did a scrub on the md device? And what was the
> result?

It's a brand new array.  The initial sync is actually still going on (about
half complete; it'll take several days to initialize an array this size on
this hardware).

So in short, the underlying array is clean.

> What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?

  Warning: device does not support SCT Error Recovery Control command

> This sounds to me like it could be a bit flip, and btrfs is catching it but
> doesn't have a 2nd copy of the data. Just a guess.

If one of the disks flipped a bit, it would be caught at the md RAID-6 level,
no?

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------


* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Chris Murphy @ 2013-10-02  3:50 UTC (permalink / raw)
  To: Charles Cazabon; +Cc: btrfs list


On Oct 1, 2013, at 9:13 PM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:
> 
> Ah, I'm not looking to repair the files -- I can recopy the files easily
> enough, and rsync will pick up any files whose contents have been corrupted.
> I'd like to get the filesystem fixed, though.  i.e., even deleting the
> affected files would be fine.

If you run a scrub, dmesg should contain the path for affected files which you can then delete. If it's just a checksum problem with files, the file system doesn't need fixing. I'd wait until the raid is finished syncing.
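
Something like this once the resync finishes (mountpoint taken from your
earlier mail; the exact kernel log wording varies by kernel version):

    # foreground scrub; prints statistics when it finishes
    btrfs scrub start -B /media/bigbackup
    # paths of files with bad checksums should land in the kernel log
    dmesg | grep -iE 'csum|checksum'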

>  This is a new filesystem to replace my existing
> (full) backups filesystem.  The existing backups one is ext4 but this new one
> is too big for mkfs.ext4 to handle, so btrfs it is.  I wasn't expecting
> problems as I've been running btrfs for other purposes for years.

It's still experimental. I'd expect almost anything.


> 
> Am I misunderstanding something here?  It seems to me like btrfsck is telling
> me there are problems with the filesystem itself when it continues to report
> these checksum errors even after a `btrfsck --repair`.

Well, I haven't seen the entire btrfsck output or the entire dmesg, so as I said, I'm sorta guessing it's just a file problem, but maybe you've stumbled onto something else.

> 
> It's a brand new array.  The initial sync is actually still going on (about
> half complete; it'll take several days to initialize an array this size on
> this hardware).

OK maybe someone else can comment if this is expected to work, maybe on linux-raid even. But now you tell us this? You didn't think it might be important to mention that you've got a raid initially syncing, that you've formatted btrfs, copied files over, and at some point you got a kernel lockup, and then once restarted you ran a btrfsck?

I would expect problems with any file system, with a system that locks up while the raid is still syncing.

> So in short, the underlying array is clean.

Well, except you've got either file system corruption or corrupt files.

> 
>> What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?
> 
>  Warning: device does not support SCT Error Recovery Control command

These drives aren't well suited for RAID of any kind. Hopefully, at least,
you will change the scsi layer timeout for each drive using:

    echo 121 > /sys/block/sdX/device/timeout

That may not even be long enough, but without more information about the drive's ERC timeout, which the manufacturer might list in the exhaustive version of their spec book, it's a guess. Consumer drives try to recover for up to a couple of minutes. If the scsi layer resets the device after 30 seconds (the default), sector problems are never fixed, because the drive never gets to report the read error back to the kernel, and md never writes over the bad sector with reconstructed data. So bad sectors accumulate rather than being taken care of normally.

Your application layer might get frustrated, or worse, with up to 2 minute delays in the storage stack.
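
A quick way to apply that to every member of the array (the device names
here are assumptions; adjust to your system, and note the setting doesn't
survive a reboot):

    # assuming the eight md members are sdb through sdi
    for d in sdb sdc sdd sde sdf sdg sdh sdi; do
        echo 121 > /sys/block/$d/device/timeout
    done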

> 
>> This sounds to me like it could be a bit flip, and btrfs is catching it but
>> doesn't have a 2nd copy of the data. Just a guess.
> 
> If one of the disks flipped a bit, it would be caught at the md RAID-6 level,
> no?

No. In normal operation the parity is never consulted, so it would have no idea if there's a flipped bit. The hardware ought to catch it, but we know that isn't always true.
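
You can make md consult the parity explicitly, though (array name is an
assumption):

    # full read-and-compare pass; progress shows up in /proc/mdstat
    echo check > /sys/block/md0/md/sync_action
    # when it finishes, a nonzero count means parity didn't match somewhere
    cat /sys/block/md0/md/mismatch_cnt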



Chris Murphy


* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Charles Cazabon @ 2013-10-02 16:53 UTC (permalink / raw)
  To: btrfs list

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 9:13 PM, Charles Cazabon wrote:
> > 
> > Ah, I'm not looking to repair the files -- I can recopy the files easily
> > enough, and rsync will pick up any files whose contents have been corrupted.
> 
> If you run a scrub, dmesg should contain the path for affected files which
> you can then delete. If it's just a checksum problem with files, the file
> system doesn't need fixing.

Okay, I'll do that.

> I'd wait until the raid is finished syncing.

Strictly speaking, this shouldn't be necessary.  mdadm arrays are fully usable
from creation during the initial sync; the system tracks which bits have been
initialized and which haven't.

> > It's a brand new array.  The initial sync is actually still going on
> > (about half complete; it'll take several days to initialize an array this
> > size on this hardware).
> 
> OK maybe someone else can comment if this is expected to work, maybe on
> linux-raid even.

https://raid.wiki.kernel.org/index.php/Initial_Array_Creation talks about the
initial (re)sync.  It explicitly states:

  This can take quite a time and the array is not fully resilient whilst this
  is happening (it is however fully useable). 

> But now you tell us this? You didn't think it might be important to mention
> that you've got a raid initially syncing, that you've formatted btrfs,
> copied files over, and at some point you got a kernel lockup, and then once
> restarted you ran a btrfsck?

Yes.  The array uses a write-intent bitmap, so the kernel lockup during the
initial sync does not cause corruption; when the system is brought back up, it
may re-initialize a portion that it had already initialized (i.e. it's not
100% efficient), but it doesn't result in corruption.
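
The bitmap is easy to confirm, for what it's worth (md0 is an assumption;
substitute the actual array):

    cat /proc/mdstat                          # shows a "bitmap:" line
    mdadm --detail /dev/md0 | grep -i bitmap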

> I would expect problems with any file system, with a system that locks up
> while the raid is still syncing.

No, this doesn't cause any particular problems.  It's just like the normal
case of a single-drive filesystem and the system crashing during a write.
You just fsck to address any problems the interrupted write caused and recover
the journal (if applicable).

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------


* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Chris Murphy @ 2013-10-02 19:13 UTC (permalink / raw)
  To: Charles Cazabon; +Cc: btrfs list


On Oct 2, 2013, at 10:53 AM, Charles Cazabon <charlesc-lists-btrfs@pyropus.ca> wrote:

>> I'd wait until the raid is finished syncing.
> 
> Strictly speaking, this shouldn't be necessary.  mdadm arrays are fully usable
> from creation during the initial sync; the system tracks which bits have been
> initialized and which haven't.

I know, but it's a 16TB array; do you really want to start over from scratch? No. And neither do most people. So this probably isn't a use case that's getting a ton of testing.


>> But now you tell us this? You didn't think it might be important to mention
>> that you've got a raid initially syncing, that you've formatted btrfs,
> > copied files over, and at some point you got a kernel lockup, and then once
>> restarted you ran a btrfsck?
> 
> Yes.  The array uses a write-intent bitmap, so the kernel lockup during the
> initial sync does not cause corruption; when the system is brought back up, it
> may re-initialize a portion that it had already initialized (i.e. it's not
> 100% efficient), but it doesn't result in corruption.

OK, except there is corruption. We just don't know for sure whether it's just files or the file system. If you don't already know what caused it, it's not really correct to say what doesn't result in corruption.

Also the write-intent bitmap isn't configured by default, and you didn't previously say that it was. Is this an internal or external bitmap?


>> I would expect problems with any file system, with a system that locks up
>> while the raid is still syncing.
> 
> No, this doesn't cause any particular problems.  It's just like the normal
> case of a single-drive filesystem and the system crashing during a write.
> You just fsck to address any problems the interrupted write caused and recover
> the journal (if applicable).

If hardware worked exactly per spec, and didn't lie about committing data to disk rather than merely keeping it in cache, this might be true. But hardware lies, and it has bugs. And the kernel isn't bug-free either.


Chris Murphy


* Re: Is `btrfsck --repair` supposed to actually repair problems?
From: Charles Cazabon @ 2013-10-02 19:56 UTC (permalink / raw)
  To: btrfs list

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 2, 2013, at 10:53 AM, Charles Cazabon wrote:
> 
> >> I'd wait until the raid is finished syncing.
> > 
> > Strictly speaking, this shouldn't be necessary.
> 
> I know, but it's a 16TB array; do you really want to start over from scratch?
> No. And neither do most people. So this probably isn't a use case that's
> getting a ton of testing.

Fair enough.  The sync should be done late today or early tomorrow, and I am
waiting for it to complete before continuing to debug this.  I'll start with
the scrub you mentioned.

> Also the write-intent bitmap isn't configured by default, and you didn't
> previously say that it was. Is this an internal or external bitmap?

Internal.

Thanks for your assistance to date.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

