* raid10 array lost with single disk failure?
From: Adam Bahe @ 2017-07-08 4:26 UTC
To: linux-btrfs
Hello all,
I have an 18-device raid10 array that has recently stopped working.
Whenever the array tries to mount, it sits there with all disks doing
I/O but never fully mounts, and after a few minutes of attempting to
mount the entire system locks up. This is the best I could get out of
the logs before it froze up on me:
[ 851.358139] BTRFS: device label btrfs_pool1 devid 18 transid 1546569 /dev/sds
[ 856.247402] BTRFS info (device sds): disk space caching is enabled
[ 856.247405] BTRFS info (device sds): has skinny extents
[ 968.236099] perf: interrupt took too long (2524 > 2500), lowering
kernel.perf_event_max_sample_rate to 79000
[ 969.375296] BUG: unable to handle kernel NULL pointer dereference
at 00000000000001f0
[ 969.376583] IP: can_overcommit+0x1d/0x110 [btrfs]
[ 969.377707] PGD 0
[ 969.379870] Oops: 0000 [#1] SMP
[ 969.380932] Modules linked in: dm_mod 8021q garp mrp rpcrdma
ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core ext4 jbd2
mbcache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd
intel_cstate iTCO_wdt iTCO_vendor_support intel_rapl_perf mei_me ses
lpc_ich pcspkr input_leds joydev enclosure i2c_i801 mfd_core mei sg
ioatdma wmi shpchp ipmi_si ipmi_devintf ipmi_msghandler
acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables btrfs xor raid6_pq mlx4_en sd_mod crc32c_intel mlx4_core ast
i2c_algo_bit drm_kms_helper
[ 969.389915] syscopyarea ata_generic sysfillrect pata_acpi
sysimgblt fb_sys_fops ttm ixgbe drm mdio mpt3sas ptp pps_core
raid_class ata_piix dca scsi_transport_sas libata fjes
[ 969.392846] CPU: 35 PID: 20864 Comm: kworker/u97:10 Tainted: G
I 4.10.6-1.el7.elrepo.x86_64 #1
[ 969.394344] Hardware name: Supermicro Super Server/X10DRi-T4+, BIOS
2.0 12/17/2015
I upgraded the kernel a few days ago from 4.8.7-1.el7.elrepo.x86_64 to
4.10.6-1.el7.elrepo.x86_64. I had also added a new 6TB disk a few days
ago, but I'm not sure whether the balance finished, as the system
locked up sometime today while I was at work. Any ideas on how I can
recover? Even with one bad disk, raid10 should have kept my data safe,
no? Is there anything I can do to recover?
* Re: raid10 array lost with single disk failure?
From: Adam Bahe @ 2017-07-08 4:40 UTC
To: linux-btrfs
Some additional information: I am running Rockstor, just like Daniel
Brady noted in his post just before mine, titled "Chunk root problem".
Sorry, I am somewhat unfamiliar with newsgroups, so I am not sure how
to reply to a thread from before I was subscribed. But I am noticing
something in my logs very similar to his:
[ 716.902506] BTRFS error (device sdb): failed to read the system array: -5
[ 716.918284] BTRFS error (device sdb): open_ctree failed
[ 717.004162] BTRFS warning (device sdb): 'recovery' is deprecated,
use 'usebackuproot' instead
[ 717.004165] BTRFS info (device sdb): trying to use backup root at mount time
[ 717.004167] BTRFS info (device sdb): disk space caching is enabled
[ 717.004168] BTRFS info (device sdb): has skinny extents
[ 717.005673] BTRFS error (device sdb): failed to read the system array: -5
[ 717.020248] BTRFS error (device sdb): open_ctree failed
He also received a similar open_ctree failed message after he upgraded
his kernel on Rockstor to 4.10.6-1.el7.elrepo.x86_64 and
btrfs-progs-4.10.1-0.rockstor.x86_64.
On Fri, Jul 7, 2017 at 11:26 PM, Adam Bahe <adambahe@gmail.com> wrote:
> [...]
* Re: raid10 array lost with single disk failure?
From: Duncan @ 2017-07-08 19:37 UTC
To: linux-btrfs
Adam Bahe posted on Fri, 07 Jul 2017 23:40:20 -0500 as excerpted:
> Some additional information. I am running Rockstor just like Daniel
> Brady noted in his post just before mine titled "Chunk root problem".
> Sorry I am somewhat unfamiliar with newsgroups so I am not sure how to
> reply to his thread before I was subscribed. But I am noticing something
> in my logs very similar to his, I get:
>
> [ 716.902506] BTRFS error (device sdb): failed to read the system
> array: -5
> [ 716.918284] BTRFS error (device sdb): open_ctree failed
> [ 717.004162] BTRFS warning (device sdb): 'recovery' is deprecated,
> use 'usebackuproot' instead
> [ 717.004165] BTRFS info (device sdb): trying to use backup root at
> mount time
> [ 717.004167] BTRFS info (device sdb): disk space caching is enabled
> [ 717.004168] BTRFS info (device sdb): has skinny extents
> [ 717.005673] BTRFS error (device sdb): failed to read the system
> array: -5
> [ 717.020248] BTRFS error (device sdb): open_ctree failed
>
> He also received a similar open_ctree failed message after he upgraded
> his kernel on Rockstor to 4.10.6-1.el7.elrepo.x86_64 and
> btrfs-progs-4.10.1-0.rockstor.x86_64.
FWIW that's not significant. "open_ctree failed" is simply the generic
btrfs mount-failure message. It tells you nothing about the real
problem, so other clues must be used to discern that.
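For gathering those clues, the usual starting points are something like
the below (the device name is just an example, and smartctl needs the
smartmontools package installed):
uname -r                     # running kernel
btrfs --version              # btrfs-progs version
btrfs filesystem show        # how kernel/progs currently see the pool
dmesg | grep -i btrfs        # the real error is usually logged just
                             # before the open_ctree line
smartctl -a /dev/sds         # health of the suspect disk itself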
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: raid10 array lost with single disk failure?
From: Duncan @ 2017-07-08 20:51 UTC
To: linux-btrfs
Adam Bahe posted on Fri, 07 Jul 2017 23:26:31 -0500 as excerpted:
> I upgraded the kernel a few days ago from 4.8.7-1.el7.elrepo.x86_64
> to 4.10.6-1.el7.elrepo.x86_64. I had also added a new 6TB disk a few
> days ago, but I'm not sure whether the balance finished, as the system
> locked up sometime today while I was at work. Any ideas on how I can
> recover? Even with one bad disk, raid10 should have kept my data safe,
> no? Is there anything I can do to recover?
Yes, btrfs raid10 should be fine with a single bad device. That's
unlikely to be the issue.
But you did well to bring up the balance. Have you tried mounting with
the "skip_balance" mount option?
Sometimes a balance will run into a previously undetected problem with
the filesystem and crash. Mounting would otherwise still work, but as
soon as the filesystem goes active at the kernel level, before the
mount call even returns to userspace, the kernel sees the in-progress
balance and attempts to continue it. If the balance crashed while
processing a particular block group (aka chunk), that chunk is of
course the first one in line when the balance resumes, so it naturally
crashes again as it hits the same inconsistency that triggered the
crash the first time.
So the skip_balance mount option was invented as a work-around, to
allow you to mount the filesystem again. =:^)
The fact that it sits there for a while doing IO on all devices before
it crashes is another clue that it's probably the resumed balance
crashing things as it comes to the same inconsistency that triggered
the original crash during balance, so it's very likely that
skip_balance will help. =:^)
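Concretely, something like this -- the mount point is a placeholder and
any member device of the pool can be named, so adjust to your setup:
mount -o skip_balance /dev/sds /mnt/pool
If that still fails, a read-only attempt using the backup roots is the
lower-risk next step (usebackuproot is the current name for the old
'recovery' option):
mount -o ro,usebackuproot,skip_balance /dev/sds /mnt/pool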
Assuming that lets you mount, the next thing I'd try is a btrfs scrub.
Chances are it'll find some checksum problems, but since you're running
raid10 there's a second copy it can use to correct the bad one, so
there's a reasonably good chance scrub will find and fix your problems.
Even if it can't fix them all, it should get you closer, with less
chance of making things worse than riskier options such as btrfs check
with --repair.
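For reference, the scrub itself is just the following (again, /mnt/pool
is only a placeholder for wherever the pool is mounted):
btrfs scrub start /mnt/pool      # runs in the background
btrfs scrub status /mnt/pool     # progress and error/fix counts
btrfs scrub status -d /mnt/pool  # per-device breakdown, if wanted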
If a scrub completes with no uncorrected errors, I'd do an umount/mount
cycle or reboot just to be sure -- don't forget the skip_balance option
again, tho. Then, once nothing is running that a crash would interrupt,
and once you've taken the opportunity to update your backups (assuming
you consider the data worth more than the time/trouble/resources a
backup requires), try a balance resume.
Once the balance resume gets reasonably past the point where it
previously crashed, you can assume you've safely corrected at least
/that/ inconsistency, and hope the scrub took care of any others before
you got to them.
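The resume itself, once you're ready (same placeholder mount point):
btrfs balance resume /mnt/pool   # picks up the interrupted balance
btrfs balance status /mnt/pool   # watch its progress from another shell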
But of course all scrub does is verify checksums. Where there's a
second copy (as there is with dup, raid1 and raid10 modes) it attempts
to repair the bad copy from the other one, verifying that one as well
in the process. If the second copy of a block is bad too, or there is
no second copy, scrub will detect the bad checksum but won't be able to
fix it; and if a block has a valid checksum but is logically invalid
for other reasons, scrub won't detect it at all, because /all/ it does
is verify checksums, not actual filesystem consistency. That's what the
somewhat more risky btrfs check is for (risky if --repair or another
fix option is used; read-only mode only detects, it doesn't attempt to
fix).
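If it does come to that, start with the read-only form against the
unmounted filesystem, naming any one member device (the device name
here is only an example), and ask the list before going near --repair:
btrfs check /dev/sds             # read-only by default, reports only
# btrfs check --repair /dev/sds  # rewrites metadata -- last resort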
So if skip_balance doesn't work, or it does but scrub can't fix all the
errors it finds, or scrub fixes everything it detects but a balance
resume still crashes, then it's time to try riskier fixes. I'll let
others guide you there if needed, but will leave you with one reminder...
Sysadmin's first rule of backups:
Don't test fate and challenge reality! Have your backups, or regardless
of any claims to the contrary you're defining your data as throw-away
value -- and eventually, fate and reality are going to call you on it!
So don't worry too much even if you lose the filesystem. Either you
have backups and can restore from them should it be necessary, or you
defined the data as not worth the trouble of those backups, and losing
it isn't a big deal. In either case you saved what was truly important
to you: either the data, because it was important enough to you to have
backups, or the time/resources/trouble you would have spent making
those backups, which you saved regardless of whether you can save the
data or not. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: raid10 array lost with single disk failure?
From: Ferry Toth @ 2017-07-09 23:13 UTC
To: linux-btrfs
On Sat, 08 Jul 2017 20:51:41 +0000, Duncan wrote:
> Adam Bahe posted on Fri, 07 Jul 2017 23:26:31 -0500 as excerpted:
>
>> I upgraded the kernel a few days ago from 4.8.7-1.el7.elrepo.x86_64
>> to 4.10.6-1.el7.elrepo.x86_64. I had also added a new 6TB disk a few
>> days ago, but I'm not sure whether the balance finished, as the system
>> locked up sometime today while I was at work. Any ideas on how I can
>> recover? Even with one bad disk, raid10 should have kept my data safe,
>> no? Is there anything I can do to recover?
>
> Yes, btrfs raid10 should be fine with a single bad device. That's
> unlikely to be the issue.
I'm wondering about that.
The btrfs wiki says 'mostly ok' for raid10 and mentions that btrfs
needs to be able to create 2 copies of a file to avoid going into an
irreversible read-only state.
To me that sounds like you are safe against a single failing drive if
you had 5 to start with. Is that correct?
'mostly ok' is a rather useless answer. We need to know what to do to
be safe, and to have defined procedures so we don't screw things up
'unrecoverably' when something bad happens.
Any advice?
> [...]
* Re: raid10 array lost with single disk failure?
From: Adam Bahe @ 2017-07-10 2:13 UTC
To: linux-btrfs
I have finished all of the above suggestions: ran a scrub, remounted,
rebooted, made sure the system didn't hang, and then kicked off another
balance on the entire pool. It completed rather quickly, but something
still does not seem right.
Label: 'btrfs_pool1' uuid: 04a7fa70-1572-47a2-a55c-7c99aef12603
Total devices 18 FS bytes used 23.64TiB
devid 1 size 1.82TiB used 1.82TiB path /dev/sdd
devid 2 size 1.82TiB used 1.82TiB path /dev/sdf
devid 3 size 3.64TiB used 3.07TiB path /dev/sdg
devid 4 size 3.64TiB used 3.06TiB path /dev/sdk
devid 5 size 1.82TiB used 1.82TiB path /dev/sdn
devid 6 size 3.64TiB used 3.06TiB path /dev/sdo
devid 7 size 1.82TiB used 1.82TiB path /dev/sds
devid 8 size 1.82TiB used 1.82TiB path /dev/sdj
devid 9 size 1.82TiB used 1.82TiB path /dev/sdi
devid 10 size 1.82TiB used 1.82TiB path /dev/sdq
devid 11 size 1.82TiB used 1.82TiB path /dev/sdr
devid 12 size 1.82TiB used 1.82TiB path /dev/sde
devid 13 size 1.82TiB used 1.82TiB path /dev/sdm
devid 14 size 7.28TiB used 4.78TiB path /dev/sdh
devid 15 size 7.28TiB used 4.99TiB path /dev/sdl
devid 16 size 7.28TiB used 4.97TiB path /dev/sdp
devid 17 size 7.28TiB used 4.99TiB path /dev/sdc
devid 18 size 5.46TiB used 210.12GiB path /dev/sdb
/dev/sdb is the new disk, but btrfs only moved 210.12GiB over to it.
Most disks in the array are more than 50% utilized. Is this normal?
On Sun, Jul 9, 2017 at 6:13 PM, Ferry Toth <ftoth@exalondelft.nl> wrote:
> [...]
* Re: raid10 array lost with single disk failure?
From: Austin S. Hemmelgarn @ 2017-07-10 12:49 UTC
To: Adam Bahe, linux-btrfs
On 2017-07-09 22:13, Adam Bahe wrote:
> I have finished all of the above suggestions: ran a scrub, remounted,
> rebooted, made sure the system didn't hang, and then kicked off another
> balance on the entire pool. It completed rather quickly, but something
> still does not seem right.
>
> Label: 'btrfs_pool1' uuid: 04a7fa70-1572-47a2-a55c-7c99aef12603
> Total devices 18 FS bytes used 23.64TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/sdd
> devid 2 size 1.82TiB used 1.82TiB path /dev/sdf
> devid 3 size 3.64TiB used 3.07TiB path /dev/sdg
> devid 4 size 3.64TiB used 3.06TiB path /dev/sdk
> devid 5 size 1.82TiB used 1.82TiB path /dev/sdn
> devid 6 size 3.64TiB used 3.06TiB path /dev/sdo
> devid 7 size 1.82TiB used 1.82TiB path /dev/sds
> devid 8 size 1.82TiB used 1.82TiB path /dev/sdj
> devid 9 size 1.82TiB used 1.82TiB path /dev/sdi
> devid 10 size 1.82TiB used 1.82TiB path /dev/sdq
> devid 11 size 1.82TiB used 1.82TiB path /dev/sdr
> devid 12 size 1.82TiB used 1.82TiB path /dev/sde
> devid 13 size 1.82TiB used 1.82TiB path /dev/sdm
> devid 14 size 7.28TiB used 4.78TiB path /dev/sdh
> devid 15 size 7.28TiB used 4.99TiB path /dev/sdl
> devid 16 size 7.28TiB used 4.97TiB path /dev/sdp
> devid 17 size 7.28TiB used 4.99TiB path /dev/sdc
> devid 18 size 5.46TiB used 210.12GiB path /dev/sdb
>
> /dev/sdb is the new disk, but btrfs only moved 210.12GiB over to it.
> Most disks in the array are more than 50% utilized. Is this normal?
>
Was this from a full balance, or just running a scrub to repair chunks?
You have three ways you can repair a BTRFS volume that's lost a device:
* The first, quickest, and most reliable is to use `btrfs device
replace` to replace the failing/missing device (sketched below). This
only reads the data that actually needs to go onto the new device, so
it completes faster, but you will also need to resize the filesystem on
the new device if it is larger than the old one, and you can't replace
the missing device with a smaller one.
* The second is to add the device to the array, then run a scrub on the
whole array. This will spit out a bunch of errors from the chunks that
need to be rebuilt, but will make sure everything is consistent. This
isn't as fast as using `device replace`, but is still quicker than a
full balance most of the time. In this particular case, I would expect
behavior like what you're seeing above at least some of the time.
* The third, and slowest method, is to add the new device, then run a
full balance. This will make sure data is evenly distributed
proportionate to device size and will rebuild all the partial chunks.
It will also take the longest, and put significantly more stress on the
array than the other two options (it rewrites the entire array). If
this is what you used, then you probably found a bug, because it should
never result in what you're seeing.
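For the first option, the sequence looks roughly like this -- a sketch
only, where /mnt/pool is a placeholder mount point, /dev/sdX a
placeholder target disk, and devid 5 stands in for whichever device is
being replaced (recent btrfs-progs expose this as `btrfs replace`):
btrfs replace start 5 /dev/sdX /mnt/pool
btrfs replace status /mnt/pool
btrfs filesystem resize 5:max /mnt/pool   # only needed if the new disk
                                          # is larger than the old one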