* raid10 array lost with single disk failure?
From: Adam Bahe @ 2017-07-08 4:26 UTC
To: linux-btrfs
Hello all,
I have an 18-device raid10 array that has recently stopped working.
Whenever the array tries to mount, it sits there with all disks doing
I/O but never fully mounts, and after a few minutes of attempting to
mount the entire system locks up. This is the best I could get out of
the logs before it froze up on me:
[ 851.358139] BTRFS: device label btrfs_pool1 devid 18 transid 1546569 /dev/sds
[ 856.247402] BTRFS info (device sds): disk space caching is enabled
[ 856.247405] BTRFS info (device sds): has skinny extents
[ 968.236099] perf: interrupt took too long (2524 > 2500), lowering
kernel.perf_event_max_sample_rate to 79000
[ 969.375296] BUG: unable to handle kernel NULL pointer dereference
at 00000000000001f0
[ 969.376583] IP: can_overcommit+0x1d/0x110 [btrfs]
[ 969.377707] PGD 0
[ 969.379870] Oops: 0000 [#1] SMP
[ 969.380932] Modules linked in: dm_mod 8021q garp mrp rpcrdma
ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm
ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core ext4 jbd2
mbcache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd
intel_cstate iTCO_wdt iTCO_vendor_support intel_rapl_perf mei_me ses
lpc_ich pcspkr input_leds joydev enclosure i2c_i801 mfd_core mei sg
ioatdma wmi shpchp ipmi_si ipmi_devintf ipmi_msghandler
acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables btrfs xor raid6_pq mlx4_en sd_mod crc32c_intel mlx4_core ast
i2c_algo_bit drm_kms_helper
[ 969.389915] syscopyarea ata_generic sysfillrect pata_acpi
sysimgblt fb_sys_fops ttm ixgbe drm mdio mpt3sas ptp pps_core
raid_class ata_piix dca scsi_transport_sas libata fjes
[ 969.392846] CPU: 35 PID: 20864 Comm: kworker/u97:10 Tainted: G
I 4.10.6-1.el7.elrepo.x86_64 #1
[ 969.394344] Hardware name: Supermicro Super Server/X10DRi-T4+, BIOS
2.0 12/17/2015
I upgraded the kernel a few days ago from 4.8.7-1.el7.elrepo.x86_64 to
4.10.6-1.el7.elrepo.x86_64. I had also added a new 6TB disk a few days
ago, but I'm not sure whether the balance finished, as the system
locked up sometime today while I was at work. Any ideas on how I can
recover? Even with one bad disk, raid10 should have kept my data safe,
no? Is there anything I can do to recover?
* Re: raid10 array lost with single disk failure?
From: Adam Bahe @ 2017-07-08 4:40 UTC
To: linux-btrfs
Some additional information: I am running Rockstor, just like Daniel
Brady noted in his post just before mine, titled "Chunk root problem".
Sorry, I am somewhat unfamiliar with newsgroups, so I am not sure how
to reply to a thread from before I was subscribed. But I am noticing
something in my logs very similar to his:
[ 716.902506] BTRFS error (device sdb): failed to read the system array: -5
[ 716.918284] BTRFS error (device sdb): open_ctree failed
[ 717.004162] BTRFS warning (device sdb): 'recovery' is deprecated,
use 'usebackuproot' instead
[ 717.004165] BTRFS info (device sdb): trying to use backup root at mount time
[ 717.004167] BTRFS info (device sdb): disk space caching is enabled
[ 717.004168] BTRFS info (device sdb): has skinny extents
[ 717.005673] BTRFS error (device sdb): failed to read the system array: -5
[ 717.020248] BTRFS error (device sdb): open_ctree failed
He also received a similar open_ctree failed message after he upgraded
his kernel on Rockstor to 4.10.6-1.el7.elrepo.x86_64 and
btrfs-progs-4.10.1-0.rockstor.x86_64.
On Fri, Jul 7, 2017 at 11:26 PM, Adam Bahe <adambahe@gmail.com> wrote:
> [...]
* Re: raid10 array lost with single disk failure?
From: Duncan @ 2017-07-08 19:37 UTC
To: linux-btrfs
Adam Bahe posted on Fri, 07 Jul 2017 23:40:20 -0500 as excerpted:
> Some additional information. I am running Rockstor just like Daniel
> Brady noted in his post just before mine titled "Chunk root problem".
> Sorry I am somewhat unfamiliar with newsgroups so I am not sure how to
> reply to his thread before I was subscribed. But I am noticing something
> in my logs very similar to his, I get:
>
> [ 716.902506] BTRFS error (device sdb): failed to read the system
> array: -5
> [ 716.918284] BTRFS error (device sdb): open_ctree failed
> [ 717.004162] BTRFS warning (device sdb): 'recovery' is deprecated,
> use 'usebackuproot' instead
> [ 717.004165] BTRFS info (device sdb): trying to use backup root at
> mount time
> [ 717.004167] BTRFS info (device sdb): disk space caching is enabled
> [ 717.004168] BTRFS info (device sdb): has skinny extents
> [ 717.005673] BTRFS error (device sdb): failed to read the system
> array: -5
> [ 717.020248] BTRFS error (device sdb): open_ctree failed
>
> He also received a similar open_ctree failed message after he upgraded
> his kernel on Rockstor to 4.10.6-1.el7.elrepo.x86_64 and
> btrfs-progs-4.10.1-0.rockstor.x86_64.
FWIW that's not significant. "open_ctree failed" is simply the generic
btrfs mount-failure message. It tells you nothing about the real
problem, so other clues must be used to discern that.
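For gathering those clues, the usual starting points are something like
the below (the device name is just an example, and smartctl needs the
smartmontools package installed):
uname -r                     # running kernel
btrfs --version              # btrfs-progs version
btrfs filesystem show        # how kernel/progs currently see the pool
dmesg | grep -i btrfs        # the real error is usually logged just
                             # before the open_ctree line
smartctl -a /dev/sds         # health of the suspect disk itself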
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: raid10 array lost with single disk failure?
From: Duncan @ 2017-07-08 20:51 UTC
To: linux-btrfs
Adam Bahe posted on Fri, 07 Jul 2017 23:26:31 -0500 as excerpted:
> I upgraded the kernel a few days ago from 4.8.7-1.el7.elrepo.x86_64
> to 4.10.6-1.el7.elrepo.x86_64. I had also added a new 6TB disk a few
> days ago, but I'm not sure whether the balance finished, as the system
> locked up sometime today while I was at work. Any ideas on how I can
> recover? Even with one bad disk, raid10 should have kept my data safe,
> no? Is there anything I can do to recover?
Yes, btrfs raid10 should be fine with a single bad device. That's
unlikely to be the issue.
But you did well to bring up the balance. Have you tried mounting with
the "skip_balance" mount option?
Sometimes a balance will run into a previously undetected problem with
the filesystem and crash. Mounting would otherwise still work, but as
soon as the filesystem goes active at the kernel level, before the
mount call even returns to userspace, the kernel sees the in-progress
balance and attempts to continue it. If the balance crashed while
processing a particular block group (aka chunk), that chunk is of
course the first one in line when the balance resumes, so it naturally
crashes again as it hits the same inconsistency that triggered the
crash the first time.
So the skip_balance mount option was invented as a work-around, to
allow you to mount the filesystem again. =:^)
The fact that it sits there for a while doing IO on all devices before
it crashes is another clue that it's probably the resumed balance
crashing things as it comes to the same inconsistency that triggered
the original crash during balance, so it's very likely that
skip_balance will help. =:^)
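Concretely, something like this -- the mount point is a placeholder and
any member device of the pool can be named, so adjust to your setup:
mount -o skip_balance /dev/sds /mnt/pool
If that still fails, a read-only attempt using the backup roots is the
lower-risk next step (usebackuproot is the current name for the old
'recovery' option):
mount -o ro,usebackuproot,skip_balance /dev/sds /mnt/pool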
Assuming that lets you mount, the next thing I'd try is a btrfs scrub.
Chances are it'll find some checksum problems, but since you're running
raid10 there's a second copy it can use to correct the bad one, so
there's a reasonably good chance scrub will find and fix your problems.
Even if it can't fix them all, it should get you closer, with less
chance of making things worse than riskier options such as btrfs check
with --repair.
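For reference, the scrub itself is just the following (again, /mnt/pool
is only a placeholder for wherever the pool is mounted):
btrfs scrub start /mnt/pool      # runs in the background
btrfs scrub status /mnt/pool     # progress and error/fix counts
btrfs scrub status -d /mnt/pool  # per-device breakdown, if wanted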
If a scrub completes with no uncorrected errors, I'd do an umount/mount
cycle or reboot just to be sure -- don't forget the skip_balance option
again, tho. Then, once nothing is running that a crash would interrupt,
and once you've taken the opportunity to update your backups (assuming
you consider the data worth more than the time/trouble/resources a
backup requires), try a balance resume.
Once the balance resume gets reasonably past the point where it
previously crashed, you can assume you've safely corrected at least
/that/ inconsistency, and hope the scrub took care of any others before
you got to them.
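The resume itself, once you're ready (same placeholder mount point):
btrfs balance resume /mnt/pool   # picks up the interrupted balance
btrfs balance status /mnt/pool   # watch its progress from another shell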
But of course all scrub does is verify checksums. Where there's a
second copy (as there is with dup, raid1 and raid10 modes) it attempts
to repair the bad copy from the other one, verifying that one as well
in the process. If the second copy of a block is bad too, or there is
no second copy, scrub will detect the bad checksum but won't be able to
fix it; and if a block has a valid checksum but is logically invalid
for other reasons, scrub won't detect it at all, because /all/ it does
is verify checksums, not actual filesystem consistency. That's what the
somewhat more risky btrfs check is for (risky if --repair or another
fix option is used; read-only mode only detects, it doesn't attempt to
fix).
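If it does come to that, start with the read-only form against the
unmounted filesystem, naming any one member device (the device name
here is only an example), and ask the list before going near --repair:
btrfs check /dev/sds             # read-only by default, reports only
# btrfs check --repair /dev/sds  # rewrites metadata -- last resort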
So if skip_balance doesn't work, or it does but scrub can't fix all the
errors it finds, or scrub fixes everything it detects but a balance
resume still crashes, then it's time to try riskier fixes. I'll let
others guide you there if needed, but will leave you with one reminder...
Sysadmin's first rule of backups:
Don't test fate and challenge reality! Have your backups, or regardless
of any claims to the contrary you're defining your data as throw-away
value -- and eventually, fate and reality are going to call you on it!
So don't worry too much even if you lose the filesystem. Either you
have backups and can restore from them should it be necessary, or you
defined the data as not worth the trouble of those backups, and losing
it isn't a big deal. In either case you saved what was truly important
to you: either the data, because it was important enough to you to have
backups, or the time/resources/trouble you would have spent making
those backups, which you saved regardless of whether you can save the
data or not. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: raid10 array lost with single disk failure?
From: Ferry Toth @ 2017-07-09 23:13 UTC
To: linux-btrfs
On Sat, 08 Jul 2017 20:51:41 +0000, Duncan wrote:
> Adam Bahe posted on Fri, 07 Jul 2017 23:26:31 -0500 as excerpted:
>
>> I upgraded the kernel a few days ago from 4.8.7-1.el7.elrepo.x86_64
>> to 4.10.6-1.el7.elrepo.x86_64. I had also added a new 6TB disk a few
>> days ago, but I'm not sure whether the balance finished, as the system
>> locked up sometime today while I was at work. Any ideas on how I can
>> recover? Even with one bad disk, raid10 should have kept my data safe,
>> no? Is there anything I can do to recover?
>
> Yes, btrfs raid10 should be fine with a single bad device. That's
> unlikely to be the issue.
I'm wondering about that.
The btrfs wiki says 'mostly ok' for raid10 and mentions that btrfs
needs to be able to create 2 copies of a file to avoid going into an
irreversible read-only state.
To me that sounds like you are safe against a single failing drive if
you had 5 to start with. Is that correct?
'mostly ok' is a rather useless answer. We need to know what to do to
be safe, and to have defined procedures so we don't screw things up
'unrecoverably' when something bad happens.
Any advice?
> [...]
* Re: raid10 array lost with single disk failure?
From: Adam Bahe @ 2017-07-10 2:13 UTC
To: linux-btrfs
I have finished all of the above suggestions: ran a scrub, remounted,
rebooted, made sure the system didn't hang, and then kicked off another
balance on the entire pool. It completed rather quickly, but something
still does not seem right.
Label: 'btrfs_pool1' uuid: 04a7fa70-1572-47a2-a55c-7c99aef12603
Total devices 18 FS bytes used 23.64TiB
devid 1 size 1.82TiB used 1.82TiB path /dev/sdd
devid 2 size 1.82TiB used 1.82TiB path /dev/sdf
devid 3 size 3.64TiB used 3.07TiB path /dev/sdg
devid 4 size 3.64TiB used 3.06TiB path /dev/sdk
devid 5 size 1.82TiB used 1.82TiB path /dev/sdn
devid 6 size 3.64TiB used 3.06TiB path /dev/sdo
devid 7 size 1.82TiB used 1.82TiB path /dev/sds
devid 8 size 1.82TiB used 1.82TiB path /dev/sdj
devid 9 size 1.82TiB used 1.82TiB path /dev/sdi
devid 10 size 1.82TiB used 1.82TiB path /dev/sdq
devid 11 size 1.82TiB used 1.82TiB path /dev/sdr
devid 12 size 1.82TiB used 1.82TiB path /dev/sde
devid 13 size 1.82TiB used 1.82TiB path /dev/sdm
devid 14 size 7.28TiB used 4.78TiB path /dev/sdh
devid 15 size 7.28TiB used 4.99TiB path /dev/sdl
devid 16 size 7.28TiB used 4.97TiB path /dev/sdp
devid 17 size 7.28TiB used 4.99TiB path /dev/sdc
devid 18 size 5.46TiB used 210.12GiB path /dev/sdb
/dev/sdb is the new disk, but btrfs only moved 210.12GiB over to it.
Most disks in the array are more than 50% utilized. Is this normal?
On Sun, Jul 9, 2017 at 6:13 PM, Ferry Toth <ftoth@exalondelft.nl> wrote:
> [...]
* Re: raid10 array lost with single disk failure?
From: Austin S. Hemmelgarn @ 2017-07-10 12:49 UTC
To: Adam Bahe, linux-btrfs
On 2017-07-09 22:13, Adam Bahe wrote:
> I have finished all of the above suggestions: ran a scrub, remounted,
> rebooted, made sure the system didn't hang, and then kicked off another
> balance on the entire pool. It completed rather quickly, but something
> still does not seem right.
>
> Label: 'btrfs_pool1' uuid: 04a7fa70-1572-47a2-a55c-7c99aef12603
> Total devices 18 FS bytes used 23.64TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/sdd
> devid 2 size 1.82TiB used 1.82TiB path /dev/sdf
> devid 3 size 3.64TiB used 3.07TiB path /dev/sdg
> devid 4 size 3.64TiB used 3.06TiB path /dev/sdk
> devid 5 size 1.82TiB used 1.82TiB path /dev/sdn
> devid 6 size 3.64TiB used 3.06TiB path /dev/sdo
> devid 7 size 1.82TiB used 1.82TiB path /dev/sds
> devid 8 size 1.82TiB used 1.82TiB path /dev/sdj
> devid 9 size 1.82TiB used 1.82TiB path /dev/sdi
> devid 10 size 1.82TiB used 1.82TiB path /dev/sdq
> devid 11 size 1.82TiB used 1.82TiB path /dev/sdr
> devid 12 size 1.82TiB used 1.82TiB path /dev/sde
> devid 13 size 1.82TiB used 1.82TiB path /dev/sdm
> devid 14 size 7.28TiB used 4.78TiB path /dev/sdh
> devid 15 size 7.28TiB used 4.99TiB path /dev/sdl
> devid 16 size 7.28TiB used 4.97TiB path /dev/sdp
> devid 17 size 7.28TiB used 4.99TiB path /dev/sdc
> devid 18 size 5.46TiB used 210.12GiB path /dev/sdb
>
> /dev/sdb is the new disk, but btrfs only moved 210.12GiB over to it.
> Most disks in the array are more than 50% utilized. Is this normal?
>
Was this from a full balance, or just running a scrub to repair chunks?
You have three ways you can repair a BTRFS volume that's lost a device:
* The first, quickest, and most reliable is to use `btrfs device
replace` to replace the failing/missing device (sketched below). This
only reads the data that actually needs to go onto the new device, so
it completes faster, but you will also need to resize the filesystem on
the new device if it is larger than the old one, and you can't replace
the missing device with a smaller one.
* The second is to add the device to the array, then run a scrub on the
whole array. This will spit out a bunch of errors from the chunks that
need to be rebuilt, but will make sure everything is consistent. This
isn't as fast as using `device replace`, but is still quicker than a
full balance most of the time. In this particular case, I would expect
behavior like what you're seeing above at least some of the time.
* The third, and slowest method, is to add the new device, then run a
full balance. This will make sure data is evenly distributed
proportionate to device size and will rebuild all the partial chunks.
It will also take the longest, and put significantly more stress on the
array than the other two options (it rewrites the entire array). If
this is what you used, then you probably found a bug, because it should
never result in what you're seeing.
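For the first option, the sequence looks roughly like this -- a sketch
only, where /mnt/pool is a placeholder mount point, /dev/sdX a
placeholder target disk, and devid 5 stands in for whichever device is
being replaced (recent btrfs-progs expose this as `btrfs replace`):
btrfs replace start 5 /dev/sdX /mnt/pool
btrfs replace status /mnt/pool
btrfs filesystem resize 5:max /mnt/pool   # only needed if the new disk
                                          # is larger than the old one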