linux-btrfs.vger.kernel.org archive mirror
* btrfs replace performance with missing drive
@ 2016-07-14 11:18 Sébastien Luttringer
  2016-07-14 11:54 ` Steven Haigh
  2016-07-14 12:01 ` Duncan
  0 siblings, 2 replies; 3+ messages in thread
From: Sébastien Luttringer @ 2016-07-14 11:18 UTC (permalink / raw)
  To: linux-btrfs


Hello,

I have a performance issue with «btrfs replace» on raid5 with a _missing_
device. My btrfs filesystem spans 6x4TB HDDs and the operating system is
Arch Linux.

In a nutshell, it will take 23 to 46 days to replace one missing disk.

# btrfs fi sh /home
Label: 'raptor.home'  uuid: 8739c8b2-110b-44ac-8b4d-285ad06ee446
        Total devices 7 FS bytes used 14.60TiB
        devid    0 size 3.64TiB used 2.80TiB path /dev/sdf
        devid    3 size 3.64TiB used 2.97TiB path /dev/sdh
        devid    5 size 3.64TiB used 2.97TiB path /dev/sdc
        devid    6 size 3.64TiB used 2.97TiB path /dev/sdd
        devid    7 size 3.64TiB used 2.97TiB path /dev/sde
        devid    8 size 3.64TiB used 2.97TiB path /dev/sdg
        *** Some devices missing


At full disk speed (100 MB/s), replacing the missing disk (4 TB) should take
around 8 hours. With the same disk model and the same HBA card in another
computer running mdadm/raid5, I verified that this duration is achievable.

I also tested a «btrfs replace» without a missing disk, and the speed was not
so bad: around half the disk speed (50-60 MB/s), which is below mdadm.

But in my case the drive is dead, so I can't use it as the source of the
replace, and the replace speed drops to 1-2 MB/s! That means 23-46 days of
degraded performance and increased risk of data loss.
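
For the record, here is the arithmetic behind those estimates (taking the raw
4 TB size; with only ~3 TiB actually used, the full-speed case lands near the
8 hours mentioned above):

```shell
# Back-of-the-envelope replace-time estimate for a ~4 TB disk.
DATA_MB=$((4 * 1000 * 1000))   # ~4 TB expressed in MB

# At ~100 MB/s (full disk speed) the rebuild fits in a day:
echo "$((DATA_MB / 100 / 3600)) hours"   # ~11 hours

# At the observed 1-2 MB/s it stretches into weeks:
echo "$((DATA_MB / 2 / 86400)) to $((DATA_MB / 1 / 86400)) days"   # 23 to 46 days
```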

I tried upgrading the kernel to the latest (4.7-rc6), but performance is no
better. I did get some crashes during replace with 4.6.0 which vanished with
the latest rc.

# iostat -md  
Linux 4.7.0-rc6-seblu
(raptor.seblu.net)        14/07/2016      _x86_64_        (4 CPU)

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc             356,75        22,51         0,14    9132054      58427
sdd             356,27        22,51         0,14    9131612      57094
sde             361,53        22,52         0,14    9132207      57245
sdf             362,78         0,00         1,81          4     735786
sdg             357,82        22,51         0,14    9131763      58323
sdh             325,25        22,52         0,14    9132715      58355


So rebuild performance on raid5 is really poor when the replaced device is
missing.
Is there a parameter to tweak or something I can do to speed up the replace?

Regards,

-- 
Sébastien "Seblu" Luttringer
https://seblu.net | Twitter: @seblu42
GPG: 0x2072D77A



* Re: btrfs replace performance with missing drive
  2016-07-14 11:18 btrfs replace performance with missing drive Sébastien Luttringer
@ 2016-07-14 11:54 ` Steven Haigh
  2016-07-14 12:01 ` Duncan
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Haigh @ 2016-07-14 11:54 UTC (permalink / raw)
  To: linux-btrfs



Pray that you have no issue with anything in the short term and that you
don't lose power to the system while it is going on.

I did exactly what you are doing now and ended up with a corrupted
filesystem due to what you are seeing.

DO NOT interrupt it, or you may have big problems with filesystem
integrity afterwards.

See my previous posts to this list for details on what happened to me.

On 14/07/2016 9:18 PM, Sébastien Luttringer wrote:
> Hello,
> 
> I have a performance issue with «btrfs replace» on raid5 with a _missing_
> device. My btrfs filesystem spans 6x4TB HDDs and the operating system is
> Arch Linux.
> 
> In a nutshell, it will take 23 to 46 days to replace one missing disk.
> 
> # btrfs fi sh /home
> Label: 'raptor.home'  uuid: 8739c8b2-110b-44ac-8b4d-285ad06ee446
>         Total devices 7 FS bytes used 14.60TiB
>         devid    0 size 3.64TiB used 2.80TiB path /dev/sdf
>         devid    3 size 3.64TiB used 2.97TiB path /dev/sdh
>         devid    5 size 3.64TiB used 2.97TiB path /dev/sdc
>         devid    6 size 3.64TiB used 2.97TiB path /dev/sdd
>         devid    7 size 3.64TiB used 2.97TiB path /dev/sde
>         devid    8 size 3.64TiB used 2.97TiB path /dev/sdg
>         *** Some devices missing
> 
> 
> At full disk speed (100 MB/s), replacing the missing disk (4 TB) should take
> around 8 hours. With the same disk model and the same HBA card in another
> computer running mdadm/raid5, I verified that this duration is achievable.
> 
> I also tested a «btrfs replace» without a missing disk, and the speed was not
> so bad: around half the disk speed (50-60 MB/s), which is below mdadm.
> 
> But in my case the drive is dead, so I can't use it as the source of the
> replace, and the replace speed drops to 1-2 MB/s! That means 23-46 days of
> degraded performance and increased risk of data loss.
> 
> I tried upgrading the kernel to the latest (4.7-rc6), but performance is no
> better. I did get some crashes during replace with 4.6.0 which vanished with
> the latest rc.
> 
> # iostat -md  
> Linux 4.7.0-rc6-seblu
> (raptor.seblu.net)        14/07/2016      _x86_64_        (4 CPU)
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdc             356,75        22,51         0,14    9132054      58427
> sdd             356,27        22,51         0,14    9131612      57094
> sde             361,53        22,52         0,14    9132207      57245
> sdf             362,78         0,00         1,81          4     735786
> sdg             357,82        22,51         0,14    9131763      58323
> sdh             325,25        22,52         0,14    9132715      58355
> 
> 
> So rebuild performance on raid5 is really poor when the replaced device is
> missing.
> Is there a parameter to tweak or something I can do to speed up the replace?
> 
> Regards,
> 

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897




* Re: btrfs replace performance with missing drive
  2016-07-14 11:18 btrfs replace performance with missing drive Sébastien Luttringer
  2016-07-14 11:54 ` Steven Haigh
@ 2016-07-14 12:01 ` Duncan
  1 sibling, 0 replies; 3+ messages in thread
From: Duncan @ 2016-07-14 12:01 UTC (permalink / raw)
  To: linux-btrfs

Sébastien Luttringer posted on Thu, 14 Jul 2016 13:18:49 +0200 as
excerpted:

> I have a performance issue with «btrfs replace» on raid5 with a
> _missing_ device. My btrfs filesystem spans 6x4TB HDDs and the
> operating system is Arch Linux.
> 
> In a nutshell, it will take 23 to 46 days to replace one missing disk.


If you're still posting about this, it means you haven't been keeping up 
with list discussion over the last couple of weeks and have thus missed 
the following.  I'll leave it to you to find the threads and get the 
details if you want, but here are the basics:

1) Btrfs raid56 mode has never really reached the stability level of the 
rest of btrfs, and recently a couple of fundamental defects in the 
current implementation have come to light that unfortunately may well 
require a full rewrite to correct.

As such, btrfs raid56 mode, while never recommended except for those 
willing to be on the bleeding edge, is now negatively recommended: those 
already using it should switch to something more stable at their 
earliest convenience, and should *ensure* that they either have backups 
or simply don't care about losing the data in the meantime.

2) Replace's often impractically slow performance in raid56 mode with a 
missing device was one of the known bugs keeping raid56 mode from being 
considered stable, even before the fundamental defects mentioned above 
came to light.  For raid56 mode with a missing device (and only in that 
case), a btrfs device add of the replacement followed by a btrfs device 
delete of the missing/failed drive (which forces a rebalance as part of 
the device delete) seems to be much faster, and is recommended instead 
of btrfs replace.
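
In command form, the workaround looks roughly like this (mount point and 
device names are hypothetical placeholders; the btrfs commands are shown 
commented out since they need root and real hardware, so adapt them to 
your setup):

```shell
# Sketch of the add-then-delete workaround for raid56 with a missing device.
MNT=/home          # placeholder mount point
NEW_DEV=/dev/sdX   # placeholder name for the replacement disk

# A filesystem with a missing member must be mounted degraded first:
#   mount -o degraded /dev/sdc "$MNT"

# Add the new device, then drop the missing one; the delete forces a
# rebalance that rebuilds the data onto the new disk:
#   btrfs device add "$NEW_DEV" "$MNT"
#   btrfs device delete missing "$MNT"

# Progress then shows up as balance activity rather than replace status:
#   btrfs balance status "$MNT"
```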

So there's a workaround for your immediate problem, but that doesn't 
change the fact that there are fundamental issues with the current 
implementation.  Getting off of raid56 mode as soon as convenient, and 
ensuring good backups of anything you don't want to lose in the 
meantime, is now the recommendation.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


