* The performance is not as expected when used several disks on raid0.
@ 2015-08-14 15:16 Eduardo Bach
2015-08-14 16:30 ` Calvin Walton
2015-08-14 18:31 ` Chris Murphy
0 siblings, 2 replies; 15+ messages in thread
From: Eduardo Bach @ 2015-08-14 15:16 UTC (permalink / raw)
To: linux-btrfs
Hi all,
This is my first email to this list, so please excuse any gaffe.
I am in the early stages of evaluating a new storage system, an SGI MIS,
currently with two LSI HBAs and 32 disks.
The HBA controllers are LSI 9207-8i and the disks are Seagate 6TB,
model ST6000NM0004-1FT17Z.
To evaluate the performance I am using IOzone over a raid0 using all
the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0.
With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
result reaches 6GB/s, which is the expected value when compared with
parallel dd runs on the disks.
When using btrfs with only half of the disks, the result is about 3GB/s.
More information:
# uname -a
Linux spstrg13 4.2.0-999-generic #201508132200 SMP Fri Aug 14 02:01:52
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.0
# btrfs fi show
Label: none uuid: be2a5671-87d1-4b89-ac4a-04efabb5912f
Total devices 32 FS bytes used 3.66MiB
devid 1 size 5.46TiB used 1.07GiB path /dev/sdc
devid 2 size 5.46TiB used 1.06GiB path /dev/sdd
devid 3 size 5.46TiB used 1.06GiB path /dev/sde
devid 4 size 5.46TiB used 1.06GiB path /dev/sdf
devid 5 size 5.46TiB used 1.06GiB path /dev/sdg
devid 6 size 5.46TiB used 1.06GiB path /dev/sdh
devid 7 size 5.46TiB used 1.06GiB path /dev/sdi
devid 8 size 5.46TiB used 1.06GiB path /dev/sdj
devid 9 size 5.46TiB used 1.06GiB path /dev/sdk
devid 10 size 5.46TiB used 1.06GiB path /dev/sdl
devid 11 size 5.46TiB used 1.06GiB path /dev/sdm
devid 12 size 5.46TiB used 1.06GiB path /dev/sdn
devid 13 size 5.46TiB used 1.06GiB path /dev/sdo
devid 14 size 5.46TiB used 1.06GiB path /dev/sdp
devid 15 size 5.46TiB used 1.06GiB path /dev/sdq
devid 16 size 5.46TiB used 1.06GiB path /dev/sdr
devid 17 size 5.46TiB used 1.06GiB path /dev/sds
devid 18 size 5.46TiB used 1.06GiB path /dev/sdt
devid 19 size 5.46TiB used 1.06GiB path /dev/sdu
devid 20 size 5.46TiB used 1.06GiB path /dev/sdv
devid 21 size 5.46TiB used 1.06GiB path /dev/sdw
devid 22 size 5.46TiB used 1.06GiB path /dev/sdx
devid 23 size 5.46TiB used 1.06GiB path /dev/sdy
devid 24 size 5.46TiB used 1.06GiB path /dev/sdz
devid 25 size 5.46TiB used 1.06GiB path /dev/sdaa
devid 26 size 5.46TiB used 1.06GiB path /dev/sdab
devid 27 size 5.46TiB used 1.06GiB path /dev/sdac
devid 28 size 5.46TiB used 1.06GiB path /dev/sdad
devid 29 size 5.46TiB used 1.06GiB path /dev/sdae
devid 30 size 5.46TiB used 1.06GiB path /dev/sdaf
devid 31 size 5.46TiB used 1.06GiB path /dev/sdag
devid 32 size 5.46TiB used 1.06GiB path /dev/sdah
btrfs-progs v4.0
# btrfs fi df /root/backup/root/storageTestes/mbtr
Data, RAID0: total=30.00GiB, used=3.50MiB
System, RAID0: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=4.00GiB, used=128.00KiB
Metadata, single: total=8.00MiB, used=16.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
The dmesg is attached.
The results are about the same using kernel 3.16 and btrfs tools 3.12.
I am far from being able to isolate the problem, so please ask me for any
information you think is relevant.
Thanks in advance.
Eduardo.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 15:16 The performance is not as expected when used several disks on raid0 Eduardo Bach
@ 2015-08-14 16:30 ` Calvin Walton
2015-08-14 16:35 ` Calvin Walton
2015-08-14 18:31 ` Chris Murphy
1 sibling, 1 reply; 15+ messages in thread
From: Calvin Walton @ 2015-08-14 16:30 UTC (permalink / raw)
To: Eduardo Bach, linux-btrfs
On Fri, 2015-08-14 at 12:16 -0300, Eduardo Bach wrote:
> Hi all,
>
> This is my first email to this list, so please excuse any gaffe.
>
> I am in the evaluation early stages of a new storage, an SGI MIS,
> currently with two HBAs LSI and 32 disks.
> The hba controllers are LSI 9207-8i and the disks are Seagate 6TB,
> model ST6000NM0004-1FT17Z.
>
> To evaluate the performance I am using IOzone over a raid0 using all
> the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0.
>
> With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
> result reaches 6gb/s, which is the expected value when compared with
> parallel dd made on discs.
> When used btrfs with only half of the disc the result is about 3GB/s.
There are two things in particular to pay attention to on btrfs with
this sort of setup right now:
1. btrfs's "raid0" is not an n-way stripe; it's a 2-way stripe only.
(n-way stripe is a long requested feature, but there is no timeline on
its completion) A single-threaded disk write will only ever be
writing to two disks at the same time. The total throughput you get
for multithreaded writes is up to which blocks the allocator happens
to pick; it will probably often happen that multiple threads will
both be using the same chunk, sharing IO from only 2 disks.
2. Btrfs development is currently primarily focused on functionality
over performance. There are several places where placeholder or
untuned algorithms are used (e.g. the multi-mirror io read
scheduling just does pid % number_of_mirrors to pick a mirror).
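As a rough illustration of the placeholder mirror-selection rule mentioned in item 2, the sketch below picks a mirror from the process ID alone. The function name pick_mirror and the sample PIDs are hypothetical; this is not the kernel's code, only the "pid % number_of_mirrors" arithmetic described above.

```python
def pick_mirror(pid, number_of_mirrors):
    """Return the mirror index chosen by the pid % number_of_mirrors rule."""
    if number_of_mirrors <= 0:
        raise ValueError("number_of_mirrors must be positive")
    return pid % number_of_mirrors


# Hypothetical PIDs: with two mirrors, every even PID lands on mirror 0 and
# every odd PID on mirror 1, regardless of how busy each mirror is.
choices = {pid: pick_mirror(pid, 2) for pid in (4000, 4002, 4004, 4001)}
print(choices)
```

The point of the sketch is that the choice ignores device load entirely, which is why such a rule would count as untuned.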
This kind of a performance difference on large performance-oriented
RAID systems between btrfs's built-in raid and mdadm is interesting to
see, but for the moment I'd say it's mostly expected.
One of the developers here might have some more precise information on
exactly why you're seeing such a performance difference.
As an aside, you have 192TB in RAID0? That's certainly pretty
impressive, but as soon as one disk dies, you're going to lose a *lot*
of data.
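To put rough numbers on that aside, here is a minimal sketch of the capacity and of the chance that at least one member of a raid0 fails. The 2% annual failure rate is purely an assumed figure for illustration (it does not come from this thread or from any datasheet), and the model assumes disks fail independently.

```python
def raid0_capacity_tb(num_disks, disk_tb):
    """Raw raid0 capacity: every disk contributes all of its space."""
    return num_disks * disk_tb


def raid0_loss_probability(num_disks, disk_failure_probability):
    """Probability that at least one of num_disks independent disks fails.

    In raid0, any single failure loses the whole array.
    """
    return 1.0 - (1.0 - disk_failure_probability) ** num_disks


capacity = raid0_capacity_tb(32, 6)          # 32 disks of 6TB each
ASSUMED_ANNUAL_FAILURE_RATE = 0.02           # hypothetical, for illustration only
loss = raid0_loss_probability(32, ASSUMED_ANNUAL_FAILURE_RATE)
print(f"{capacity} TB raw, {loss:.1%} chance of losing the array in a year")
```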
--
Calvin Walton <calvin.walton@kepstin.ca>
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 16:30 ` Calvin Walton
@ 2015-08-14 16:35 ` Calvin Walton
2015-08-17 19:44 ` Eduardo Bach
0 siblings, 1 reply; 15+ messages in thread
From: Calvin Walton @ 2015-08-14 16:35 UTC (permalink / raw)
To: Eduardo Bach, linux-btrfs
On Fri, 2015-08-14 at 12:30 -0400, Calvin Walton wrote:
> On Fri, 2015-08-14 at 12:16 -0300, Eduardo Bach wrote:
> > Hi all,
> >
> > This is my first email to this list, so please excuse any gaffe.
> >
> > I am in the evaluation early stages of a new storage, an SGI MIS,
> > currently with two HBAs LSI and 32 disks.
> > The hba controllers are LSI 9207-8i and the disks are Seagate 6TB,
> > model ST6000NM0004-1FT17Z.
> >
> > To evaluate the performance I am using IOzone over a raid0 using
> > all
> > the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0.
> >
> > With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
> > result reaches 6gb/s, which is the expected value when compared
> > with
> > parallel dd made on discs.
> > When used btrfs with only half of the disc the result is about
> > 3GB/s.
>
> There's two things in particular to pay attention with on btrfs with
> this sort of setup right now:
Umm, Ok, I made a mistake. You can ignore paragraph #1 - I got some
details about the btrfs raid1 and raid0 modes mixed up!
Btrfs RAID0 is n-way striping across all available drives which have
room for allocations.
> 1. btrfs's "raid0" is not an n-way stripe; it's a 2-way stripe
> only. (n
> -way stripe is a long requested feature, but there is no
> timeline on
> its completion) A single-threaded disk write will only ever be
> writing to two disks at the same time. The total throughput you
> get
> for multithreaded writes is up to which blocks the allocator
> happens
> to pick; it will probably often happen that multiple threads
> will
> both be using the same chunk, sharing IO from only 2 disks.
> 2. Btrfs development is currently primarily focused on
> functionality
> over performance. There's several places where placeholder or
> untuned algorithms are used (e.g. the multi-mirror io read
> scheduling just does pid % number_of_mirrors to pick a mirror).
>
> This kind of a performance difference on large performance-oriented
> RAID systems between btrfs's built-in raid and mdadm is interesting
> to
> see, but for the moment I'd say it's mostly expected.
>
> One of the developers here might have some more precise information
> on
> exactly why you're seeing such a performance difference.
>
> As an aside, you have 192TB in RAID0? That's certainly pretty
> impressive, but as soon as one disk dies, you're going to lose a
> *lot*
> of data.
>
--
Calvin Walton <calvin.walton@kepstin.ca>
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 15:16 The performance is not as expected when used several disks on raid0 Eduardo Bach
2015-08-14 16:30 ` Calvin Walton
@ 2015-08-14 18:31 ` Chris Murphy
2015-08-14 19:50 ` Austin S Hemmelgarn
2015-08-17 19:57 ` Eduardo Bach
1 sibling, 2 replies; 15+ messages in thread
From: Chris Murphy @ 2015-08-14 18:31 UTC (permalink / raw)
To: Eduardo Bach, Btrfs BTRFS
On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach <hellbach@gmail.com> wrote:
> With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
> result reaches 6gb/s, which is the expected value when compared with
> parallel dd made on discs.
mdadm with what chunk (strip) size? The default for mdadm is 512KiB.
On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS
on md RAID might improve its performance relative to Btrfs, at least
it's a more apples to apples comparison.
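To see why the chunk size matters for the comparison, the sketch below counts how many disks a single chunk-aligned request spans at different chunk sizes. The 1MiB request size is a hypothetical example (the thread does not state the record size iozone used), and the helper names are made up for illustration.

```python
import math


def disks_touched(io_kib, chunk_kib, num_disks):
    """Number of disks spanned by one chunk-aligned request of io_kib."""
    chunks = math.ceil(io_kib / chunk_kib)
    return min(chunks, num_disks)


def full_stripe_kib(chunk_kib, num_disks):
    """Amount of data that covers every disk in the array exactly once."""
    return chunk_kib * num_disks


for chunk_kib in (64, 512):
    print(
        f"chunk {chunk_kib}KiB: a 1MiB request spans "
        f"{disks_touched(1024, chunk_kib, 32)} of 32 disks, "
        f"full stripe is {full_stripe_kib(chunk_kib, 32)}KiB"
    )
```

With a smaller chunk, a request of a given size is spread over more disks, so the two setups do not present the same I/O pattern to the drives unless the chunk sizes match.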
--
Chris Murphy
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 18:31 ` Chris Murphy
@ 2015-08-14 19:50 ` Austin S Hemmelgarn
2015-08-14 19:54 ` Chris Murphy
2015-08-17 19:57 ` Eduardo Bach
1 sibling, 1 reply; 15+ messages in thread
From: Austin S Hemmelgarn @ 2015-08-14 19:50 UTC (permalink / raw)
To: Chris Murphy, Eduardo Bach, Btrfs BTRFS
On 2015-08-14 14:31, Chris Murphy wrote:
> On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach <hellbach@gmail.com> wrote:
>
>> With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
>> result reaches 6gb/s, which is the expected value when compared with
>> parallel dd made on discs.
>
> mdadm with what chunk (strip) size? The default for mdadm is 512KiB.
> On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS
> on md RAID might improve its performance relative to Btrfs, at least
> it's a more apples to apples comparison.
>
I have a feeling that XFS will still win this. It is one of the slower
filesystems for Linux, but it still beats BTRFS senseless when it comes
to performance as of right now.
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 19:50 ` Austin S Hemmelgarn
@ 2015-08-14 19:54 ` Chris Murphy
2015-08-14 19:58 ` Austin S Hemmelgarn
0 siblings, 1 reply; 15+ messages in thread
From: Chris Murphy @ 2015-08-14 19:54 UTC (permalink / raw)
To: Austin S Hemmelgarn, Btrfs BTRFS
On Fri, Aug 14, 2015 at 1:50 PM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2015-08-14 14:31, Chris Murphy wrote:
>>
>> On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach <hellbach@gmail.com> wrote:
>>
>>> With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
>>> result reaches 6gb/s, which is the expected value when compared with
>>> parallel dd made on discs.
>>
>>
>> mdadm with what chunk (strip) size? The default for mdadm is 512KiB.
>> On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS
>> on md RAID might improve its performance relative to Btrfs, at least
>> it's a more apples to apples comparison.
>>
> I have a feeling that XFS will still win this. It is one of the slower
> filesystems for Linux, but it still beats BTRFS senseless when it comes to
> performance as of right now.
Yeah I was suggesting with a 64KiB chunk the XFS case might get even faster.
--
Chris Murphy
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 19:54 ` Chris Murphy
@ 2015-08-14 19:58 ` Austin S Hemmelgarn
2015-08-15 6:30 ` Duncan
0 siblings, 1 reply; 15+ messages in thread
From: Austin S Hemmelgarn @ 2015-08-14 19:58 UTC (permalink / raw)
To: Chris Murphy, Btrfs BTRFS
On 2015-08-14 15:54, Chris Murphy wrote:
> On Fri, Aug 14, 2015 at 1:50 PM, Austin S Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2015-08-14 14:31, Chris Murphy wrote:
>>>
>>> On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach <hellbach@gmail.com> wrote:
>>>
>>>> With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
>>>> result reaches 6gb/s, which is the expected value when compared with
>>>> parallel dd made on discs.
>>>
>>>
>>> mdadm with what chunk (strip) size? The default for mdadm is 512KiB.
>>> On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS
>>> on md RAID might improve its performance relative to Btrfs, at least
>>> it's a more apples to apples comparison.
>>>
>> I have a feeling that XFS will still win this. It is one of the slower
>> filesystems for Linux, but it still beats BTRFS senseless when it comes to
>> performance as of right now.
>
> Yeah I was suggesting with a 64KiB chunk the XFS case might get even faster.
>
>
Ah, misunderstood what you meant. Yeah, that will almost certainly make
things faster for XFS.
FWIW, running BTRFS on top of MDRAID actually works very well,
especially for BTRFS raid1 on top of MD-RAID0 (I get an almost 50%
performance increase for this usage over BTRFS raid10, although most of
this is probably due to how btrfs dispatches I/Os to disks in
multi-disk setups).
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 19:58 ` Austin S Hemmelgarn
@ 2015-08-15 6:30 ` Duncan
2015-08-17 11:38 ` Austin S Hemmelgarn
0 siblings, 1 reply; 15+ messages in thread
From: Duncan @ 2015-08-15 6:30 UTC (permalink / raw)
To: linux-btrfs
Austin S Hemmelgarn posted on Fri, 14 Aug 2015 15:58:30 -0400 as
excerpted:
> FWIW, running BTRFS on top of MDRAID actually works very well,
> especially for BTRFS raid1 on top of MD-RAID0 (I get an almost 50%
> performance increase for this usage over BTRFS raid10, although most of
> this is probably due to how btrfs dispatches I/O's to disks in
> multi-disk stetups).
Of course that's effectively a raid01, which is normally considered a
mistakenly reversed raid10 implementation. It's a mistake because of the
IO cost of the rebuild should a device fail: the whole raid0 on one side
of the raid1 would have to be re-replicated to the other, vs only having
to re-replicate one device to its local mirror in a raid10 arrangement.
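The rebuild-cost difference can be sketched with simple arithmetic. The device count and size below are hypothetical, and the model only counts data that must be re-replicated after one device fails; it ignores everything else about a real rebuild.

```python
def rebuild_tb_raid01(total_devices, device_tb):
    """raid01: one whole raid0 side (half the devices) is re-replicated."""
    return (total_devices // 2) * device_tb


def rebuild_tb_raid10(device_tb):
    """raid10: only the failed device is re-replicated from its mirror."""
    return device_tb


# Hypothetical array: 8 devices of 6TB each.
raid01_cost = rebuild_tb_raid01(8, 6)
raid10_cost = rebuild_tb_raid10(6)
print(f"raid01 rebuild copies {raid01_cost} TB, raid10 copies {raid10_cost} TB")
```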
However, in this case it's a very smart arrangement, actually, the only
md-raid-under-btrfs-raid arrangement that makes real sense (well, other
than raid00, raid0 at both levels, perhaps), in particular because the
btrfs raid1 on top still gives you the full benefit of btrfs file
integrity features as well as the usual raid1 redundancy, tho in this
case it's only at the one raid0 against the other as the pair of btrfs
raid1 copies. And the mdraid0 is much better optimized than btrfs raid0,
so there's that bonus, while at the same time the btrfs raid1 redundancy
nicely balances the usual "Russian Roulette" quality of raid0.
Very nice configuration! =:^)
Thanks for mentioning it, as I guess I was effectively ruling it out as
an option before even really considering it due to the usual raid10's
better than raid01 thing, and thus was entirely blind to the
possibility. Which was bad, because as I alluded to, mdraid's lack of
file integrity features and thus lack of any way to have btrfs scrub
properly filter down to the mdraid level when there's mdraid level
redundancy, kind of makes a mess of things, otherwise. But btrfs raid1
on mdraid0 effectively balances and eliminates the negatives at each
level with the strengths of the other level, and is really a quite
awesome solution, that until now I was entirely blinded to! =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: The performance is not as expected when used several disks on raid0.
2015-08-15 6:30 ` Duncan
@ 2015-08-17 11:38 ` Austin S Hemmelgarn
2015-08-17 23:06 ` Duncan
0 siblings, 1 reply; 15+ messages in thread
From: Austin S Hemmelgarn @ 2015-08-17 11:38 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 2015-08-15 02:30, Duncan wrote:
> Austin S Hemmelgarn posted on Fri, 14 Aug 2015 15:58:30 -0400 as
> excerpted:
>
>> FWIW, running BTRFS on top of MDRAID actually works very well,
>> especially for BTRFS raid1 on top of MD-RAID0 (I get an almost 50%
>> performance increase for this usage over BTRFS raid10, although most of
>> this is probably due to how btrfs dispatches I/O's to disks in
>> multi-disk stetups).
>
> Of course that's effectively a raid01, which is normally supposed to most
> often be a mistakenly reversed raid10 implementation, mistakenly, due to
> the IO cost of the rebuild should a device fail, since the whole raid0 of
> the one raid1 side would have to be rereplicated to the other, vs only
> having to rereplicate one device to the other locally, in a raid10
> arrangement.
>
> However, in this case it's a very smart arrangement, actually, the only
> md-raid-under-btrfs-raid arrangement that makes real sense (well, other
> than raid00, raid0 at both levels, perhaps), in particular because the
> btrfs raid1 on top still gives you the full benefit of btrfs file
> integrity features as well as the usual raid1 redundancy, tho in this
> case it's only at the one raid0 against the other as the pair of btrfs
> raid1 copies. And the mdraid0 is much better optimized than btrfs raid0,
> so there's that bonus, while at the same time the btrfs raid1 redundancy
> nicely balances the usual "Russian Roulette" quality of raid0.
>
> Very nice configuration! =:^)
>
> Thanks for mentioning it, as I guess I was effectively ruling it out as
> an option before even really considering it due to the usual raid10's
> better than raid01 thing, and thus was entirely blind to the
> possibility. Which was bad, because as I alluded to, mdraid's lack of
> file integrity features and thus lack of any way to have btrfs scrub
> properly filter down to the mdraid level when there's mdraid level
> redundancy, kind of makes a mess of things, otherwise. But btrfs raid1
> on mdraid0 effectively balances and eliminates the negatives at each
> level with the strengths of the other level, and is really a quite
> awesome solution, that until now I was entirely blinded to! =:^)
>
I've also found that BTRFS raid5/6 on top of MD RAID0 mitigates (to a
certain extent, that is) the performance penalty of doing raid5/6 if you
aren't on ridiculously fast storage. It's probably not something that
should be used in production yet, but it's how I've got the near-line
backups set up on my home server system. It may also be worth pointing
out that BTRFS raid6 lets you use 4 disks minimum, as opposed to most
other raid6 implementations that (unnecessarily, as a 4 disk RAID6 is
not a degenerate form) require 5.
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 16:35 ` Calvin Walton
@ 2015-08-17 19:44 ` Eduardo Bach
2015-08-17 20:36 ` Calvin Walton
0 siblings, 1 reply; 15+ messages in thread
From: Eduardo Bach @ 2015-08-17 19:44 UTC (permalink / raw)
To: Calvin Walton; +Cc: linux-btrfs
Hi Calvin.
Thanks a lot for the quick answer, and sorry for my delayed reply.
We had some security issues on some machines. I will answer almost all
the replies below.
Yes, raid0 is a huge risk. This setup is just for performance demos and
other very specific occasions.
I understand the need for development to be focused on stability now.
Based on previous testing with a smaller number of disks, I suspect
that the 32 disks are not all being used. With 12 disks I got more
speed with btrfs than with mdadm+xfs. With btrfs, 12 disks and large
files we got the entire theoretical speed, 12 x 200MB/s per disk. My hope
was to get some light from you guys to debug the problem so that btrfs
uses all 32 disks (assuming this is the problem). Perhaps debugging
this problem may be of interest to the devs?
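The figures quoted in this thread can be compared with a small calculation. It uses the 200MB/s per-disk rate mentioned above and treats the measured 3.5GB/s as 3500MB/s (an assumption about units); the helper name is made up for illustration.

```python
PER_DISK_MB_S = 200  # per-disk sequential rate quoted in this message


def theoretical_mb_s(num_disks, per_disk_mb_s=PER_DISK_MB_S):
    """Aggregate throughput if every disk streams at its full rate."""
    return num_disks * per_disk_mb_s


measured_mb_s = 3500                       # btrfs result with 32 disks
ceiling_12 = theoretical_mb_s(12)          # the 12-disk case that scaled fully
ceiling_32 = theoretical_mb_s(32)
efficiency = measured_mb_s / ceiling_32
equivalent_disks = measured_mb_s / PER_DISK_MB_S

print(f"12 disks: {ceiling_12} MB/s, 32 disks: {ceiling_32} MB/s")
print(f"measured is {efficiency:.0%} of the 32-disk ceiling, "
      f"about {equivalent_disks} disks' worth")
```

A result worth well under 32 disks is consistent with the suspicion that not all disks are kept busy, but it does not by itself distinguish that from a CPU or locking bottleneck.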
Thanks again.
Eduardo.
2015-08-14 13:35 GMT-03:00 Calvin Walton <calvin.walton@kepstin.ca>:
> On Fri, 2015-08-14 at 12:30 -0400, Calvin Walton wrote:
>> On Fri, 2015-08-14 at 12:16 -0300, Eduardo Bach wrote:
>> > Hi all,
>> >
>> > This is my first email to this list, so please excuse any gaffe.
>> >
>> > I am in the evaluation early stages of a new storage, an SGI MIS,
>> > currently with two HBAs LSI and 32 disks.
>> > The hba controllers are LSI 9207-8i and the disks are Seagate 6TB,
>> > model ST6000NM0004-1FT17Z.
>> >
>> > To evaluate the performance I am using IOzone over a raid0 using
>> > all
>> > the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0.
>> >
>> > With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
>> > result reaches 6gb/s, which is the expected value when compared
>> > with
>> > parallel dd made on discs.
>> > When used btrfs with only half of the disc the result is about
>> > 3GB/s.
>>
>> There's two things in particular to pay attention with on btrfs with
>> this sort of setup right now:
>
> Umm, Ok, I made a mistake. You can ignore paragraph #1 - I got some
> details about the btrfs raid1 and raid0 modes mixed up!
> Btrfs RAID0 is n-way striping across all available drives which have
> room for allocations.
>
>> 1. btrfs's "raid0" is not an n-way stripe; it's a 2-way stripe
>> only. (n
>> -way stripe is a long requested feature, but there is no
>> timeline on
>> its completion) A single-threaded disk write will only ever be
>> writing to two disks at the same time. The total throughput you
>> get
>> for multithreaded writes is up to which blocks the allocator
>> happens
>> to pick; it will probably often happen that multiple threads
>> will
>> both be using the same chunk, sharing IO from only 2 disks.
>> 2. Btrfs development is currently primarily focused on
>> functionality
>> over performance. There's several places where placeholder or
>> untuned algorithms are used (e.g. the multi-mirror io read
>> scheduling just does pid % number_of_mirrors to pick a mirror).
>>
>> This kind of a performance difference on large performance-oriented
>> RAID systems between btrfs's built-in raid and mdadm is interesting
>> to
>> see, but for the moment I'd say it's mostly expected.
>>
>> One of the developers here might have some more precise information
>> on
>> exactly why you're seeing such a performance difference.
>>
>> As an aside, you have 192TB in RAID0? That's certainly pretty
>> impressive, but as soon as one disk dies, you're going to lose a
>> *lot*
>> of data.
>>
>
> --
> Calvin Walton <calvin.walton@kepstin.ca>
>
* Re: The performance is not as expected when used several disks on raid0.
2015-08-14 18:31 ` Chris Murphy
2015-08-14 19:50 ` Austin S Hemmelgarn
@ 2015-08-17 19:57 ` Eduardo Bach
1 sibling, 0 replies; 15+ messages in thread
From: Eduardo Bach @ 2015-08-17 19:57 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
The best xfs performance we got was using 32 disks and a 128KB mdadm chunk size.
Could this be the problem we are seeing? If each disk gets 4KB, will 64KB
be optimal for just 16 disks when using raid0 with btrfs?
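One way to reason about this question is to model how a striped write is laid out. The sketch below assumes that the 64KiB figure is the per-device stripe element (which is what the comparison with the mdadm chunk size suggests), so each disk receives 64KiB at a time rather than 4KiB. It is a simplified model with made-up helper names, not btrfs's actual allocator.

```python
from collections import Counter

STRIPE_LEN = 64 * 1024  # assumed per-device stripe element, in bytes


def device_for_offset(offset, num_devices, stripe_len=STRIPE_LEN):
    """Index of the device holding the byte at a logical offset in raid0."""
    return (offset // stripe_len) % num_devices


def bytes_per_device(start, length, num_devices, stripe_len=STRIPE_LEN):
    """Count how many bytes of a write land on each device."""
    per_device = Counter()
    offset = start
    end = start + length
    while offset < end:
        # End of the stripe element that contains the current offset.
        element_end = (offset // stripe_len + 1) * stripe_len
        span = min(end, element_end) - offset
        per_device[device_for_offset(offset, num_devices, stripe_len)] += span
        offset += span
    return per_device


# A 2MiB aligned write over 32 devices: every device gets one 64KiB element.
layout = bytes_per_device(0, 2 * 1024 * 1024, 32)
print(len(layout), set(layout.values()))
```

Under this model a 64KiB write lands on a single device, and it takes a 2MiB write to touch all 32 devices once.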
2015-08-14 15:31 GMT-03:00 Chris Murphy <lists@colorremedies.com>:
> On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach <hellbach@gmail.com> wrote:
>
>> With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the
>> result reaches 6gb/s, which is the expected value when compared with
>> parallel dd made on discs.
>
> mdadm with what chunk (strip) size? The default for mdadm is 512KiB.
> On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS
> on md RAID might improve its performance relative to Btrfs, at least
> it's a more apples to apples comparison.
>
> --
> Chris Murphy
* Re: The performance is not as expected when used several disks on raid0.
2015-08-17 19:44 ` Eduardo Bach
@ 2015-08-17 20:36 ` Calvin Walton
0 siblings, 0 replies; 15+ messages in thread
From: Calvin Walton @ 2015-08-17 20:36 UTC (permalink / raw)
To: Eduardo Bach; +Cc: linux-btrfs
On Mon, 2015-08-17 at 16:44 -0300, Eduardo Bach wrote:
> Based on previous testing with a smaller number of disk I'm
> suspecting
> that the 32 disks are not all being used. With 12 discs I got more
> speed with btrfs thanmdadm+xfs. With, btrfs, 12 disks and large files
> we got the entire theoretical speed, 12 x 200MB/s per disk. My hope
> was to get some light from you guys to debug the problem so the btrfs
> use the 32 discs (assuming this is the problem). Perhaps the debug
> this problem may be of interest to devs?
From the sounds of this, you must be hitting some bottleneck in the
btrfs code. One thing I'm actually curious about: How is the CPU usage
during these tests?
Btrfs can do more work on the CPU than mdadm+xfs - in particular, data
checksums are enabled by default. If you have compression enabled, that
would obviously be a major hit as well. Make sure you don't have
compression enabled (it's off by default, or you can use the mount
option "compress=no"). You could try with the "nodatasum" option to see
if checksums make a difference.
It could be possible that you're saturating the CPU, and that's why
you're not seeing any additional gains over 3.5GB/s. Taking a look at
top output while the test is running might be informative.
On the other hand, if the CPU isn't saturated and the disk io isn't
saturated, then it's probably a scaling issue in btrfs, possibly
something like lock contention.
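Besides watching top, CPU saturation can be estimated from two snapshots of /proc/stat taken while the test runs. The sketch below is a minimal example with made-up sample values; it counts idle and iowait time as not busy, which is an assumption about what "saturated" should mean here.

```python
def cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Columns: user nice system idle iowait irq softirq steal ...
            # Guest time is already included in user, so only sum the first 8.
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            total = sum(values[:8])
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def busy_fraction(before, after):
    """Fraction of CPU time spent busy between two /proc/stat snapshots."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    if total1 == total0:
        return 0.0
    return (busy1 - busy0) / (total1 - total0)


# Made-up snapshots; on a real system, read /proc/stat twice a few seconds apart.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  400 0 150 900 50 0 0 0 0 0\ncpu0 400 0 150 900 50 0 0 0 0 0"
print(f"{busy_fraction(sample_before, sample_after):.0%} busy")
```

The aggregate figure can hide a single saturated core, so the per-core cpuN lines are worth checking the same way if the total looks low.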
--
Calvin Walton <calvin.walton@kepstin.ca>
* Re: The performance is not as expected when used several disks on raid0.
2015-08-17 11:38 ` Austin S Hemmelgarn
@ 2015-08-17 23:06 ` Duncan
2015-08-18 11:34 ` Austin S Hemmelgarn
0 siblings, 1 reply; 15+ messages in thread
From: Duncan @ 2015-08-17 23:06 UTC (permalink / raw)
To: linux-btrfs
Austin S Hemmelgarn posted on Mon, 17 Aug 2015 07:38:13 -0400 as
excerpted:
> I've also found that BTRFS raid5/6 on top of MD RAID0 mitigates (to a
> certain extent that is) the performance penalty of doing raid5/6 if you
> aren't on ridiculously fast storage, probably not something that should
> be used in production yet, but it's how I've got the near-line backups
> setup on my home server system.
As should be clear from my previous posts on the subject, I'm
conservative enough not to be comfortable with the btrfs raid56
implementation yet. My recommendation has been, and remains, unless
you're deliberately testing it in order to help find/report/work out
bugs, give it a year after the nominally full implementation (3.19, so
until 4.4), before expecting it to be reasonably as stable as the rest of
btrfs (which itself isn't fully stable yet).
But the almost-released 4.2 does seem to be past the initial nominally
btrfs raid56 full-code bugs, and I'd call an intermediate level backup,
with working copies in front and itself backed up in back, a reasonable
first working (as opposed to testing) deployment.
And yes, btrfs raid5/6 over mdraid0 would have the same general
complementary nature as btrfs raid1/10 over mdraid0.
> It may also be worth pointing out that
> BTRFS raid6 lets you use 4 disks minimum, as opposed to most other raid6
> implementations that (unnecessarily, as a 4 disk RAID6 is not a
> degenerate form) require 5.
4-device raid6, btrfs and mdraid both allow that, good point. But of
course mdraid6 doesn't have the data integrity, only rebuild-parity.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: The performance is not as expected when used several disks on raid0.
2015-08-17 23:06 ` Duncan
@ 2015-08-18 11:34 ` Austin S Hemmelgarn
2015-08-18 14:59 ` Duncan
0 siblings, 1 reply; 15+ messages in thread
From: Austin S Hemmelgarn @ 2015-08-18 11:34 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 2015-08-17 19:06, Duncan wrote:
> Austin S Hemmelgarn posted on Mon, 17 Aug 2015 07:38:13 -0400 as
> excerpted:
>
>> I've also found that BTRFS raid5/6 on top of MD RAID0 mitigates (to a
>> certain extent that is) the performance penalty of doing raid5/6 if you
>> aren't on ridiculously fast storage, probably not something that should
>> be used in production yet, but it's how I've got the near-line backups
>> setup on my home server system.
>
> As should be clear from my previous posts on the subject, I'm
> conservative enough not to be comfortable with the btrfs raid56
> implementation yet. My recommendation has been, and remains, unless
> you're deliberately testing it in ordered to help find/report/workout
> bugs, give it a year after the nominally full implementation (3.19, so
> until 4.4), before expecting it to be reasonably as stable as the rest of
> btrfs (which itself isn't fully stable yet).
>
> But the almost-released 4.2 does seem to be past the initial nominally
> btrfs raid56 full-code bugs, and I'd call an intermediate level backup,
> with working copies in front and itself backed up in back, a reasonable
> first working (as opposed to testing) deployment.
Yeah, I've been ridiculously lucky to have not hit _any_ of the raid56
related bugs. In fact the only issue I've had with it was a result of a
btrfs interaction with dm-thinp (if dm-thinp isn't set to zero newly
allocated blocks, btrfs sometimes loses its mind during remount, which
in turn reminds me that I meant to check if this was fixed or not).
And the deployment you suggest is ironically how I use it, I've got my
root filesystem on btrfs raid1 across 2 SSD's, with a btrfs raid6 on top
of LVM single volumes on a set of 4 1TB HDD's as a target for receive
(and configured such that I can directly boot any of the backups there),
and then store compressed, encrypted tarballs of the Sunday backups on 3
different cloud storage services and an external 4TB HDD (It's wonderful
how Gentoo lends itself so well to custom solutions).
>
> And yes, btrfs raid5/6 over mdraid0 would have the same general
> complementary nature as btrfs raid1/10 over mdraid0.
>
>> It may also be worth pointing out that
>> BTRFS raid6 lets you use 4 disks minimum, as opposed to most other raid6
>> implementations that (unnecessarily, as a 4 disk RAID6 is not a
>> degenerate form) require 5.
>
> 4-device raid6, btrfs and mdraid both allow that, good point. But of
> course mdraid6 doesn't have the data integrity, only rebuild-parity.
>
Huh, I didn't know that mdraid allowed that. I know dm-raid through LVM
doesn't (which in turn is a large part of what caused me to try btrfs
raid56 so soon; I had been going to do btrfs raid1 on top of LVM-based
raid6).
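For reference, the capacity arithmetic behind the 4-disk RAID6 point can be
sketched as below. This is only an illustration under simplifying
assumptions: equal-sized devices, no metadata or filesystem overhead, and
function names that are made up for this example rather than taken from any
btrfs or mdadm tool.

```python
def raid6_usable(devices: int, size_tb: float) -> float:
    """Usable capacity of an equal-sized RAID6 array.

    Two devices' worth of space goes to parity, so the rest holds data.
    Hypothetical helper for illustration; ignores filesystem overhead.
    """
    if devices < 3:
        raise ValueError("need at least 3 devices for any data capacity")
    return (devices - 2) * size_tb


def raid10_usable(devices: int, size_tb: float) -> float:
    """Usable capacity of an equal-sized RAID10 array (every block stored twice)."""
    if devices < 4 or devices % 2:
        raise ValueError("this sketch assumes an even device count of 4 or more")
    return devices / 2 * size_tb


if __name__ == "__main__":
    # The 4 x 1TB HDD set mentioned in the message above.
    print("raid6 :", raid6_usable(4, 1.0), "TB usable")
    print("raid10:", raid10_usable(4, 1.0), "TB usable")
```

With 4 x 1TB devices both layouts come to 2TB usable, but raid6 survives the
loss of any two devices while raid10 is only guaranteed to survive one, which
is why a 4-disk RAID6 is not a degenerate form.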
* Re: The performance is not as expected when used several disks on raid0.
2015-08-18 11:34 ` Austin S Hemmelgarn
@ 2015-08-18 14:59 ` Duncan
0 siblings, 0 replies; 15+ messages in thread
From: Duncan @ 2015-08-18 14:59 UTC (permalink / raw)
To: linux-btrfs
Austin S Hemmelgarn posted on Tue, 18 Aug 2015 07:34:09 -0400 as
excerpted:
>> 4-device raid6, btrfs and mdraid both allow that, good point. But of
>> course mdraid6 doesn't have the data integrity, only rebuild-parity.
>>
> Huh, I didn't know that mdraid allowed that. I know dm-raid through LVM
> doesn't (which in turn is a large part of what caused me to try btrfs
> raid56 so soon; I had been going to do btrfs raid1 on top of LVM-based
> raid6).
Yes. I ran a 4-device mdraid-6 for a couple years or so. Then I got
tired of the write performance, and the devices were getting old and I
found I had room if I squeezed, so I switched to 4-way mdraid1 on the
same devices, for a while. That's actually what I was running when I
first looked into btrfs, and was hugely disappointed that btrfs could
only do pair-mirrored raid1. The devices were old enough I didn't trust
pair-mirrored, and wanted at least 3-way, so I didn't switch to btrfs at
that time.
I only switched to btrfs sometime later, after I upgraded the core
system, and primary storage to ssd. I had been impressed with reiserfs,
which I still use on my spinning rust, but decided its journaling wasn't
going to be good for ssds, and while I was upgrading to something a bit
more ssd friendly, decided I might as well go btrfs.
So now I'm running btrfs on the two ssds, partitioned identically, with
several btrfs raid1 filesystems on parallel partitions on the two, with
primary (fat-finger-level) backup on other partitions on the same devices.
Only /boot (and the gpt bios and efi partitions, bios currently used, efi
pre-allocated for easy mobo upgrade) is not btrfs raid1. It's btrfs
mixed-bg-mode dup, with an independent btrfs on each device, with grub2-
core independently installed to the gpt bios partition on each device as
well, so I can boot either one by simply selecting it in the bios,
allowing the one to be the backup for the other, much like additional
partitions and btrfs raid1 on the same physical pair are backups for the
working copy of other partitions. But a backup raid1 btrfs won't work
for /boot, since grub can really only point at one /boot (tho of course
with grub2-core installed in the bios partition, I could use grub
emergency mode to select a backup /boot if necessary, but that's harder
than simply having two independent /boots set up, one per device), which
is why I run two independent dup-mode /boots.
Secondary backup is to reiserfs on another device, spinning rust, with
third backup to a normally disconnected external USB device. For my use-
case, I decided off-site backup isn't worth the hassle, as the backup
that /really/ counts is in my head, and if all the on-site backups
including the external USB drive are unavailable, chances are a rather
large real-life disaster (like my neighbor's house that burned down a few
weeks ago) happened, and the things I'll be worried about then will make
worrying about a lost computer backup look rather trivial.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman