linux-raid.vger.kernel.org archive mirror
* raid5 to utilize upto 8 cores
From: vincent Ferrer @ 2012-08-16  2:56 UTC
  To: linux-raid

Hi All,

I have a question about making the raid5 thread use all the cores on my
Linux-based storage server.

- My storage server has up to 8 cores, running Linux kernel 2.6.32.27.
- I created a raid5 device from 10 SSDs.
- It seems I only have a single raid5 kernel thread, limiting my
WRITE throughput to a single CPU core/thread.
Question: What are my options to make my raid5 thread use all the
CPU cores?  My SSDs can do much more, but the single raid5 write thread
is becoming the bottleneck.

To work around this single-thread raid5 limitation (for now), I re-configured:
     1)  I partitioned each of my 10 SSDs into 8 partitions.
     2)  I created 8 raid5 arrays (and hence 8 raid5 threads), each array
built from one partition on each of the 10 SSDs.
     3)  My WRITE performance quadrupled because I now have 8 raid5 threads.
Question: Is this workaround normal practice, or may it give me
maintenance problems later on?
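
For reference, the workaround looks roughly like this (device names are
illustrative, and only the first two of the 8 arrays are shown):

     # each of the 10 SSDs (/dev/sda .. /dev/sdj) carries 8 partitions;
     # array N is built from partition N of every SSD
     ~$ mdadm --create /dev/md1 --level=5 --raid-devices=10 /dev/sd[a-j]1
     ~$ mdadm --create /dev/md2 --level=5 --raid-devices=10 /dev/sd[a-j]2
     # ... and so on up to /dev/md8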


cheers
vincy F.
chico

* Re: raid5 to utilize upto 8 cores
From: Stan Hoeppner @ 2012-08-16  5:58 UTC
  To: vincent Ferrer; +Cc: linux-raid

On 8/15/2012 9:56 PM, vincent Ferrer wrote:

> - My  storage server  has upto 8 cores  running linux kernel 2.6.32.27.
> - I created  a raid5 device of  10  SSDs .
> -  It seems  I only have single raid5 kernel thread,  limiting  my
> WRITE  throughput  to single cpu  core/thread.

The single write threads of md/RAID5/6/10 are being addressed by patches
in development.  Read the list archives for progress/status.  There were
3 posts to the list today regarding the RAID5 patch.

> Question :   What are my options to make  my raid5 thread use all the
> CPU cores ?
>                   My SSDs  can do much more but  single raid5 thread
> from mdadm   is becoming the bottleneck.
> 
> To overcome above single-thread-raid5 limitation (for now)  I  re-configured.
>      1)  I partitioned  all  my  10 SSDs into 8  partitions:
>      2)  I created  8   raid5 threads. Each raid5 thread having
> partition from each of the 8 SSDs
>      3)  My WRITE performance   quadrupled  because I have 8 RAID5 threads.
> Question: Is this workaround a   normal practice  or may give me
> maintenance problems later on.

No, it is not normal practice.  I 'preach' against it regularly when I
see OPs doing it.  It's quite insane.  The glaring maintenance problem
is that when one SSD fails, and at least one will, you'll have 8 arrays
to rebuild instead of one.  This may be acceptable to you, but not to the
general population.  With rust drives and real workloads, it tends to
hammer the drive heads prodigiously, increasing latency, killing
performance, and decreasing drive life.  That's not an issue with SSDs,
but multiple rebuilds are.  That, and simply keeping track of 80 partitions.

There are a couple of sane things you can do today to address your problem:

1.  Create a RAID50: a layered md/RAID0 over two 5-SSD md/RAID5 arrays.
This will double your threads and your IOPS.  It won't be as fast as
your Frankenstein setup and you'll lose one SSD of capacity to
additional parity.  However, it's sane, stable, doubles your
performance, and you have only one array to rebuild after an SSD
failure.  Any filesystem will work well with it, including XFS if
aligned properly.  It gives you an easy upgrade path-- as soon as the
threaded patches hit, a simple kernel upgrade will give your two RAID5
arrays the extra threads, so you're simply out one SSD of capacity.  You
won't need to, and probably won't want to rebuild the entire thing after
the patch.  With the Frankenstein setup you'll be destroying and
rebuilding arrays.  And if these are consumer grade SSDs, you're much
better off having two drives worth of redundancy anyway, so a RAID50
makes good sense all around.
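
A rough sketch of #1, with hypothetical device names:

    ~$ mdadm --create /dev/md1 --level=5 --raid-devices=5 /dev/sd[a-e]
    ~$ mdadm --create /dev/md2 --level=5 --raid-devices=5 /dev/sd[f-j]
    ~$ mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2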

2.  Make 5 md/RAID1 mirrors and concatenate them with md/RAID linear.
You'll get one md write thread per RAID1 device utilizing 5 cores in
parallel.  The linear driver doesn't use threads, but passes offsets to
the block layer, allowing infinite core scaling.  Format the linear
device with XFS and mount with inode64.  XFS has been fully threaded for
15 years.  Its allocation group design along with the inode64 allocator
allows near linear parallel scaling across a concatenated device[1],
assuming your workload/directory layout is designed for parallel file
throughput.
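
A rough sketch of #2, again with hypothetical device and mount point names:

    ~$ mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    # ...repeat for /dev/md2 through /dev/md5 with the other four SSD pairs
    ~$ mdadm --create /dev/md0 --level=linear --raid-devices=5 /dev/md[1-5]
    ~$ mkfs.xfs /dev/md0
    ~$ mount -o inode64 /dev/md0 /your/mountpoint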

#2, with a parallel write workload, may be competitive with your
Frankenstein setup in both IOPS and throughput, even with 3 fewer RAID
threads and 4 fewer SSD "spindles".  It will outrun the RAID50 setup
like it's standing still.  You'll lose half your capacity to redundancy
as with RAID10, but you'll have 5 write threads for md/RAID1, one per
SSD pair.  One core should be plenty to drive a single SSD mirror,
leaving cycles to spare on those cores and three whole cores free for
your applications.  You'll get unlimited core scaling with both md/linear
and XFS.  This setup will yield the best balance of IOPS and throughput
performance for the amount of cycles burned on IO, compared to
Frankenstein and the RAID50.

[1] If you are one of the uneducated masses who believe dd gives an
accurate measure of storage performance, then ignore option #2.  Such a
belief would indicate you thoroughly lack understanding of storage
workloads, and thus you will be greatly disappointed with the dd numbers
this configuration will give you.

-- 
Stan


* Re: raid5 to utilize upto 8 cores
From: Mikael Abrahamsson @ 2012-08-16  7:03 UTC
  To: Stan Hoeppner; +Cc: vincent Ferrer, linux-raid

On Thu, 16 Aug 2012, Stan Hoeppner wrote:

> performance, and decreasing drive life.  That's not an issue with SSD,
> but multiple rebuilds is.

I remember seeing MD delay a rebuild when it detects that the components
are on the same physical device -- doesn't it do that (anymore)?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: raid5 to utilize upto 8 cores
From: David Brown @ 2012-08-16  7:52 UTC
  To: stan; +Cc: vincent Ferrer, linux-raid

On 16/08/2012 07:58, Stan Hoeppner wrote:
> On 8/15/2012 9:56 PM, vincent Ferrer wrote:
>
>> - My  storage server  has upto 8 cores  running linux kernel 2.6.32.27.
>> - I created  a raid5 device of  10  SSDs .
>> -  It seems  I only have single raid5 kernel thread,  limiting  my
>> WRITE  throughput  to single cpu  core/thread.
>
> The single write threads of md/RAID5/6/10 are being addressed by patches
> in development.  Read the list archives for progress/status.  There were
> 3 posts to the list today regarding the RAID5 patch.
>
>> Question :   What are my options to make  my raid5 thread use all the
>> CPU cores ?
>>                    My SSDs  can do much more but  single raid5 thread
>> from mdadm   is becoming the bottleneck.
>>
>> To overcome above single-thread-raid5 limitation (for now)  I  re-configured.
>>       1)  I partitioned  all  my  10 SSDs into 8  partitions:
>>       2)  I created  8   raid5 threads. Each raid5 thread having
>> partition from each of the 8 SSDs
>>       3)  My WRITE performance   quadrupled  because I have 8 RAID5 threads.
>> Question: Is this workaround a   normal practice  or may give me
>> maintenance problems later on.
>
> No it is not normal practice.  I 'preach' against it regularly when I
> see OPs doing it.  It's quite insane.  The glaring maintenance problem
> is that when one SSD fails, and at least one will, you'll have 8 arrays
> to rebuild vs one.  This may be acceptable to you, but not to the
> general population.  With rust drives, and real workloads, it tends to
> hammer the drive heads prodigiously, increasing latency and killing
> performance, and decreasing drive life.  That's not an issue with SSD,
> but multiple rebuilds is.  That and simply keeping track of 80 partitions.
>

The rebuilds will, I believe, be done sequentially rather than in 
parallel.  And each rebuild will take 1/8 of the time a full array 
rebuild would have done.  So it really should not be much more time or 
wear-and-tear for a rebuild of this monster setup, compared to a single 
raid5 array rebuild.  (With hard disks, it would be worse due to head 
seeks - but still not as bad as you imply, if I am right about the 
rebuilds being done sequentially.)

However, there was a recent thread here about someone with a similar 
setup (on hard disks) who had a failure during such a rebuild and had 
lots of trouble.  That makes me sceptical of this sort of multiple-array
setup (in addition to Stan's other points).

And of course, all Stan's other points about maintenance, updates to 
later kernels with multiple raid5 threads, etc., still stand.

> There are a couple of sane things you can do today to address your problem:
>
> 1.  Create a RAID50, a layered md/RAID0 over two 5 SSD md/RAID5 arrays.
>   This will double your threads and your IOPS.  It won't be as fast as
> your Frankenstein setup and you'll lose one SSD of capacity to
> additional parity.  However, it's sane, stable, doubles your
> performance, and you have only one array to rebuild after an SSD
> failure.  Any filesystem will work well with it, including XFS if
> aligned properly.  It gives you an easy upgrade path-- as soon as the
> threaded patches hit, a simple kernel upgrade will give your two RAID5
> arrays the extra threads, so you're simply out one SSD of capacity.  You
> won't need to, and probably won't want to rebuild the entire thing after
> the patch.  With the Frankenstein setup you'll be destroying and
> rebuilding arrays.  And if these are consumer grade SSDs, you're much
> better off having two drives worth of redundancy anyway, so a RAID50
> makes good sense all around.
>
> 2.  Make 5 md/RAID1 mirrors and concatenate them with md/RAID linear.
> You'll get one md write thread per RAID1 device utilizing 5 cores in
> parallel.  The linear driver doesn't use threads, but passes offsets to
> the block layer, allowing infinite core scaling.  Format the linear
> device with XFS and mount with inode64.  XFS has been fully threaded for
> 15 years.  Its allocation group design along with the inode64 allocator
> allows near linear parallel scaling across a concatenated device[1],
> assuming your workload/directory layout is designed for parallel file
> throughput.
>
> #2, with a parallel write workload, may be competitive with your
> Frankenstein setup in both IOPS and throughput, even with 3 fewer RAID
> threads and 4 fewer SSD "spindles".  It will outrun the RAID50 setup
> like it's standing still.  You'll lose half your capacity to redundancy
> as with RAID10, but you'll have 5 write threads for md/RAID1, one per
> SSD pair.  One core should be plenty to drive a single SSD mirror, with
> plenty of cycles to spare for actual applications, while sparing 3 cores
> for apps as well.  You'll get unlimited core scaling with both md/linear
> and XFS.  This setup will yield the best balance of IOPS and throughput
> performance for the amount of cycles burned on IO, compared to
> Frankenstein and the RAID50.

For those that don't want to use XFS, or won't have balanced directories 
in their filesystem, or want greater throughput of larger files (rather 
than greater average throughput of multiple parallel accesses), you can 
also take your 5 raid1 mirror pairs and combine them with raid0.  You 
should get similar scaling (the cpu does not limit raid0).  For some 
applications (such as mail server, /home mount, etc.), the XFS over a 
linear concatenation is probably unbeatable.  But for others (such as 
serving large media files), a raid0 over raid1 pairs could well be 
better.  As always, it depends on your load - and you need to test with 
realistic loads or at least realistic simulations.
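
As a rough sketch (assuming /dev/md1 through /dev/md5 are the five raid1
pairs -- names hypothetical):

    ~$ mdadm --create /dev/md0 --level=0 --raid-devices=5 /dev/md[1-5]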

>
> [1] If you are one of the uneducated masses who believe dd gives an
> accurate measure of storage performance, then ignore option #2.  Such a
> belief would indicate you thoroughly lack understanding of storage
> workloads, and thus you will be greatly disappointed with the dd numbers
> this configuration will give you.
>


* Re: raid5 to utilize upto 8 cores
From: Stan Hoeppner @ 2012-08-16  8:55 UTC
  To: vincent Ferrer; +Cc: Linux RAID

On 8/16/2012 2:08 AM, vincent Ferrer wrote:
>> No it is not normal practice.  I 'preach' against it regularly when I
>> see OPs doing it.  It's quite insane.  The glaring maintenance problem
>> is that when one SSD fails, and at least one will, you'll have 8 arrays
>> to rebuild vs one.  This may be acceptable to you, but not to the
>> general population.  With rust drives, and real workloads, it tends to
>> hammer the drive heads prodigiously, increasing latency and killing
>> performance, and decreasing drive life.  That's not an issue with SSD,
>> but multiple rebuilds is.  That and simply keeping track of 80 partitions.

> Suppose I don't do the partitioning and instead get more SSDs, so I can create more raid5 arrays (and threads) - is my setup sane now?

No, because you're not addressing an actual workload problem, but merely
playing with what-ifs.

> The only reason I partitioned insanely was because I don't have enough SSDs yet.
> 
> Suppose I get 32 SSDs.  Then what is normal practice:
>     1)  one raid5 device out of all 32 SSDs,
>               or
>     2)  8 raid5 devices, each with 4 SSDs?
> 
> My partitioning was only a proof-of-concept test to prove that buying
> more SSDs and running several raid5 arrays will increase write throughput
> on the kernel I am using (2.6.32).

You didn't need to perform a proof of concept in order to confirm
anything.  All you had to do was google "md raid ssd".  This issue has
been well documented for quite some time, as well as workarounds, some
of which I mentioned.

"Normal practice" would be to explain your target workload, and then
discuss storage options that meet the needs of that workload.

If you had a workload that actually requires the performance equivalent
of 32 SATA SSDs, you'd already be looking at a high end SAN controller
or a handful of LSI's WarpDrive devices, and not wasting time playing
what-ifs with md/RAID's parity personality limitations.

-- 
Stan


* Re: raid5 to utilize upto 8 cores
From: Flynn @ 2012-08-16 15:47 UTC
  To: David Brown; +Cc: stan, vincent Ferrer, linux-raid

> However, there was a recent thread here about someone with a similar
> setup (on hard disks) who had a failure during such a rebuild and had
> lots of trouble...

That was me.  The rebuilds _did_ happen sequentially, not in parallel.  The 
trouble was that by the time the first one finished, the critical-section 
backup for the second one was considered too old to be restored, so mdadm 
--assemble balked.  The fix was simply to tell mdadm --assemble to go 
forward with the old backup; all is well now.
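
(From memory, the command ended up looking something like this -- device
and backup-file names are made up here, and I may be misremembering the
exact switch:

     mdadm --assemble /dev/md1 --backup-file=/root/md1-backup --invalid-backup /dev/sd[a-f]2

i.e. mdadm's --invalid-backup option, which lets assembly continue even
though the backup file is no longer considered valid.)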

So from my experience, yes, multiple partitions (with magnetic disks) are 
definitely an administrative hassle, and can sometimes get you into actual 
trouble.  Barring some real benefit to set against those real costs, I'd 
say it'd be better to not do it.  [ :) ]

(BTW David, I found that your constructive commentary added much to the 
discussion, both here and in my original thread -- thanks.)

 -- Flynn

--
America will never be destroyed from the outside. If we falter, and lose
our freedoms, it will be because we destroyed ourselves.
                                                         (Abraham Lincoln)


* Re: raid5 to utilize upto 8 cores
From: vincent Ferrer @ 2012-08-16 21:51 UTC
  To: 王金浦, linux-raid

On Thu, Aug 16, 2012 at 1:34 AM, 王金浦 <jinpuwang@gmail.com> wrote:
> Hi vikas,
>
> I suggest you can pull from  git://neil.brown.name/md.git and checkout to
> for-next branch, test if that works for you.
>
> I'm not sure the patches can cleanly apply to 2.6.32, good luck to try.
>
> Jack
>
> 2012/8/16 vincent Ferrer <vincentchicago1@gmail.com>
>>
>> Hi Jin,
>> Thanks
>> Which kernel  version these patches are expected in linux tree?
>> Can I apply these patches into  2.6.32 kernel ?
>>
>> regards
>> -vikas
>>
>> On Wed, Aug 15, 2012 at 8:09 PM, 王金浦 <jinpuwang@gmail.com> wrote:
>> > Hi
>> >
>> > You may not notice Shaohua have address this
>> >
>> > http://neil.brown.name/git?p=md;a=commitdiff;h=45e2c516a2ffd5d477d42f3cc6f6e960b3ae14de
>> >
>> > Jack
>> >

Hello Forum,
  Is there any released or unreleased Linux kernel version I can download
that already has multi-threaded raid5 support?  (For now I don't care
about stability.)  I just want to try it out for benchmarking my storage
server.
  This would help in testing your patch, and would also save me the effort
of patching the kernel I am on (kernel 3.3 (Fedora 17) or kernel 2.6.32).

regards
vincy

* Re: raid5 to utilize upto 8 cores
From: vincent Ferrer @ 2012-08-16 22:11 UTC
  To: stan; +Cc: linux-raid

On Wed, Aug 15, 2012 at 10:58 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 8/15/2012 9:56 PM, vincent Ferrer wrote:
>
>> - My  storage server  has upto 8 cores  running linux kernel 2.6.32.27.
>> - I created  a raid5 device of  10  SSDs .
>
> No it is not normal practice.  I 'preach' against it regularly when I
> see OPs doing it.  It's quite insane.
>
> There are a couple of sane things you can do today to address your problem:
>
> Stan
>

Hi Stan,
A follow-up question for 2 types of setups I may have to prepare:
1) Setup A has 80 SSDs.  Question: Should I still create one raid5
device, or should I create 8 raid5 devices, each with 10 SSDs?
    My Linux-based storage server may be accessed by up to 10-20
physically different clients.

2) Setup B has only 12 SSDs.  Question: Is it more practical to have
only one raid5 device, even though I may have 4-5 physically different
clients, or to create 2 raid5 devices, each with 6 SSDs?

I am asking because I have seen enterprise storage arrays from EMC/IBM
where a new raid5 device is created on demand, and the storage firmware
either spreads it automatically across all the available drives/spindles
or lets the storage admin place it intelligently after analyzing the
workload, to avoid hot-spots.

The partitioning was only done because I am still waiting for budget
approval to buy more SSDs.

regards
vincy

* Re: raid5 to utilize upto 8 cores
From: Roberto Spadim @ 2012-08-16 22:29 UTC
  To: vincent Ferrer; +Cc: 王金浦, linux-raid

Maybe you will need the syslinux boot loader instead of grub.
I tried the Arch Linux devel version, and it only worked with the syslinux
boot loader; grub didn't work with the rootfs on an md device.
I didn't try to understand why - I just changed the bootloader and it worked.
Just a tip in case you can't boot your system...

And yes, you can download the latest kernel and patch it with the latest
raid changes (just get the right ones =] ).
Try booting from an external hard disk so you don't change anything
important in your server configuration.
Good luck =)

2012/8/16 vincent Ferrer <vincentchicago1@gmail.com>:
> On Thu, Aug 16, 2012 at 1:34 AM, 王金浦 <jinpuwang@gmail.com> wrote:
>> Hi vikas,
>>
>> I suggest you can pull from  git://neil.brown.name/md.git and checkout to
>> for-next branch, test if that works for you.
>>
>> I'm not sure the patches can cleanly apply to 2.6.32, good luck to try.
>>
>> Jack
>>
>> 2012/8/16 vincent Ferrer <vincentchicago1@gmail.com>
>>>
>>> Hi Jin,
>>> Thanks
>>> Which kernel  version these patches are expected in linux tree?
>>> Can I apply these patches into  2.6.32 kernel ?
>>>
>>> regards
>>> -vikas
>>>
>>> On Wed, Aug 15, 2012 at 8:09 PM, 王金浦 <jinpuwang@gmail.com> wrote:
>>> > Hi
>>> >
>>> > You may not notice Shaohua have address this
>>> >
>>> > http://neil.brown.name/git?p=md;a=commitdiff;h=45e2c516a2ffd5d477d42f3cc6f6e960b3ae14de
>>> >
>>> > Jack
>>> >
>
> Hello Forum,
>   Can I download any  "Released or non-released"  linux kernel version
>  which has  multi thread raid5  support (for now don't care about
> stability).  Just want to try it out for benchmarking my storage
> server.
>   This will help in testing your patch and also will save me effort
> having to patch a kernel which I am on (  kernel 3.3 (Fedora 17)  or
> kernel 2.6.32)
>
> regards
> vincy



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: raid5 to utilize upto 8 cores
From: Stan Hoeppner @ 2012-08-17  7:15 UTC
  To: David Brown; +Cc: vincent Ferrer, linux-raid

On 8/16/2012 2:52 AM, David Brown wrote:
> On 16/08/2012 07:58, Stan Hoeppner wrote:
>> On 8/15/2012 9:56 PM, vincent Ferrer wrote:
>>
>>> - My  storage server  has upto 8 cores  running linux kernel 2.6.32.27.
>>> - I created  a raid5 device of  10  SSDs .
>>> -  It seems  I only have single raid5 kernel thread,  limiting  my
>>> WRITE  throughput  to single cpu  core/thread.
>>
>> The single write threads of md/RAID5/6/10 are being addressed by patches
>> in development.  Read the list archives for progress/status.  There were
>> 3 posts to the list today regarding the RAID5 patch.
>>
>>> Question :   What are my options to make  my raid5 thread use all the
>>> CPU cores ?
>>>                    My SSDs  can do much more but  single raid5 thread
>>> from mdadm   is becoming the bottleneck.
>>>
>>> To overcome above single-thread-raid5 limitation (for now)  I 
>>> re-configured.
>>>       1)  I partitioned  all  my  10 SSDs into 8  partitions:
>>>       2)  I created  8   raid5 threads. Each raid5 thread having
>>> partition from each of the 8 SSDs
>>>       3)  My WRITE performance   quadrupled  because I have 8 RAID5
>>> threads.
>>> Question: Is this workaround a   normal practice  or may give me
>>> maintenance problems later on.
>>
>> No it is not normal practice.  I 'preach' against it regularly when I
>> see OPs doing it.  It's quite insane.  The glaring maintenance problem
>> is that when one SSD fails, and at least one will, you'll have 8 arrays
>> to rebuild vs one.  This may be acceptable to you, but not to the
>> general population.  With rust drives, and real workloads, it tends to
>> hammer the drive heads prodigiously, increasing latency and killing
>> performance, and decreasing drive life.  That's not an issue with SSD,
>> but multiple rebuilds is.  That and simply keeping track of 80
>> partitions.
>>
> 
> The rebuilds will, I believe, be done sequentially rather than in
> parallel.  And each rebuild will take 1/8 of the time a full array
> rebuild would have done.  So it really should not be much more time or
> wear-and-tear for a rebuild of this monster setup, compared to a single
> raid5 array rebuild.  (With hard disks, it would be worse due to head
> seeks - but still not as bad as you imply, if I am right about the
> rebuilds being done sequentially.)
> 
> However, there was a recent thread here about someone with a similar
> setup (on hard disks) who had a failure during such a rebuild and had
> lots of trouble.  That makes me sceptical to this sort of multiple array
> setup (in addition to Stan's other points).
> 
> And of course, all Stan's other points about maintenance, updates to
> later kernels with multiple raid5 threads, etc., still stand.
> 
>> There are a couple of sane things you can do today to address your
>> problem:
>>
>> 1.  Create a RAID50, a layered md/RAID0 over two 5 SSD md/RAID5 arrays.
>>   This will double your threads and your IOPS.  It won't be as fast as
>> your Frankenstein setup and you'll lose one SSD of capacity to
>> additional parity.  However, it's sane, stable, doubles your
>> performance, and you have only one array to rebuild after an SSD
>> failure.  Any filesystem will work well with it, including XFS if
>> aligned properly.  It gives you an easy upgrade path-- as soon as the
>> threaded patches hit, a simple kernel upgrade will give your two RAID5
>> arrays the extra threads, so you're simply out one SSD of capacity.  You
>> won't need to, and probably won't want to rebuild the entire thing after
>> the patch.  With the Frankenstein setup you'll be destroying and
>> rebuilding arrays.  And if these are consumer grade SSDs, you're much
>> better off having two drives worth of redundancy anyway, so a RAID50
>> makes good sense all around.
>>
>> 2.  Make 5 md/RAID1 mirrors and concatenate them with md/RAID linear.
>> You'll get one md write thread per RAID1 device utilizing 5 cores in
>> parallel.  The linear driver doesn't use threads, but passes offsets to
>> the block layer, allowing infinite core scaling.  Format the linear
>> device with XFS and mount with inode64.  XFS has been fully threaded for
>> 15 years.  Its allocation group design along with the inode64 allocator
>> allows near linear parallel scaling across a concatenated device[1],
>> assuming your workload/directory layout is designed for parallel file
>> throughput.
>>
>> #2, with a parallel write workload, may be competitive with your
>> Frankenstein setup in both IOPS and throughput, even with 3 fewer RAID
>> threads and 4 fewer SSD "spindles".  It will outrun the RAID50 setup
>> like it's standing still.  You'll lose half your capacity to redundancy
>> as with RAID10, but you'll have 5 write threads for md/RAID1, one per
>> SSD pair.  One core should be plenty to drive a single SSD mirror, with
>> plenty of cycles to spare for actual applications, while sparing 3 cores
>> for apps as well.  You'll get unlimited core scaling with both md/linear
>> and XFS.  This setup will yield the best balance of IOPS and throughput
>> performance for the amount of cycles burned on IO, compared to
>> Frankenstein and the RAID50.
> 
> For those that don't want to use XFS, or won't have balanced directories
> in their filesystem, or want greater throughput of larger files (rather
> than greater average throughput of multiple parallel accesses), you can
> also take your 5 raid1 mirror pairs and combine them with raid0.  You
> should get similar scaling (the cpu does not limit raid0).  For some
> applications (such as mail server, /home mount, etc.), the XFS over a
> linear concatenation is probably unbeatable.  But for others (such as
> serving large media files), a raid0 over raid1 pairs could well be
> better.  As always, it depends on your load - and you need to test with
> realistic loads or at least realistic simulations.

Sure, a homemade RAID10 would work as it avoids the md/RAID10 single
write thread.  I intentionally avoided mentioning this option for a few
reasons:

1.  Anyone needing 10 SATA SSDs obviously has a parallel workload
2.  Any thread will have up to 200-500MB/s available (one SSD)
    with a concat, I can't see a single thread needing 4.5GB/s of B/W
    If so, md/RAID isn't capable, not on COTS hardware
3.  With a parallel workload requiring this many SSDs, XFS is a must
4.  With a concat, mkfs.xfs is simple, no stripe aligning, etc
    ~$ mkfs.xfs /dev/md0
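
For comparison, hand-aligning XFS to a striped array means something
like the following (mkfs.xfs usually picks the geometry up from md
itself, so the values here are purely illustrative -- a 5-drive RAID5
with a 64KiB chunk, i.e. 4 data spindles):

     ~$ mkfs.xfs -d su=64k,sw=4 /dev/md0

None of that is needed on the concat.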

-- 
Stan


* Re: raid5 to utilize upto 8 cores
From: David Brown @ 2012-08-17  7:29 UTC
  To: stan; +Cc: vincent Ferrer, linux-raid

On 17/08/2012 09:15, Stan Hoeppner wrote:
> On 8/16/2012 2:52 AM, David Brown wrote:
>> For those that don't want to use XFS, or won't have balanced directories
>> in their filesystem, or want greater throughput of larger files (rather
>> than greater average throughput of multiple parallel accesses), you can
>> also take your 5 raid1 mirror pairs and combine them with raid0.  You
>> should get similar scaling (the cpu does not limit raid0).  For some
>> applications (such as mail server, /home mount, etc.), the XFS over a
>> linear concatenation is probably unbeatable.  But for others (such as
>> serving large media files), a raid0 over raid1 pairs could well be
>> better.  As always, it depends on your load - and you need to test with
>> realistic loads or at least realistic simulations.
>
> Sure, a homemade RAID10 would work as it avoids the md/RAID10 single
> write thread.  I intentionally avoided mentioning this option for a few
> reasons:
>
> 1.  Anyone needing 10 SATA SSDs obviously has a parallel workload
> 2.  Any thread will have up to 200-500MB/s available (one SSD)
>      with a concat, I can't see a single thread needing 4.5GB/s of B/W
>      If so, md/RAID isn't capable, not on COTS hardware
> 3.  With a parallel workload requiring this many SSDs, XFS is a must
> 4.  With a concat, mkfs.xfs is simple, no stripe aligning, etc
>      ~$ mkfs.xfs /dev/md0
>

These are all good points.  There is always a lot to learn from your posts.

My only concern with XFS over linear concat is that its performance 
depends on the spread of allocation groups across the elements of the 
concatenation (the raid1 pairs in this case), and that in turn depends 
on the directory structure.  (I'm sure you'll correct me if I'm wrong in 
this - indeed I would /like/ to be wrong!)  If you have large numbers of 
top-level directories and a spread of access, then this is ideal.  But 
if you have very skewed access with most access within only one or two 
top-level directories, then as far as I understand XFS allocation 
groups, access will then be concentrated heavily on only one (or a few) 
of the concat elements.

raid0 of the raid1 pairs may not be the best way to spread out access 
(assuming XFS linear concat is not a good fit for the workload), but it 
might still be an improvement.  Perhaps a good solution would be raid0 
with a very large chunk size - that makes most accesses non-striped (as
you say, the user probably doesn't need striping), thus allowing more 
parallel accesses, while scattering the accesses evenly across all raid1 
elements?
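
As a rough sketch of what I mean (chunk size picked arbitrarily, device
names hypothetical, /dev/md1-5 being the raid1 pairs):

    ~$ mdadm --create /dev/md0 --level=0 --chunk=4096 --raid-devices=5 /dev/md[1-5]

With a 4 MiB chunk, most accesses smaller than that land on a single
raid1 pair.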


Of course, we are still waiting to hear a bit about the OP's real load.


* Re: raid5 to utilize upto 8 cores
From: David Brown @ 2012-08-17  7:52 UTC
  To: vincent Ferrer; +Cc: stan, linux-raid

On 17/08/2012 00:11, vincent Ferrer wrote:
> On Wed, Aug 15, 2012 at 10:58 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 8/15/2012 9:56 PM, vincent Ferrer wrote:
>>
>>> - My  storage server  has upto 8 cores  running linux kernel 2.6.32.27.
>>> - I created  a raid5 device of  10  SSDs .
>>
>> No it is not normal practice.  I 'preach' against it regularly when I
>> see OPs doing it.  It's quite insane.
>>
>> There are a couple of sane things you can do today to address your problem:
>>
>> Stan
>>
>
> Hi Stan,
> Follow-up question for  2  types of setups i may have to prepare:
> 1) setup A   has   80   SSDs.    Question: Should I still  create one
> raid5 device or should I create  8  raid5 device each having 10 SSDs ?
>      My linux based storage server may be accessed by  upto   10-20
> physically  different clients.
>

I have difficulty imagining the sort of workload that would justify 80
SSDs.  Certainly you have to think about far more than just the disks or 
the raid setup - you would be looking at massive network bandwidth, 
multiple servers with large PCI express buses, etc.  Probably you would 
want dedicated SAN hardware of some sort.  Otherwise you could get 
pretty much the same performance and capacity using 10 hard disks (and 
maybe a little extra ram to improve caching).

But as a general rule, you want to limit the number of disks (or 
partitions) you have in a single raid5 to perhaps 6 devices.  With too 
many devices, you increase the probability that you will get a failure, 
and then a second failure during a rebuild.  You can use raid6 for extra 
protection - but that also (currently) suffers from the single-thread 
bottleneck.

Remember also that raid5 (or raid6) requires a RMW for updates larger 
than a single block but smaller than a full stripe - that means it needs 
to read from every disk in the array before it can write.  The wider the
array, the bigger this effect is.

>   2) Setup B  has only 12 SSDs.  Question:  Is it more practical to
> have only one raid5  device,  even though I may have 4-5  physically
> different  clients or create 2 raid5 devices each having  6 SSDs.

Again, I would put only 6 disks in a raid5.

>
> Reason I am asking because I have seen enterprise storage arrays from
> EMC/IBM where new raid5 device is created on demand  and (storage
> firmware may spread across automatically across all the available
> drives/spindles or can be intelligently selected by storage admin by
> analyzing  workload to avoid  hot-spots)
>
> Partitioning was only done because I am still waiting  budget approval
> to buy SSDs.
>
> regards
> vincy


* Re: raid5 to utilize upto 8 cores
From: Stan Hoeppner @ 2012-08-17  8:29 UTC
  To: vincent Ferrer; +Cc: linux-raid

On 8/16/2012 5:11 PM, vincent Ferrer wrote:
> On Wed, Aug 15, 2012 at 10:58 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 8/15/2012 9:56 PM, vincent Ferrer wrote:
>>
>>> - My  storage server  has upto 8 cores  running linux kernel 2.6.32.27.
>>> - I created  a raid5 device of  10  SSDs .
>>
>> No it is not normal practice.  I 'preach' against it regularly when I
>> see OPs doing it.  It's quite insane.
>>
>> There are a couple of sane things you can do today to address your problem:
>>
>> Stan
>>
> 
> Hi Stan,
> Follow-up question for  2  types of setups i may have to prepare:
> 1) setup A   has   80   SSDs.    

This is simply silly.  There's no need for exaggeration here.

>  2) Setup B  has only 12 SSDs.  Question:  Is it more practical to
> have only one raid5  device,  even though I may have 4-5  physically
> different  clients or create 2 raid5 devices each having  6 SSDs.

As I repeat in every such case:  What is your workload?  Creating a
storage specification is driven by the requirements of the workload.

-- 
Stan

* Re: raid5 to utilize upto 8 cores
From: Stan Hoeppner @ 2012-08-17 10:52 UTC
  To: David Brown; +Cc: vincent Ferrer, linux-raid

On 8/17/2012 2:29 AM, David Brown wrote:
> On 17/08/2012 09:15, Stan Hoeppner wrote:
>> On 8/16/2012 2:52 AM, David Brown wrote:
>>> For those that don't want to use XFS, or won't have balanced directories
>>> in their filesystem, or want greater throughput of larger files (rather
>>> than greater average throughput of multiple parallel accesses), you can
>>> also take your 5 raid1 mirror pairs and combine them with raid0.  You
>>> should get similar scaling (the cpu does not limit raid0).  For some
>>> applications (such as mail server, /home mount, etc.), the XFS over a
>>> linear concatenation is probably unbeatable.  But for others (such as
>>> serving large media files), a raid0 over raid1 pairs could well be
>>> better.  As always, it depends on your load - and you need to test with
>>> realistic loads or at least realistic simulations.
>>
>> Sure, a homemade RAID10 would work as it avoids the md/RAID10 single
>> write thread.  I intentionally avoided mentioning this option for a few
>> reasons:
>>
>> 1.  Anyone needing 10 SATA SSDs obviously has a parallel workload
>> 2.  Any thread will have up to 200-500MB/s available (one SSD)
>>      with a concat, I can't see a single thread needing 4.5GB/s of B/W
>>      If so, md/RAID isn't capable, not on COTS hardware
>> 3.  With a parallel workload requiring this many SSDs, XFS is a must
>> 4.  With a concat, mkfs.xfs is simple, no stripe aligning, etc
>>      ~$ mkfs.xfs /dev/md0
>>
> 
> These are all good points.  There is always a lot to learn from your posts.
> 
> My only concern with XFS over linear concat is that its performance
> depends on the spread of allocation groups across the elements of the
> concatenation (the raid1 pairs in this case), and that in turn depends
> on the directory structure.  (I'm sure you'll correct me if I'm wrong in
> this - indeed I would /like/ to be wrong!)  If you have large numbers of
> top-level directories and a spread of access, then this is ideal.  But
> if you have very skewed access with most access within only one or two
> top-level directories, then as far as I understand XFS allocation
> groups, access will then be concentrated heavily on only one (or a few)
> of the concat elements.

This depends on the allocator.  inode32, the default allocator, does
RAID0 with files--each file being a chunk.  All inodes go in AG0, all
files round robin'd across the other AGs.  Great for parallel streaming
workloads on a mirror concat, but obviously not for metadata-intensive
workloads, as all the metadata is on the first spindle.

The optional inode64 allocator spreads inodes and files across all AGs.
 Every new dir is created in a different AG round robin, regardless of
the on-disk location of the parent dir.  Files, however, are always
created in the same AG as their parent dir.  Much better for metadata
workloads.  It's just as
good with parallel streaming workloads if the user has read the XFS
Users Guide and does some manual placement.
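
For example (numbers purely illustrative, not a rule): on a concat of 5
equal mirror pairs you might make the AG count a multiple of five so the
AGs spread evenly across the pairs, then mount with inode64:

     ~$ mkfs.xfs -d agcount=20 /dev/md0
     ~$ mount -o inode64 /dev/md0 /your/mountpoint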

> raid0 of the raid1 pairs may not be the best way to spread out access
> (assuming XFS linear concat is not a good fit for the workload), but it
> might still be an improvement.  Perhaps a good solution would be raid0
> with a very large chunk size - that make most accesses non-striped (as
> you say, the user probably doesn't need striping), thus allowing more
> parallel accesses, while scattering the accesses evenly across all raid1
> elements?

No matter how anyone tries to slice it, striped RAID is only optimal for
streaming writes/reads of large files.  This represents less than 1% of
real world workloads.  The rest are all concurrent relatively small file
workloads, and for these using an intelligent filesystem with an
allocation group design (XFS, JFS) will yield better performance.

The only real benefit of striped RAID over concat, for the majority of
workloads, is $/GB.

> Of course, we are still waiting to hear a bit about the OP's real load.

It seems clear he has some hardware, real or fantasy, in need of a
workload, so I'm not holding my breath.

-- 
Stan


* Re: raid5 to utilize upto 8 cores
From: David Brown @ 2012-08-17 11:47 UTC
  To: stan; +Cc: vincent Ferrer, linux-raid

On 17/08/2012 12:52, Stan Hoeppner wrote:
> On 8/17/2012 2:29 AM, David Brown wrote:
>> On 17/08/2012 09:15, Stan Hoeppner wrote:
>>> On 8/16/2012 2:52 AM, David Brown wrote:
>>>> For those that don't want to use XFS, or won't have balanced directories
>>>> in their filesystem, or want greater throughput of larger files (rather
>>>> than greater average throughput of multiple parallel accesses), you can
>>>> also take your 5 raid1 mirror pairs and combine them with raid0.  You
>>>> should get similar scaling (the cpu does not limit raid0).  For some
>>>> applications (such as mail server, /home mount, etc.), the XFS over a
>>>> linear concatenation is probably unbeatable.  But for others (such as
>>>> serving large media files), a raid0 over raid1 pairs could well be
>>>> better.  As always, it depends on your load - and you need to test with
>>>> realistic loads or at least realistic simulations.
>>>
>>> Sure, a homemade RAID10 would work as it avoids the md/RAID10 single
>>> write thread.  I intentionally avoided mentioning this option for a few
>>> reasons:
>>>
>>> 1.  Anyone needing 10 SATA SSDs obviously has a parallel workload
>>> 2.  Any thread will have up to 200-500MB/s available (one SSD)
>>>       with a concat, I can't see a single thread needing 4.5GB/s of B/W
>>>       If so, md/RAID isn't capable, not on COTS hardware
>>> 3.  With a parallel workload requiring this many SSDs, XFS is a must
>>> 4.  With a concat, mkfs.xfs is simple, no stripe aligning, etc
>>>       ~$ mkfs.xfs /dev/md0
>>>
>>
>> These are all good points.  There is always a lot to learn from your posts.
>>
>> My only concern with XFS over linear concat is that its performance
>> depends on the spread of allocation groups across the elements of the
>> concatenation (the raid1 pairs in this case), and that in turn depends
>> on the directory structure.  (I'm sure you'll correct me if I'm wrong in
>> this - indeed I would /like/ to be wrong!)  If you have large numbers of
>> top-level directories and a spread of access, then this is ideal.  But
>> if you have very skewed access with most access within only one or two
>> top-level directories, then as far as I understand XFS allocation
>> groups, access will then be concentrated heavily on only one (or a few)
>> of the concat elements.
>
> This depends on the allocator.  inode32, the default allocator, does
> RAID0 with files--each file being a chunk.  All inodes go in AG0, all
> files round robin'd across the other AGs.  Great for parallel streaming
> workloads on a mirror concat, obviously not metadata intensive
> workloads, as metadata is on the first spindle.
>
> The optional inode64 allocator spreads inodes and files across all AGs.
>   Every new dir is created in a different AG round robin, regardless of
> the on disk location of the parent dir.  File however always get created
> in their parent dir.  Much better for metadata workloads.  It's just as
> good with parallel streaming workloads if the user has read the XFS
> Users Guide and does some manual placement.
>

It sounds like I /have/ misunderstood things - at least regarding the 
inode64 allocator (which will surely be the best choice for a large 
array).  I had thought that while directories "/a/" and "/b/" get
different allocation groups, directories "/a/1/" and "/a/2/" would go in 
the same AG as "/a/".  What you are saying is that this is not correct - 
each of these four directories would go in a separate AG.  File "/a/1/x" 
would go in the same AG as "/a/1/", of course.  Assuming this is the 
case, XFS over linear concat sounds more appealing for a much wider set 
of applications than I had previously thought.

>> raid0 of the raid1 pairs may not be the best way to spread out access
>> (assuming XFS linear concat is not a good fit for the workload), but it
>> might still be an improvement.  Perhaps a good solution would be raid0
>> with a very large chunk size - that make most accesses non-striped (as
>> you say, the user probably doesn't need striping), thus allowing more
>> parallel accesses, while scattering the accesses evenly across all raid1
>> elements?
>
> No matter how anyone tries to slice it, striped RAID is only optimal for
> streaming writes/reads of large files.  This represents less than 1% of
> real world workloads.  The rest are all concurrent relatively small file
> workloads, and for these using an intelligent filesystem with an
> allocation group design (XFS, JFS) will yield better performance.
>
> The only real benefit of striped RAID over concat, for the majority of
> workloads, is $/GB.
>
>> Of course, we are still waiting to hear a bit about the OP's real load.
>
> It seems clear he has some hardware, real or fantasy, in need of a
> workload, so I'm not holding my breath.
>


* Re: raid5 to utilize upto 8 cores
From: Stan Hoeppner @ 2012-08-18  4:55 UTC
  To: David Brown; +Cc: vincent Ferrer, linux-raid

On 8/17/2012 6:47 AM, David Brown wrote:
> On 17/08/2012 12:52, Stan Hoeppner wrote:

> It sounds like I /have/ misunderstood things - at least regarding the
> inode64 allocator (which will surely be the best choice for a large
> array).  I had though that while directories "/a/" and "/b/" get
> different allocation groups, directories "/a/1/" and "/a/2/" would go in
> the same AG as "/a/".  What you are saying is that this is not correct -
> each of these four directories would go in a separate AG.  File "/a/1/x"
> would go in the same AG as "/a/1/", of course.  Assuming this is the
> case, XFS over linear concat sounds more appealing for a much wider set
> of applications than I had previously thought.

I may bear some blame for this.  Long ago I thought the inode64
allocator worked as you describe.  I may have spread that
misinformation.  If so, my apologies to all.

Indeed the inode64 allocator creates new directories evenly across all
AGs in a round robin fashion.  Thus it will work better with most
workloads on storage with some level of concatenation than with straight
striped RAID on rust.  This is due to excessive seeks across all the
AGs, which line up from outer to inner tracks when striping.  With SSDs,
seek starvation is irrelevant, so concat and striping are pretty much equal.

>> The only real benefit of striped RAID over concat, for the majority of
>> workloads, is $/GB.

To be more clear, I was obviously referring to striped parity RAID here,
not RAID10, which has the same cost as concat+mirror.

-- 
Stan


* Re: raid5 to utilize upto 8 cores
From: David Brown @ 2012-08-18  8:59 UTC
  To: stan; +Cc: vincent Ferrer, linux-raid

On 18/08/12 06:55, Stan Hoeppner wrote:
> On 8/17/2012 6:47 AM, David Brown wrote:
>> On 17/08/2012 12:52, Stan Hoeppner wrote:
>
>> It sounds like I /have/ misunderstood things - at least regarding the
>> inode64 allocator (which will surely be the best choice for a large
>> array).  I had though that while directories "/a/" and "/b/" get
>> different allocation groups, directories "/a/1/" and "/a/2/" would go in
>> the same AG as "/a/".  What you are saying is that this is not correct -
>> each of these four directories would go in a separate AG.  File "/a/1/x"
>> would go in the same AG as "/a/1/", of course.  Assuming this is the
>> case, XFS over linear concat sounds more appealing for a much wider set
>> of applications than I had previously thought.
>
> I may bear some blame for this.  Long ago I thought the inode64
> allocator worked as you describe.  I may have spread that
> misinformation.  If so, my apologies to all.
>
> Indeed the inode64 allocator creates new directories evenly across all
> AGs in a round robin fashion.

Thanks for clearing this up.  I understand much better now why you are 
such a fan of XFS over concat - spreading all directories around like 
this will give better performance for a much wider set of workloads.

> Thus it will work better with most
> workloads on storage with some level of concatenation than with straight
> striped RAID on rust.  This is due to excessive seeks across all the
> AGs, which line up from outer to inner tracks when striping.  With SSD
> seek starvation is irrelevant, so concat and striping are pretty much equal.
>
>>> The only real benefit of striped RAID over concat, for the majority of
>>> workloads, is $/GB.
>
> To be more clear, I was obviously referring to striped parity RAID here,
> not RAID10, which has the same cost as concat+mirror.
>

