From: Stan Hoeppner <stan@hardwarefreak.com>
To: vincent Ferrer <vincentchicago1@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid5 to utilize upto 8 cores
Date: Thu, 16 Aug 2012 00:58:48 -0500
Message-ID: <502C8C18.5070501@hardwarefreak.com>
In-Reply-To: <CAEyJA_sSzmgcK6miWfxC4jgvbH7WJ_hgZhnmFABaK9r7X=SLDQ@mail.gmail.com>

On 8/15/2012 9:56 PM, vincent Ferrer wrote:

> - My storage server has up to 8 cores running Linux kernel 2.6.32.27.
> - I created a raid5 device of 10 SSDs.
> - It seems I only have a single raid5 kernel thread, limiting my
> WRITE throughput to a single CPU core/thread.

The single write thread limitation of md/RAID5/6/10 is being addressed
by patches in development.  Read the list archives for progress/status.
There were 3 posts to the list today regarding the RAID5 patch.

> Question:  What are my options to make my raid5 thread use all the
> CPU cores?  My SSDs can do much more, but the single raid5 thread
> from mdadm is becoming the bottleneck.
> 
> To overcome the above single-thread raid5 limitation (for now) I
> re-configured:
>      1)  I partitioned each of my 10 SSDs into 8 partitions.
>      2)  I created 8 raid5 arrays, each built from one partition on
> each of the SSDs.
>      3)  My WRITE performance quadrupled because I have 8 RAID5 threads.
> Question: Is this workaround normal practice, or may it give me
> maintenance problems later on?

No, it is not normal practice.  I 'preach' against it regularly when I
see OPs doing it.  It's quite insane.  The glaring maintenance problem
is that when one SSD fails, and at least one will, you'll have 8 arrays
to rebuild instead of one.  This may be acceptable to you, but not to
the general population.  With rust drives and real workloads it tends
to hammer the drive heads prodigiously, increasing latency, killing
performance, and decreasing drive life.  That's not an issue with SSDs,
but multiple rebuilds are, as is simply keeping track of 80 partitions.

There are a couple of sane things you can do today to address your problem:

1.  Create a RAID50: a layered md/RAID0 over two 5-SSD md/RAID5 arrays.
This will double your threads and your IOPS.  It won't be as fast as
your Frankenstein setup, and you'll lose one SSD of capacity to the
additional parity.  However, it's sane, stable, doubles your
performance, and you have only one array to rebuild after an SSD
failure.  Any filesystem will work well with it, including XFS if
aligned properly.  It also gives you an easy upgrade path: as soon as
the threaded patches hit, a simple kernel upgrade will give your two
RAID5 arrays the extra threads, so you're simply out one SSD of
capacity.  You won't need to, and probably won't want to, rebuild the
entire thing after the patch.  With the Frankenstein setup you'd be
destroying and rebuilding arrays.  And if these are consumer-grade
SSDs, you're much better off having two drives' worth of redundancy
anyway, so a RAID50 makes good sense all around.
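
A rough sketch of the layout, assuming /dev/sd[a-j] are the 10 SSDs and
a 512k chunk (both are placeholders; use your real device names and
preferred chunk size):

  # two 5-drive RAID5 legs
  mdadm --create /dev/md1 --level=5 --chunk=512 --raid-devices=5 \
      /dev/sd[a-e]
  mdadm --create /dev/md2 --level=5 --chunk=512 --raid-devices=5 \
      /dev/sd[f-j]

  # stripe the two legs together into the RAID50
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

  # align XFS to the geometry: su = RAID5 chunk, sw = total data
  # spindles (4 per leg x 2 legs)
  mkfs.xfs -d su=512k,sw=8 /dev/md0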

2.  Make 5 md/RAID1 mirrors and concatenate them with md/RAID linear.
You'll get one md write thread per RAID1 device, utilizing 5 cores in
parallel.  The linear driver doesn't use threads, but passes offsets to
the block layer, so it scales across as many cores as you have.  Format
the linear device with XFS and mount with inode64.  XFS has been fully
threaded for 15 years.  Its allocation group design, along with the
inode64 allocator, allows near-linear parallel scaling across a
concatenated device[1], assuming your workload/directory layout is
designed for parallel file throughput.
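
Roughly, again with placeholder device names and mount point (pair the
SSDs however you like):

  # five 2-drive RAID1 mirrors
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
  mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sde /dev/sdf
  mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdg /dev/sdh
  mdadm --create /dev/md5 --level=1 --raid-devices=2 /dev/sdi /dev/sdj

  # concatenate the mirrors with md linear
  mdadm --create /dev/md10 --level=linear --raid-devices=5 \
      /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5

  # no stripe alignment needed on a concat; give XFS a couple of
  # allocation groups per mirror so directories spread across all of
  # them, then mount with inode64
  mkfs.xfs -d agcount=10 /dev/md10
  mount -o inode64 /dev/md10 /mnt/data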

#2, with a parallel write workload, may be competitive with your
Frankenstein setup in both IOPS and throughput, even with 3 fewer RAID
threads and 4 fewer SSD "spindles".  It will outrun the RAID50 setup
like it's standing still.  You'll lose half your capacity to
redundancy, as with RAID10, but you'll have 5 write threads for
md/RAID1, one per SSD pair.  One core should be plenty to drive a
single SSD mirror, which leaves 3 cores, plus whatever cycles remain on
the other 5, free for actual applications.  You'll get unlimited core
scaling with both md/linear and XFS.  This setup will yield the best
balance of IOPS and throughput for the cycles burned on IO, compared to
both the Frankenstein setup and the RAID50.

[1] If you are one of the uneducated masses who believe dd gives an
accurate measure of storage performance, then ignore option #2.  Such a
belief would indicate you thoroughly lack understanding of storage
workloads, and thus you will be greatly disappointed with the dd numbers
this configuration will give you.
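
If you want a quick sanity check of parallel write behavior, use a tool
that issues concurrent writes rather than a single dd stream.  A
hypothetical fio invocation (job count, block size, and directory are
placeholders; match them to your real workload):

  fio --name=parallel-writers --directory=/mnt/data --rw=write \
      --bs=1M --size=4G --numjobs=8 --ioengine=libaio --direct=1 \
      --group_reporting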

-- 
Stan

