From: Stan Hoeppner <stan@hardwarefreak.com>
To: linux-raid@vger.kernel.org
Subject: Re: Multiple SSDs - RAID-1, -10, or stacked? TRIM?
Date: Thu, 10 Oct 2013 04:15:08 -0500
Message-ID: <5256701C.3090807@hardwarefreak.com>
In-Reply-To: <20131009123135.GT1779@bitfolk.com>

On 10/9/2013 7:31 AM, Andy Smith wrote:
> Hello,

Hello Andy.

> Due to increasing load of random read IOPS I am considering using 8
                            ^^^^^^^^^^^^^^^^

The data has to be written before it can be read.  Are you at all
concerned with write throughput, either random or sequential?  Please
read on.

> SSDs and md in my next server, instead of 8 SATA HDDs with
> battery-backed hardware RAID. I am thinking of using Crucial m500s.
> 
> Are there any gotchas to be aware of? I haven't much experience with
> SSDs.

Yes, there is one major gotcha WRT md/RAID and SSDs, which to this point
nobody has mentioned in this thread, possibly because it pertains to
writes, not reads.  Note my question posed to you up above.  Since I've
answered this question in detail at least a dozen times on this mailing
list, I'll simply refer you to one of my recent archived posts for the
details:

http://permalink.gmane.org/gmane.linux.raid/43984

> If these were normal HDDs then (aside from small partitions for
> /boot) I'd just RAID-10 for the main bulk of the storage. Is there
> any reason not to do that with SSDs currently?

The answer to this question lies behind the link above.

> I think I read somewhere that offline TRIM is only supported by md
> for RAID-1, is that correct? If so, should I be finding a way to use
> four pairs of RAID-1s, or does it not matter?

Yes, but not because of TRIM.  But of course, you already read that in
the gmane post above.  What that thread doesn't cover is another option
I've written about many times, which someone attempted to parrot
earlier in this thread.

Layer an md linear array atop RAID1 pairs, and format it with XFS.  XFS
is unique among Linux filesystems in that it uses what are called
allocation groups.  Take a pie (XFS filesystem atop linear array of 4x
RAID1 SSD pairs) and cut 4 slices (AGs).  That's basically what XFS does
with the blocks of the underlying device.  Now create 4 directories.
Now write four 1GB files, each into one directory, simultaneously.  XFS
just wrote each 1GB file to a different SSD, all in parallel.  If each
SSD can write at 500MB/s, you just achieved 2GB/s throughput, -without-
using a striped array.  No other filesystem can achieve this kind of
throughput without a striped array underneath.  And yes, TRIM will work
with this setup, both DISCARD and batch fitrim.
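
A rough sketch of the commands involved, purely for illustration; the
device names and mount point are placeholders, adjust to your hardware:

  # four RAID1 pairs from the 8 SSDs
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
  mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sde /dev/sdf
  mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdg /dev/sdh

  # concatenate the four pairs into one linear device
  mdadm --create /dev/md10 --level=linear --raid-devices=4 \
      /dev/md1 /dev/md2 /dev/md3 /dev/md4

  # one allocation group per RAID1 pair
  mkfs.xfs -d agcount=4 /dev/md10

  # TRIM: either mount with realtime discard, or run batched fstrim
  # from cron, whichever suits the workload
  mount -o discard /dev/md10 /data
  fstrim -v /data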

Allocation groups enable fantastic parallelism in XFS with a linear
array over mirrors, and this setup is perfect for both random write and
read workloads.  But AGs on a linear array can also cause a bottleneck
if the user doesn't do a little planning of directory and data layout.
In the scenario above we have 4 allocation groups, AG0-AG3, each
occupying one SSD.  The first directory you create will be created in
AG0 (SSD0), the 2nd AG1 (SSD1), the 3rd AG2 (SSD2), and the 4th AG3
(SSD3).  The 5th directory will be created on AG0, as well as the 9th,
and so on.  So you should already see the potential problem here.  If
you put all of your files in a single directory, or in multiple
directories that all reside within the same AG, they will all end up on
only one of your 4 SSDs, at least until that AG runs out of free space,
at which point XFS will "spill" new files into the next AG.
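
As an example of working with that rotor rather than against it (the
paths here are only illustrative):

  # the first four directories created on a fresh filesystem land in
  # AG0..AG3 in rotation, one per RAID1 pair
  mkdir /data/d0 /data/d1 /data/d2 /data/d3

  # four concurrent writers, one per directory, keep all four pairs busy
  for i in 0 1 2 3; do
      dd if=/dev/zero of=/data/d$i/file bs=1M count=1024 &
  done
  wait

  # xfs_bmap -v shows which AG a file's extents actually landed in
  xfs_bmap -v /data/d0/file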

To be clear, the need for careful directory/file layout to achieve
parallel throughput pertains only to the linear concatenation storage
architecture described above.  If one is using XFS atop a striped array
then throughput, either sequential or parallel, is -not- limited by
file/dir placement across the AGs, as all AGs are striped across the disks.
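
For comparison, on a striped array you would instead align XFS to the
stripe geometry at mkfs time (mkfs.xfs normally detects md geometry by
itself; the values below are only an example for an 8-drive RAID10 with
a 512KiB chunk):

  # stripe unit = chunk size, stripe width = number of data-bearing
  # devices (4 in an 8-drive RAID10)
  mkfs.xfs -d su=512k,sw=4 /dev/md0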

-- 
Stan

