From: Adam Goryachev <mailinglists@websitemanagers.com.au>
To: Matt Garman <matthew.garman@gmail.com>,
	Mdadm <linux-raid@vger.kernel.org>
Subject: Re: Thoughts on big SSD arrays?
Date: Mon, 03 Aug 2015 21:38:56 +1000	[thread overview]
Message-ID: <55BF52D0.304@websitemanagers.com.au> (raw)
In-Reply-To: <CAJvUf-AW08pcRkADn4Fgh46_vVAxgofiuB93+pzQYk99hx1nbw@mail.gmail.com>



On 1/08/2015 01:23, Matt Garman wrote:
> I continue to be inspired by the "Dirt Cheap Data Warehouse (DCDW)"
> [3].  SSD are getting bigger and prices are dropping rapidly (2 TB
> SSDs available now for $800).  With our WORM-like workload, I believe
> we can safely get away with consumer drives, as durability shouldn't
> be an issue.
>
> So at this point I'm just putting out a feeler---has anyone out there
> actually built a massive SSD array, using either Linux software raid
> or hardware raid (technically off-topic for this list, though I hope
> the discussion is interesting enough to let it slide).  If so, how big
> of an array (i.e. drives/capacity)?  What was the target versus actual
> performance?  Any particularly challenging issues that came up?
I have been using an 8x 480GB RAID5 Linux md array for an iSCSI SAN for a 
number of years, and it has worked well after some careful tuning and some 
careful (lucky) hardware selection (i.e. the motherboard happened to have 
the right bandwidth across memory/PCI bus/etc).

The main challenge I had was actually with DRBD on top of the array: once 
I disabled the forced writes, it all worked really well. The forced writes 
were forcing a consistent on-disk state for every single write, since the 
SSDs I use have no ability to save in-flight data during a power outage, 
so those flushes were very expensive.
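
For anyone wanting to try the same, the relevant knobs live in the disk 
section of drbd.conf. This is only a sketch from memory, the exact option 
names differ between DRBD 8.3 and 8.4, so check the man page for your 
version, and remember you are trading crash consistency for speed:

    resource r0 {
        disk {
            # DRBD 8.4 syntax; 8.3 uses no-disk-barrier /
            # no-disk-flushes / no-md-flushes instead
            disk-barrier no;
            disk-flushes no;
            md-flushes no;
        }
        # rest of the resource definition unchanged
    }
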
> FWIW, I'm thinking of something along the lines of a 24-disk chassis,
> with 2 disks for OS (raid1), 2 disks as hot spares, and the remaining
> 20 in raid-6.  The 22 data disks (raid + hot spares) would be 2 TB
> SSDs.
I'm not sure that sounds like a good idea. Personally, I'd prefer at least 
2 x RAID6 arrays, but then that is just the advice I hear on this list. 
Using two arrays will also get you more parallel processing (more CPU 
cores in use), as I think you are otherwise limited to one CPU per array.
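
As a rough sketch (device names purely illustrative), the 20 data disks 
could be split something like:

    mdadm --create /dev/md1 --level=6 --raid-devices=10 /dev/sd[b-k]
    mdadm --create /dev/md2 --level=6 --raid-devices=10 /dev/sd[l-u]

On reasonably recent kernels there is also a per-array sysfs tunable that 
lets the raid5/6 stripe handling use more than one thread, which may help 
even with a single array (check it exists on your kernel before relying 
on it):

    echo 4 > /sys/block/md1/md/group_thread_cnt
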
> The "problem" with SSDs is that they're just so seductive:
> back-of-the-envelope numbers are wonderful, so it's easy to get
> overly-optimistic about builds that use them.  But as with most
> things, the devil's in the details.
I was able to get 2.5GB/s read and 1.5GB/s write with (I think) only 6 
SSDs in RAID5. However, when I eventually ran the correct test to match 
my actual load, that dropped to abysmal values (well under 100MB/s). The 
reason is that my live load uses very small read/write block sizes, so 
there was a massive number of small random reads/writes, i.e. high IOPS. 
Large block sizes can deliver massive throughput from a very small number 
of IOPS; small random blocks cannot.
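The moral is to benchmark with something that looks like your real I/O 
pattern. A rough sketch with fio (filename, sizes and block sizes here 
are purely illustrative; point it at a scratch file, not data you care 
about):

    # the flattering number: large sequential reads
    fio --name=seqread --filename=/mnt/array/fio.test --size=10G \
        --rw=read --bs=1M --iodepth=32 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based

    # closer to a real small-block load: 4k random mixed read/write
    fio --name=randrw --filename=/mnt/array/fio.test --size=10G \
        --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based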

> Off the top of my head, potential issues I can think of:
>
>      - Subtle PCIe latency/timing issues of the motherboard
 From memory, this can include the amount of bandwidth between 
memory/CPU/PCI bus/SATA bus/etc, with the speed of the RAM being just one 
of the factors. I don't know all the tricky details, but I do recall that 
while the bandwidth looks plenty fast enough at first, the data moves over 
a number of bridges, and sometimes over the same bridge more than once 
(e.g. the disk interface and the network interface might be on the same 
bridge).
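
If you want to see what shares what, lspci will show the PCIe topology 
and the negotiated link speed/width of the HBA (the bus address below is 
just an example, substitute your own):

    lspci -tv                              # tree view: which devices hang off which bridge
    lspci -vv -s 01:00.0 | grep -i lnksta  # negotiated link speed/width for one device
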
>      - High variation in SSD latency
>      - Software stacks still making assumptions based on spinning
> drives (i.e. not adequately tuned for SSDs)
>      - Non-parallel RAID implementation (i.e. single CPU bottleneck potential)
>      - Potential bandwidth bottlenecks at various stages: SATA/SAS
> interface, SAS expander/backplane, SATA/SAS controller (or HBA), PCIe
> bus, CPU memory bus, network card, etc
>      - I forget the exact number, but the DCDW guy told me with Linux
> he was only able to get about 30% of the predicted throughput in his
> SSD array
I got close to the theoretical maximum (from memory), but it depended on 
the actual real-life workload. Those theoretical performance values are 
only achieved under "optimal" conditions; real life is often a lot messier.
>      - Wacky TRIM related issues (seem to be drive dependent)
If your workload is mostly reads, then TRIM shouldn't be much of an issue for you.
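
If you do want TRIM, note that (at least on the kernels I've looked at) 
md raid5/6 won't pass discards down unless you explicitly tell it the 
drives handle discard safely, and even then I'd lean towards an occasional 
fstrim rather than the discard mount option. Both are things to verify on 
your own kernel/drive combination rather than take from me:

    modprobe raid456 devices_handle_discard_safely=Y
    fstrim -v /mnt/array
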
> Not asking any particular question here, just hoping to start an
> open-ended discussion.  Of course I'd love to hear from anyone with
> actual SSD RAID experience!
>

My experience has been positive. BTW, I'm using the Intel 480GB SSDs 
(basically the consumer-grade 520/530 series). If you want any extra 
information/details, let me know.

Regards,
Adam

