* Latest on SSD Raid
From: Dag Nygren @ 2017-09-29 15:53 UTC
To: linux-raid

Hi all!

Would like to tap some experience out of all here
with the question:

Any good hints and advice when setting up a RAID5 SSD with
3 disks to start with?

Best
Dag

* Re: Latest on SSD Raid
From: Joe Landman @ 2017-09-29 16:22 UTC
To: Dag Nygren, linux-raid

On 09/29/2017 11:53 AM, Dag Nygren wrote:
> Hi all!
>
> Would like to tap some experience out of all here
> with the question:
>
> Any good hints and advice when setting up a RAID5 SSD with
> 3 disks to start with?

You would need to worry about write amplification due to RAID5. So if
you do this, use SSDs with higher DWPD (drive writes per day). Aim for
3 DWPD if you can, so you don't burn out the SSDs early. Don't do this
with consumer grade SSDs (anything 0.5 DWPD or less). They do burn out
(sometimes much) faster. The little extra money spent on the enterprise
SATA (or SAS) with higher DWPD is worth it.

Precondition the SSDs. If you don't know how, I wrote a nice little
util here: https://github.com/joelandman/disk_test_setup that helps you
do it. It uses fio to drive 128k sequential writes to fill the drives.
Drive life appears well correlated with preconditioning and write loads.

Use a chunk size of 128k or so (larger is better). You want the chunk
size to match the erase block size, which reduces write amplification.

You still have to worry about the whole RMW cycle for RAID5. This
means, for small IO (below chunk/erase block size), you have to
read-modify-write at least 2 blocks back for every block written. If
your writes are small (4k -> 32k) you'll want to invest in even higher
quality (i.e. more DWPD).

If you can get enough drives, I'd actually recommend a RAID10. Much
lower write amplification -> longer lifetime.

--
Joe Landman
e: joe.landman@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

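To put rough numbers on the DWPD advice above, here is a minimal sketch of the usual endurance arithmetic: rated total writes = capacity x DWPD x 365 x warranty years. The 1 TB capacity, the 5-year warranty, the 0.5 TB/day write load and the amplification factor of 2 are all made-up illustration values, not figures from this thread, and the function names are hypothetical.

```python
def rated_tbw(capacity_tb: float, dwpd: float, warranty_years: float = 5.0) -> float:
    """Total terabytes the drive is rated to absorb over its warranty period."""
    return capacity_tb * dwpd * 365 * warranty_years


def years_to_rated_endurance(capacity_tb: float, dwpd: float,
                             daily_writes_tb: float,
                             amplification: float = 1.0,
                             warranty_years: float = 5.0) -> float:
    """Years until the rated endurance is used up.

    daily_writes_tb is what the application writes to this drive per day;
    amplification is an assumed multiplier for the extra writes caused by
    parity updates and similar overhead (a guess you must supply yourself).
    """
    device_writes_per_day = daily_writes_tb * amplification
    return rated_tbw(capacity_tb, dwpd, warranty_years) / (device_writes_per_day * 365)


if __name__ == "__main__":
    # Hypothetical 1 TB drives, 0.5 TB/day of writes each, amplification of 2.
    for dwpd in (0.5, 3.0):
        years = years_to_rated_endurance(1.0, dwpd, 0.5, amplification=2.0)
        print(f"{dwpd} DWPD: about {years:.1f} years to rated endurance")
```

Under these assumed numbers the 0.5 DWPD drive reaches its rated endurance well inside the warranty period, while the 3 DWPD drive does not. Rated endurance is a vendor figure for a vendor-defined workload, so treat the result as a planning estimate only.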
* Re: Latest on SSD Raid
From: David Brown @ 2017-09-30 11:00 UTC
To: Joe Landman, Dag Nygren, linux-raid

On 29/09/17 18:22, Joe Landman wrote:
> [...]
> If you can get enough drives, I'd actually recommend a RAID10. Much
> lower write amplification -> longer lifetime.

And also much faster. The key point for speed in SSDs is low latency,
and RAID5 can add a lot of latency to writes.

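The latency point can be illustrated by counting device I/Os per logical write. The sketch below uses the textbook model only: a RAID5 sub-stripe update reads the old data and old parity, then writes new data and new parity, while a two-copy RAID10 write just writes both mirrors. The names are hypothetical, and a real implementation such as md may pick a different update strategy in some layouts, so this is not a description of what the kernel actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    """Device-level I/Os needed to service one logical write."""
    reads: int
    writes: int

    @property
    def total(self) -> int:
        return self.reads + self.writes


def raid5_small_write() -> IoCost:
    # Read-modify-write: read old data and old parity, then write new
    # data and new parity. The writes cannot start until the reads finish.
    return IoCost(reads=2, writes=2)


def raid5_full_stripe_write(n_disks: int) -> IoCost:
    # A full stripe needs no reads: parity is computed from the new data,
    # and every disk in the stripe receives exactly one chunk.
    return IoCost(reads=0, writes=n_disks)


def raid10_write(copies: int = 2) -> IoCost:
    # Mirroring: one write per copy, no reads, all issued in parallel.
    return IoCost(reads=0, writes=copies)


if __name__ == "__main__":
    print("RAID5 small write:      ", raid5_small_write())
    print("RAID5 full stripe (n=3):", raid5_full_stripe_write(3))
    print("RAID10 write:           ", raid10_write())
```

In this model the RAID5 small write needs two dependent rounds of I/O (reads, then writes), whereas the RAID10 write completes in a single round, which is one way to see where the extra write latency comes from.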
* Re: Latest on SSD Raid
From: Matt Garman @ 2017-09-30 13:22 UTC
To: Dag Nygren
Cc: Mdadm

On Fri, Sep 29, 2017 at 10:53 AM, Dag Nygren <dag@newtech.fi> wrote:
> Would like to tap some experience out of all here
> with the question:
>
> Any good hints and advice when setting up a RAID5 SSD with
> 3 disks to start with?

As is often (always?) the case, a lot depends on your expected workload.

We have now built three systems with 24-disk 2 TB SSD RAID6 arrays. We
used consumer-grade drives for cost savings, as this is a virtually
read-only workload. We do add a small amount of data every
day---roughly 50 GB, spread across the entire array. The rest of the
time it's just read read read read. (Effectively a WORM workload.)

In this role, our systems have been great. We found that the network
interface was the first bottleneck. Now we've got dual 40 Gb/s ports on
these systems. The interfaces are bonded and load balanced, and jumbo
frames are a must. We did just a tiny bit of tuning, nothing special
(don't remember the details offhand).

The only real "gotcha" we ran into in all this is rebuild times.
Actually, rebuilds themselves are fast. The problem is, it's a tradeoff
between reduced client performance and rebuild time. mdadm allows you
to tune how fast the rebuild can go. With a bit of experimentation,
though, I found that mdadm supports multi-threaded rebuilds, and this
made a huge improvement in being able to do a rebuild while still
serving some data.

I suspect enterprise drives, with their generally bigger overprovision
space and smarter controllers, would likely fare better on rebuilds.
On the flipside, we haven't actually had any drive failures; the
rebuilds we did were just "practicing" for what to do and expect when a
drive does inevitably fail.

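The message above does not say which settings were used. As an assumption, the rebuild-rate limits are probably the md sysctls /proc/sys/dev/raid/speed_limit_min and speed_limit_max (in KiB/s), and the multi-threading knob is probably the per-array group_thread_cnt attribute for RAID5/6. The sketch below only reads those values so you can see where a system currently stands; the helper names and the default "md0" device are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Return the first integer token in a sysfs/procfs file, or None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def rebuild_limits(proc_root: str = "/proc/sys/dev/raid") -> dict:
    """System-wide resync/rebuild rate limits in KiB/s (assumed md sysctls)."""
    root = Path(proc_root)
    return {
        "speed_limit_min": read_int(root / "speed_limit_min"),
        "speed_limit_max": read_int(root / "speed_limit_max"),
    }


def stripe_threads(md_name: str = "md0", sys_root: str = "/sys/block") -> Optional[int]:
    """Worker-thread count for a RAID5/6 array (assumed group_thread_cnt attribute)."""
    return read_int(Path(sys_root) / md_name / "md" / "group_thread_cnt")


if __name__ == "__main__":
    print("rebuild limits:", rebuild_limits())
    print("stripe threads for md0:", stripe_threads("md0"))
```

Both functions return None where a file is missing, so the script is safe to run on a machine without md arrays. Changing the values means writing to the same files as root, and the right numbers depend on how much client traffic you are willing to slow down during a rebuild.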