OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact

* OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
@ 2013-01-14 12:17 Florian Haas
  2013-01-14 13:28 ` Tom Lanyon
  2013-01-14 13:46 ` Mark Nelson
  0 siblings, 2 replies; 17+ messages in thread
From: Florian Haas @ 2013-01-14 12:17 UTC (permalink / raw)
  To: ceph-devel

Hi everyone,

we ran into an interesting performance issue on Friday that we were
able to troubleshoot with some help from Greg and Sam (thanks guys),
and in the process realized that there's little guidance around for
how to optimize performance in OSD nodes with lots of spinning disks
(and hence, hosting a relatively large number of OSDs). In that type
of hardware configuration, the usual mantra of "put your OSD journals
on an SSD" doesn't always hold up. So we wrote up some
recommendations, and I'd ask everyone interested to critique this or
provide feedback:

http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals

It's probably easiest to comment directly on that page, but if you
prefer instead to just respond in this thread, that's perfectly fine
too.

For some background of the discussion, please refer to the LogBot log
from #ceph:
http://irclogs.ceph.widodh.nl/index.php?date=2013-01-12

Hope this is useful.

Cheers,
Florian

-- 
Helpful information? Let us know!
http://www.hastexo.com/shoutbox


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 12:17 OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact Florian Haas
@ 2013-01-14 13:28 ` Tom Lanyon
  2013-01-14 13:41   ` Florian Haas
  2013-01-14 13:46 ` Mark Nelson
  1 sibling, 1 reply; 17+ messages in thread
From: Tom Lanyon @ 2013-01-14 13:28 UTC (permalink / raw)
  To: Florian Haas; +Cc: ceph-devel

On 14/01/2013, at 10:47 PM, Florian Haas <florian@hastexo.com> wrote:
<snip>
> http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
> 
> It's probably easiest to comment directly on that page, but if you
> prefer instead to just respond in this thread, that's perfectly fine
> too.
<snip>


Hi Florian,

Thanks for putting this together.

A couple of minor questions/comments:

* One of the conclusions is to use the SSDs (assuming 2) un-RAIDed, but the article doesn't actually explain why using them in a RAID-1 is a poor idea.

* Should the end of this sentence:
	"Another option is to use, say, one partition on each of your SSD in a RAID for the operating system installation, and then chop up the rest of your SSDs an non-RAIDed Ceph OSDs."

...instead read:

	"Another option is to use, say, one partition on each of your SSD in a RAID for the operating system installation, and then chop up the rest of your SSDs an non-RAIDed Ceph **OSD journals**." ?

Regards,
Tom



* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 13:28 ` Tom Lanyon
@ 2013-01-14 13:41   ` Florian Haas
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Haas @ 2013-01-14 13:41 UTC (permalink / raw)
  To: Tom Lanyon; +Cc: ceph-devel

Hi Tom,

On Mon, Jan 14, 2013 at 2:28 PM, Tom Lanyon <tom@netspot.com.au> wrote:
> On 14/01/2013, at 10:47 PM, Florian Haas <florian@hastexo.com> wrote:
> <snip>
>> http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
>>
>> It's probably easiest to comment directly on that page, but if you
>> prefer instead to just respond in this thread, that's perfectly fine
>> too.
> <snip>
>
>
> Hi Florian,
>
> Thanks for putting this together.

Pleasure!

> A couple of minor questions/comments:
>
> * One of the conclusions is to use the SSDs (assuming 2) un-RAIDed, but the article doesn't actually explain why using them in a RAID-1 is a poor idea.

I added a paragraph starting with "putting your journal SSDs in a RAID set
looks like a good idea at first". Does that explain the situation
better?

> * Should the end of this sentence:
>         "Another option is to use, say, one partition on each of your SSD in a RAID for the operating system installation, and then chop up the rest of your SSDs an non-RAIDed Ceph OSDs."
>
> ...instead read:
>
>         "Another option is to use, say, one partition on each of your SSD in a RAID for the operating system installation, and then chop up the rest of your SSDs an non-RAIDed Ceph **OSD journals**." ?

Sure. Fixed.

Btw: coming to LCA? If you are, please find me and say hello. :)

Cheers,
Florian



* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 12:17 OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact Florian Haas
  2013-01-14 13:28 ` Tom Lanyon
@ 2013-01-14 13:46 ` Mark Nelson
  2013-01-14 14:09   ` Florian Haas
  2013-01-15  9:31   ` Gandalf Corvotempesta
  1 sibling, 2 replies; 17+ messages in thread
From: Mark Nelson @ 2013-01-14 13:46 UTC (permalink / raw)
  To: Florian Haas; +Cc: ceph-devel

On 01/14/2013 06:17 AM, Florian Haas wrote:
> Hi everyone,
>
> we ran into an interesting performance issue on Friday that we were
> able to troubleshoot with some help from Greg and Sam (thanks guys),
> and in the process realized that there's little guidance around for
> how to optimize performance in OSD nodes with lots of spinning disks
> (and hence, hosting a relatively large number of OSDs). In that type
> of hardware configuration, the usual mantra of "put your OSD journals
> on an SSD" doesn't always hold up. So we wrote up some
> recommendations, and I'd ask everyone interested to critique this or
> provide feedback:
>
> http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
>
> It's probably easiest to comment directly on that page, but if you
> prefer instead to just respond in this thread, that's perfectly fine
> too.
>
> For some background of the discussion, please refer to the LogBot log
> from #ceph:
> http://irclogs.ceph.widodh.nl/index.php?date=2013-01-12
>
> Hope this is useful.
>
> Cheers,
> Florian
>

Hi Florian,

Couple of comments:

"OSDs use a write-ahead mode for local operations: a write hits the 
journal first, and from there is then being copied into the backing 
filestore."

It's probably important to mention that this is true by default only for 
non-btrfs file systems.  See:

http://ceph.com/wiki/OSD_journal

"Thus, for best cluster performance it is crucial that the journal is 
fast, whereas the filestore can be comparatively slow."

This is a bit misleading.  Having a faster journal is helpful when there 
are short bursts of traffic.  So long as the journal doesn't fill up and 
there are periods of inactivity for the data to get flushed, having a slow
filestore disk may be OK.  With lots of traffic, reality eventually
catches up with you and you've gotta get all of that data flushed out to 
the backing file system.

Have you ever seen ceph performance bouncing around with periods of 
really high throughput followed by periods of really low (or no!) 
throughput?  That's usually the result of having a very fast journal 
paired with a slow data disk.  The journal writes out data very quickly, 
hits its max ops or max bytes limit, then writes are stalled for a
period while data in the journal gets flushed out to the data disk.
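
The burst-then-stall pattern described above can be illustrated with a toy
model. This is only a sketch under loud assumptions: the journal's max ops and
max bytes limits are collapsed into a single hypothetical capacity_mb, clients
always offer as much data as the journal can take, and writes stay stalled
until the journal has fully drained. None of the names or numbers below are
Ceph internals or measurements.

```python
def simulate_journal(capacity_mb, journal_mb_s, disk_mb_s, seconds):
    """Return the client throughput (MB/s) accepted in each second.

    Toy model: the data disk drains the journal at disk_mb_s; once the
    journal is full, new writes stall until it is empty again.
    """
    backlog = 0.0      # journaled data not yet flushed to the data disk
    stalled = False
    accepted = []
    for _ in range(seconds):
        # The data disk flushes what it can from the journal.
        backlog = max(0.0, backlog - disk_mb_s)
        if stalled and backlog == 0.0:
            stalled = False
        if stalled:
            wrote = 0.0
        else:
            # Accept writes up to journal speed and remaining journal space.
            wrote = min(journal_mb_s, capacity_mb - backlog)
            backlog += wrote
            if backlog >= capacity_mb:
                stalled = True
        accepted.append(wrote)
    return accepted


# Hypothetical numbers: 1000 MB journal, 400 MB/s SSD, 100 MB/s spinner.
trace = simulate_journal(1000, 400, 100, 24)
print(trace)
```

With these made-up numbers the model accepts writes at full journal speed for
a few seconds, then accepts nothing while the backlog drains, and the long-run
average settles at the data disk's rate.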

Another thing to remember is that writes to the journal happen without 
causing a lot of seeks.  Ceph doesn't have to do metadata or dentry 
lookups/writes to write data to the journal.  Because of this, it's been 
my experience that journals are primarily throughput bound rather than 
being random IOPS bound.  Just putting the journals on any old SSD isn't
enough; you need to choose ones that get really high throughput like the
Intel S3700s or other high performance models.

"By and large, try to go for a relatively small number of OSDs per node, 
ideally not more than 8. This combined with SSD journals is likely to 
give you the best overall performance."

The advice that I usually give people is that if performance is a big 
concern, try to ensure that filestore disk and journal performance are nearly
matched.  In my test setup, I use 1 Intel 520 SSD to host 3 journals for
7200rpm enterprise SATA disks.  A 1:4 ratio or even 1:6 ratio may also 
work fine depending on various factors.  So far the limits I've hit with 
very minimal tuning seem to be around 15 spinning disks and 5 SSDs for 
around 1.4GB/s (2.8GB/s including journal writes) to one node.
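
The arithmetic behind these figures can be sketched as follows. The sketch
assumes writes are spread evenly across devices and that 1 GB/s means
1000 MB/s; the function names are made up for illustration.

```python
def per_device_load(client_mb_s, spinners, ssds):
    """Split a node's client write rate across its data disks and journal SSDs."""
    per_disk = client_mb_s / spinners   # each spinner stores its share of the data
    per_ssd = client_mb_s / ssds        # each SSD journals its share of the data
    total_written = 2 * client_mb_s     # every byte goes to a journal and to a data disk
    return per_disk, per_ssd, total_written


def max_journals_per_ssd(ssd_write_mb_s, disk_write_mb_s):
    """How many spinners one SSD can journal for before it becomes the bottleneck."""
    return int(ssd_write_mb_s // disk_write_mb_s)


# Figures quoted above: 15 spinners, 5 SSDs, about 1.4 GB/s of client writes.
per_disk, per_ssd, total = per_device_load(1400, 15, 5)
print(f"{per_disk:.0f} MB/s per spinner, {per_ssd:.0f} MB/s per SSD, {total} MB/s in total")
```

Under these assumptions each SSD absorbs roughly three spinners' worth of
writes, which is in line with the 1:3 ratio of the test setup.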

"If you do go with OSD nodes with a very high number of disks, consider 
dropping the idea of an SSD-based journal. Yes, in this kind of setup 
you might actually do better with journals on the spinners."

If your SSD(s) is/are slow you very well may be better off with putting 
the journals on the same spinning disks as the OSD data.  It's all a 
giant balancing act between write throughput, read throughput, and 
capacity.  If you look closely at the 8 spinning disk vs 6 spinning + 2 
SSD numbers in the argonaut vs bobtail article, you can see some of the 
tradeoffs:

http://ceph.com/uncategorized/argonaut-vs-bobtail-performance-preview/

Mark


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 13:46 ` Mark Nelson
@ 2013-01-14 14:09   ` Florian Haas
  2013-01-14 17:34     ` Gregory Farnum
  2013-01-15  9:31   ` Gandalf Corvotempesta
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Haas @ 2013-01-14 14:09 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel

Hi Mark,

thanks for the comments.

On Mon, Jan 14, 2013 at 2:46 PM, Mark Nelson <mark.nelson@inktank.com> wrote:
> Hi Florian,
>
> Couple of comments:
>
> "OSDs use a write-ahead mode for local operations: a write hits the journal
> first, and from there is then being copied into the backing filestore."
>
> It's probably important to mention that this is true by default only for
> non-btrfs file systems.  See:
>
> http://ceph.com/wiki/OSD_journal

I am well aware of that, but I've yet to find a customer (or user)
that's actually willing to entrust a production cluster with several
hundred terabytes of data to btrfs. :) Besides, the whole post is
about whether or not to use dedicated SSD block devices for OSD
journals, and if you're tossing everything into btrfs you've already
made the decision to use in-filestore journals.

> "Thus, for best cluster performance it is crucial that the journal is fast,
> whereas the filestore can be comparatively slow."
>
> This is a bit misleading.  Having a faster journal is helpful when there are
> short bursts of traffic.  So long as the journal doesn't fill up and there
> are periods of inactivity for the data to get flushed, having slow filestore
> disk may be ok.  With lots of traffic, reality eventually catches up with
> you and you've gotta get all of that data flushed out to the backing file
> system.

I agree that the wording is non-optimal. What I meant was to equate
"fast" with SSDs, and "comparatively slow" with spinners. And to
combine spinners with SSDs is one of the most interesting points about
Ceph in terms of cost effectiveness. Pretty much every other storage
technology would require you to either go all-SSD or to look into
rather sophisticated HSM in order to achieve similar performance at a
comparable scale.

Suggestions for better wording?

> Have you ever seen ceph performance bouncing around with periods of really
> high throughput followed by periods of really low (or no!) throughput?
> That's usually the result of having a very fast journal paired with a slow
> data disk.  The journal writes out data very quickly, hits its max ops or
> max bytes limit, then writes are stalled for a period while data in the
> journal gets flushed out to the data disk.

Sure, essentially the equivalent, on a different level, of an NFS
server with lots of RAM and a high vm.dirty_ratio suddenly doing a
massive writeout.

> Another thing to remember is that writes to the journal happen without
> causing a lot of seeks.  Ceph doesn't have to do metadata or dentry
> lookups/writes to write data to the journal.  Because of this, it's been my
> experience that journals are primarily throughput bound rather than being
> random IOPS bound.  Just putting the journals on any old SSD isn't enough,
> you need to choose ones that get really high throughput like the Intel
> S3700s or other high performance models.

Yup.

> "By and large, try to go for a relatively small number of OSDs per node,
> ideally not more than 8. This combined with SSD journals is likely to give
> you the best overall performance."
>
> The advice that I usually give people is that if performance is a big
> concern, try to ensure that filestore disk and journal performance are nearly
> matched.  In my test setup, I use 1 Intel 520 SSD to host 3 journals for
> 7200rpm enterprise SATA disks.  A 1:4 ratio or even 1:6 ratio may also work
> fine depending on various factors.  So far the limits I've hit with very
> minimal tuning seem to be around 15 spinning disks and 5 SSDs for around
> 1.4GB/s (2.8GB/s including journal writes) to one node.

Yes, I realize that there's no hard number here. I could also have put
"ideally not more than 6". The point I was trying to make is that
people need to let go of their preconceptions of what an ideal storage box is,
and that more disks per host isn't necessarily better. We had a user
in #ceph last week thinking that an OSD node with 36 spinners was a
stellar idea. It probably isn't.

> "If you do go with OSD nodes with a very high number of disks, consider
> dropping the idea of an SSD-based journal. Yes, in this kind of setup you
> might actually do better with journals on the spinners."
>
> If your SSD(s) is/are slow you very well may be better off with putting the
> journals on the same spinning disks as the OSD data.  It's all a giant
> balancing act between write throughput, read throughput, and capacity.

And people generally prefer simple heuristics (a.k.a. rules of thumb)
over giant balancing acts. So I think if we tell them something like,

Got more than 8 spinners?
* No? Toss your journals on SSDs,
* Yes? At least consider not doing so.

... then I am hoping that will lead more people on the right path,
than when we tell them:

* Here's two dozen performance graphs, a pivot table, and a crystal ball.

I am obviously jesting and exaggerating, but you get my point. :)
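
Written out as code, the rule of thumb above is a one-liner. The function name
and return strings are invented for illustration, and the threshold of 8 is
the soft number from this thread, not a hard limit.

```python
def journal_placement_hint(spinners_per_node, threshold=8):
    """Rule of thumb from the thread; a starting point, not a benchmark result."""
    if spinners_per_node > threshold:
        return "consider keeping journals on the spinners"
    return "put journals on SSDs"


for spinners in (6, 12, 36):
    print(spinners, "->", journal_placement_hint(spinners))
```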

Cheers,
Florian



* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 14:09   ` Florian Haas
@ 2013-01-14 17:34     ` Gregory Farnum
  2013-01-14 20:17       ` Florian Haas
  0 siblings, 1 reply; 17+ messages in thread
From: Gregory Farnum @ 2013-01-14 17:34 UTC (permalink / raw)
  To: Florian Haas; +Cc: Mark Nelson, ceph-devel@vger.kernel.org

On Mon, Jan 14, 2013 at 6:09 AM, Florian Haas <florian@hastexo.com> wrote:
> Hi Mark,
>
> thanks for the comments.
>
> On Mon, Jan 14, 2013 at 2:46 PM, Mark Nelson <mark.nelson@inktank.com> wrote:
>> Hi Florian,
>>
>> Couple of comments:
>>
>> "OSDs use a write-ahead mode for local operations: a write hits the journal
>> first, and from there is then being copied into the backing filestore."
>>
>> It's probably important to mention that this is true by default only for
>> non-btrfs file systems.  See:
>>
>> http://ceph.com/wiki/OSD_journal
>
> I am well aware of that, but I've yet to find a customer (or user)
> that's actually willing to entrust a production cluster with several
> hundred terabytes of data to btrfs. :) Besides, the whole post is
> about whether or not to use dedicated SSD block devices for OSD
> journals, and if you're tossing everything into btrfs you've already
> made the decision to use in-filestore journals.

That is absolutely not the case. btrfs works just fine with an
external journal on SSD or whatever else; what made you think
otherwise?
-Greg


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 17:34     ` Gregory Farnum
@ 2013-01-14 20:17       ` Florian Haas
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Haas @ 2013-01-14 20:17 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org

On 01/14/2013 06:34 PM, Gregory Farnum wrote:
> On Mon, Jan 14, 2013 at 6:09 AM, Florian Haas <florian@hastexo.com> wrote:
>> Hi Mark,
>>
>> thanks for the comments.
>>
>> On Mon, Jan 14, 2013 at 2:46 PM, Mark Nelson <mark.nelson@inktank.com> wrote:
>>> Hi Florian,
>>>
>>> Couple of comments:
>>>
>>> "OSDs use a write-ahead mode for local operations: a write hits the journal
>>> first, and from there is then being copied into the backing filestore."
>>>
>>> It's probably important to mention that this is true by default only for
>>> non-btrfs file systems.  See:
>>>
>>> http://ceph.com/wiki/OSD_journal
>>
>> I am well aware of that, but I've yet to find a customer (or user)
>> that's actually willing to entrust a production cluster with several
>> hundred terabytes of data to btrfs. :) Besides, the whole post is
>> about whether or not to use dedicated SSD block devices for OSD
>> journals, and if you're tossing everything into btrfs you've already
>> made the decision to use in-filestore journals.
> 
> That is absolutely not the case. btrfs works just fine with an
> external journal on SSD or whatever else; what made you think
> otherwise?

A misunderstanding on my part. Also, I was overly broad in my comment.
What I really meant to say was that if I'm using a btrfs filestore, and
a separate dedicated block device for the journal, then the journaling
mode is write-ahead and not parallel.

Which was a wrong assumption on my part, as an external journal combined
with a btrfs filestore seems to support parallel journaling mode just
fine. For some reason I had supposed the journal had to be in the same
btrfs as the filestore for this to work.
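
The corrected understanding can be summarized in a small sketch: the default
journaling mode follows the filestore's file system, not the location of the
journal. The function below is purely illustrative and is not a Ceph API; the
journal_location argument exists only to show that it plays no part in the
decision.

```python
def default_journal_mode(filestore_fs, journal_location="external"):
    """Default journaling mode as described in this thread (illustrative only).

    journal_location is accepted but deliberately ignored: an external
    journal on a dedicated device does not change the default mode.
    """
    if filestore_fs.lower() == "btrfs":
        return "parallel"
    return "write-ahead"


print(default_journal_mode("btrfs", journal_location="external"))
print(default_journal_mode("xfs", journal_location="external"))
```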

Sorry for the confusion.

Cheers,
Florian


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-14 13:46 ` Mark Nelson
  2013-01-14 14:09   ` Florian Haas
@ 2013-01-15  9:31   ` Gandalf Corvotempesta
  2013-01-15 17:46     ` Mark Nelson
  1 sibling, 1 reply; 17+ messages in thread
From: Gandalf Corvotempesta @ 2013-01-15  9:31 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Florian Haas, ceph-devel

2013/1/14 Mark Nelson <mark.nelson@inktank.com>:
> The advice that I usually give people is that if performance is a big
> concern, try to ensure that filestore disk and journal performance are nearly
> matched.  In my test setup, I use 1 Intel 520 SSD to host 3 journals for
> 7200rpm enterprise SATA disks.  A 1:4 ratio or even 1:6 ratio may also work
> fine depending on various factors.  So far the limits I've hit with very
> minimal tuning seem to be around 15 spinning disks and 5 SSDs for around
> 1.4GB/s (2.8GB/s including journal writes) to one node.

Is this a ratio based on OSDs/SSDs or on OSDs/server?
For example, a server with 12 spinning disks plus 2 internal SSDs should be used
for 12 OSDs. The first 6 will have their journals on SSD1, the last 6 will use SSD2.

In this case, ratio is 1:6, right?
Or are you referring to the whole server?
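
The layout in this example can be sketched as a simple mapping of OSD numbers
to journal SSDs. The labels "SSD1" and "SSD2" are taken from the example
above; the function itself is a hypothetical helper, not part of Ceph.

```python
def assign_journals(num_osds, ssds):
    """Map OSD ids 0..num_osds-1 to journal SSDs in contiguous blocks."""
    per_ssd = -(-num_osds // len(ssds))   # ceiling division: OSDs per SSD
    return {osd: ssds[osd // per_ssd] for osd in range(num_osds)}


layout = assign_journals(12, ["SSD1", "SSD2"])
for osd, ssd in layout.items():
    print(f"osd.{osd} -> journal on {ssd}")
```

With 12 OSDs and 2 SSDs, each SSD carries 6 journals, which is the 1:6
SSD-to-OSD ratio asked about here.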


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-15  9:31   ` Gandalf Corvotempesta
@ 2013-01-15 17:46     ` Mark Nelson
  2013-01-15 21:24       ` Gandalf Corvotempesta
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Nelson @ 2013-01-15 17:46 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Florian Haas, ceph-devel

On 01/15/2013 03:31 AM, Gandalf Corvotempesta wrote:
> 2013/1/14 Mark Nelson <mark.nelson@inktank.com>:
>> The advice that I usually give people is that if performance is a big
>> concern, try to ensure that filestore disk and journal performance are nearly
>> matched.  In my test setup, I use 1 Intel 520 SSD to host 3 journals for
>> 7200rpm enterprise SATA disks.  A 1:4 ratio or even 1:6 ratio may also work
>> fine depending on various factors.  So far the limits I've hit with very
>> minimal tuning seem to be around 15 spinning disks and 5 SSDs for around
>> 1.4GB/s (2.8GB/s including journal writes) to one node.
>
> Is this a ratio based on OSDs/SSDs or on OSDs/Server ?
> For example, a server with 12 spinning disks plus 2 internal SSDs should be used
> for 12 OSDs. The first 6 will have their journals on SSD1, the last 6 will use SSD2.
>
> In this case, ratio is 1:6, right?
> Or are you referring to the whole server?
>

OSDs/SSD, I think, if I understand you correctly. Basically you just want
the OSD, journal, and network throughput to be relatively
similar.  That's not always easy given budgets and whether you are more
interested in sequential throughput or IOPS.

I think the 12 bay supermicro 2U "A" chassis with 12 spinning disks, 
10GbE, and two controllers is potentially a really nice balanced 
combination.  You could go cheap controllers plus 2 fast SSDs (like 
400-500MB/s seq) or more expensive controllers with WB cache and just 
use the 12 spinning disks for data and journals (and keep the rear 
drives for OS/logs/etc).  Either could potentially be good solutions 
with slightly different performance characteristics.
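
One way to check whether such a node is balanced is to compare the aggregate
throughput of each stage and see which is lowest. The sketch below does this
for the SSD-journal variant. The 100 MB/s per spinner is an assumed figure,
1250 MB/s is simply 10 Gbit/s divided by 8 with no protocol overhead, and
replication traffic is ignored; only the 400-500 MB/s SSD range comes from the
text above.

```python
def node_write_ceiling(network_mb_s, journal_devices_mb_s, data_disks_mb_s):
    """Return (bottleneck name, MB/s) for a node with dedicated journal devices."""
    limits = {
        "network": network_mb_s,
        "journals": sum(journal_devices_mb_s),
        "data disks": sum(data_disks_mb_s),
    }
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]


# 10GbE, 2 SSDs at an assumed 450 MB/s each, 12 spinners at an assumed 100 MB/s each.
name, ceiling = node_write_ceiling(1250, [450, 450], [100] * 12)
print(f"bottleneck: {name} at about {ceiling} MB/s")
```

With these assumed numbers the three stages land within about 30% of each
other, which is the kind of rough balance described above.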

Of course if you are going all out for IOPS, you probably would be
looking at a somewhat different kind of solution anyway.

Mark


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-15 17:46     ` Mark Nelson
@ 2013-01-15 21:24       ` Gandalf Corvotempesta
  2013-01-15 21:40         ` Mark Nelson
  0 siblings, 1 reply; 17+ messages in thread
From: Gandalf Corvotempesta @ 2013-01-15 21:24 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Florian Haas, ceph-devel

2013/1/15 Mark Nelson <mark.nelson@inktank.com>:
> I think the 12 bay supermicro 2U "A" chassis with 12 spinning disks, 10GbE,
> and two controllers is potentially a really nice balanced combination.

Which chassis are you referring to? I don't see any 2U with 12 front
and rear spinning disks from Supermicro. Only 3U with rear disks.


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-15 21:24       ` Gandalf Corvotempesta
@ 2013-01-15 21:40         ` Mark Nelson
  2013-01-15 21:58           ` Gandalf Corvotempesta
  2013-01-18 18:54           ` Simon Leinen
  0 siblings, 2 replies; 17+ messages in thread
From: Mark Nelson @ 2013-01-15 21:40 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Florian Haas, ceph-devel

On 01/15/2013 03:24 PM, Gandalf Corvotempesta wrote:
> 2013/1/15 Mark Nelson <mark.nelson@inktank.com>:
>> I think the 12 bay supermicro 2U "A" chassis with 12 spinning disks, 10GbE,
>> and two controllers is potentially a really nice balanced combination.
>
> Which chassis are your referring to ? I don't see any 2U with 12 front
> and rear spinning disk from supermicro. Only 3U with rear disks
>

http://www.supermicro.com/products/chassis/2U/826/SC826BA-R920LP.cfm

That chassis has 12 3.5" bays in front, with 2 2.5" bays in back.  An
interesting setup could be 12 spinning disks, with 2 very fast SSDs used 
for journals and OS.  Would need to test it first, and not sure I like 
putting the OS on the journal drives.  A more modest setup would be just 
to have the 12 spinning disks for data and journals and the rear drives 
for the OS.

Mark


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-15 21:40         ` Mark Nelson
@ 2013-01-15 21:58           ` Gandalf Corvotempesta
  2013-01-16  7:41             ` Stefan Priebe - Profihost AG
  2013-01-18 18:54           ` Simon Leinen
  1 sibling, 1 reply; 17+ messages in thread
From: Gandalf Corvotempesta @ 2013-01-15 21:58 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Florian Haas, ceph-devel

2013/1/15 Mark Nelson <mark.nelson@inktank.com>:
> http://www.supermicro.com/products/chassis/2U/826/SC826BA-R920LP.cfm
>
> That chassis has 12 3.5" bays in front, with 2 2.5" bays in back.  an
> interesting setup could be 12 spinning disks, with 2 very fast SSDs used for
> journals and OS.  Would need to test it first, and not sure I like putting
> the OS on the journal drives.  A more modest setup would be just to have the
> 12 spinning disks for data and journals and the rear drives for the OS.

Thank you. We have planned the same with a Dell R515 with 12+2 disks.
The OS would be loaded from a RAID1 partition across 2 spinning disks.

Or even with no RAID: in case of failure, the whole node is replicated by Ceph.


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-15 21:58           ` Gandalf Corvotempesta
@ 2013-01-16  7:41             ` Stefan Priebe - Profihost AG
  2013-01-16 17:31               ` Gandalf Corvotempesta
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2013-01-16  7:41 UTC (permalink / raw)
  To: Gandalf Corvotempesta
  Cc: Mark Nelson, Florian Haas, ceph-devel@vger.kernel.org

Why do you use 3.5" at all instead of 2.5"?

On 15.01.2013 at 22:58, Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com> wrote:

> 2013/1/15 Mark Nelson <mark.nelson@inktank.com>:
>> http://www.supermicro.com/products/chassis/2U/826/SC826BA-R920LP.cfm
>> 
>> That chassis has 12 3.5" bays in front, with 2 2.5" bays in back.  an
>> interesting setup could be 12 spinning disks, with 2 very fast SSDs used for
>> journals and OS.  Would need to test it first, and not sure I like putting
>> the OS on the journal drives.  A more modest setup would be just to have the
>> 12 spinning disks for data and journals and the rear drives for the OS.
> 
> Thank you. We have planned the same with a DELL R515 12+2 disks.
> OS should be loaded from a RAID1 partitions from 2 spinning disks.
> 
> Or even no raid, in case of failure, the whole node is replicated by ceph.


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-16  7:41             ` Stefan Priebe - Profihost AG
@ 2013-01-16 17:31               ` Gandalf Corvotempesta
  0 siblings, 0 replies; 17+ messages in thread
From: Gandalf Corvotempesta @ 2013-01-16 17:31 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Mark Nelson, Florian Haas, ceph-devel@vger.kernel.org

2013/1/16 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
> Why do you use 3.5" at all instead of 2.5"?

Why should I use 2.5"?
3.5" disks are cheaper and bigger.


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-15 21:40         ` Mark Nelson
  2013-01-15 21:58           ` Gandalf Corvotempesta
@ 2013-01-18 18:54           ` Simon Leinen
  2013-01-18 23:48             ` Gandalf Corvotempesta
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Leinen @ 2013-01-18 18:54 UTC (permalink / raw)
  To: ceph-devel

Mark Nelson writes:
> http://www.supermicro.com/products/chassis/2U/826/SC826BA-R920LP.cfm

> That chassis has 12 3.5" bays in front, with 2 2.5" bays in back.

FWIW, we have a few Quanta S210-X22RQ[1], which are very similar - 2U,
12 3.5" bays in front, 2*2.5 in back.  We'll try a similar configuration
to what you propose, though with only 8 spinning disks (3TB WD RED) and
one SSD (256GB Samsung 840 pro).  2*10GE, 2*Xeon E5-2630, 128GB RAM.

> an interesting setup could be 12 spinning disks, with 2 very fast SSDs
> used for journals and OS.  Would need to test it first, and not sure I
> like putting the OS on the journal drives.  A more modest setup would
> be just to have the 12 spinning disks for data and journals and the
> rear drives for the OS.
-- 
Simon.

[1] http://www.quantaqct.com/en/01_product/02_detail.php?mid=27&sid=145&id=147&qs=82



* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-18 18:54           ` Simon Leinen
@ 2013-01-18 23:48             ` Gandalf Corvotempesta
  2013-01-19  8:18               ` Simon Leinen
  0 siblings, 1 reply; 17+ messages in thread
From: Gandalf Corvotempesta @ 2013-01-18 23:48 UTC (permalink / raw)
  To: Simon Leinen; +Cc: ceph-devel

2013/1/18 Simon Leinen <simon.leinen@switch.ch>:
> FWIW, we have a few Quanta S210-X22RQ[1], which are very similar - 2U,
> 12 3.5" bays in front, 2*2.5 in back.  We'll try a similar configuration
> to what you propose, though with only 8 spinning disks (3TB WD RED) and
> one SSD (256GB Samsung 840 pro).  2*10GE, 2*Xeon E5-2630, 128GB RAM.

128GB RAM?
What will you do with this ceph node?


* Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact
  2013-01-18 23:48             ` Gandalf Corvotempesta
@ 2013-01-19  8:18               ` Simon Leinen
  0 siblings, 0 replies; 17+ messages in thread
From: Simon Leinen @ 2013-01-19  8:18 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: ceph-devel

Gandalf Corvotempesta writes:
> 2013/1/18 Simon Leinen <simon.leinen@switch.ch>:
>> FWIW, we have a few Quanta S210-X22RQ[1], which are very similar - 2U,
>> 12 3.5" bays in front, 2*2.5 in back.  We'll try a similar configuration
>> to what you propose, though with only 8 spinning disks (3TB WD RED) and
>> one SSD (256GB Samsung 840 pro).  2*10GE, 2*Xeon E5-2630, 128GB RAM.

> 128GB RAM?
> What will you do with this ceph node?

The idea is to also run OpenStack (Nova) on them.
(I know this is contentious. :-)
-- 
Simon.

