Re: Large numbers of OSD per node


From: Mark Nelson <mark.nelson@inktank.com>
To: Wido den Hollander <wido@widodh.nl>
Cc: Andrew Thrift <andyonfire@gmail.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Large numbers of OSD per node
Date: Mon, 05 Nov 2012 06:45:25 -0600
Message-ID: <5097B4E5.8070706@inktank.com>
In-Reply-To: <50979CA3.3060005@widodh.nl>

On 11/05/2012 05:01 AM, Wido den Hollander wrote:
> Hi,
>
> On 05-11-12 08:14, Andrew Thrift wrote:
>> Hi,
>>
>> We are evaluating Ceph for deployment.
>>
>> I was wondering if there are any current "best practices" around the
>> number of OSDs per node?
>>
>>
>> E.g. we are looking at deploying 3 nodes, each with 72x SAS disks and
>> 2x 10-gigabit Ethernet, bonded.
>>
>> Would this best be configured as 72 OSDs per node?
>>
>> Or would we be better off using raid5 to have 18 OSDs per node?
>>
>
> You should be aware of a large data movement when using 3 nodes.
>
> I myself am a fan of going with a lot of smaller nodes instead of
> building big nodes.
>
> With 3 such nodes you'd probably be going 2x replication? Otherwise you
> can never recover when one of the 3 nodes completely burns down to the
> ground.
>
> If you have 72 1TB disks in such a node you could in theory be moving
> 72TB, that would put a lot of stress on the other two nodes and you
> would need a lot of memory and CPU power.
>
> You might be better off going for 27 nodes with 8 disks each, or
> 18 nodes with 12 disks?
>
> When a node fails the recovery will be much easier on your cluster.
>
> You can also take out a node for maintenance when needed.
>
> Another thing you should be aware of is status "D". What if a filesystem
> inside one of your big machines hangs and one of the OSDs hangs in
> status "D", waiting for I/O which will never come?
>
> You'd be forced to reboot that node and that would again take 72TB of
> data offline.
>
> I am not aware of anybody using such big nodes in production. It could
> work, but you will need a lot of memory and a lot of CPU.
>
> The recommendation is 1GB/1GHz per OSD, so you'd be looking at at least
> 72GB of memory and 72GHz of CPU power.
>
> Wido



To echo what Wido is saying here, we've not really extensively tested 
configurations with nodes that big at Inktank either.  The biggest test 
node we have in-house is a 36-drive SC847a, and that was a pretty recent 
acquisition.  Nodes that large are definitely bigger than what most 
people are looking at right now.

For a deployment of the size you are talking about, I think you'd
probably be better served by nodes with 24 disks or fewer, and by
picking up more of them.  You'll likely have better performance and
fewer problems if a node goes down.  It is lower density, but I think
in this case using up a few extra U will be worth it.
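
To put rough numbers on that trade-off, here is a minimal Python sketch.
It assumes the same 216 disks in total (3 x 72), 1TB disks as in Wido's
example, completely full disks (so the figures are an upper bound), and
lost data being re-replicated evenly across the surviving nodes.  The
function name and parameters are made up for illustration; none of this
is a Ceph API.

```python
def node_failure_impact(total_disks, disks_per_node, disk_tb=1.0):
    """Estimate what one node failure costs for a given node size.

    Assumes full disks and an even spread of recovery traffic across
    the surviving nodes (both are simplifying assumptions).
    """
    if total_disks % disks_per_node:
        raise ValueError("total_disks must be a multiple of disks_per_node")
    nodes = total_disks // disks_per_node
    if nodes < 2:
        raise ValueError("need at least two nodes to recover from a failure")
    lost_tb = disks_per_node * disk_tb
    return {
        "nodes": nodes,
        "lost_tb": lost_tb,                         # data that must be re-replicated
        "lost_fraction": 1 / nodes,                 # share of the cluster that went away
        "tb_per_survivor": lost_tb / (nodes - 1),   # recovery load per remaining node
    }


# Compare the node sizes mentioned in this thread.
for size in (72, 24, 12, 8):
    print(size, node_failure_impact(216, size))
```

Under those assumptions, losing a 72-disk node leaves each of the two
survivors about 36TB to absorb, while with 24-disk nodes each of the
eight survivors absorbs about 3TB.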

Having said that, my guess is that if you were to use 72-drive nodes,
you'd probably be best off building raid-5 or raid-6 arrays and running
something like twelve 6-drive OSDs.  Be mindful of which drives,
expanders, and controllers you pick.
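
As a rough illustration of that layout, the sketch below works out the
OSD count and usable capacity for 6-drive groups.  It assumes 1TB
drives, one parity drive per raid-5 group and two per raid-6 group, and
that the 1GB/1GHz-per-OSD rule of thumb Wido quoted applies unchanged to
a RAID-backed OSD, which is an assumption I have not verified.  The
names are hypothetical, not part of any Ceph tooling.

```python
# Parity drives consumed per array for each RAID level.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def raid_osd_layout(drives_per_node, drives_per_osd, level, drive_tb=1.0):
    """Summarise a node carved into equal RAID groups, one OSD per group."""
    if level not in PARITY_DRIVES:
        raise ValueError("level must be 'raid5' or 'raid6'")
    if drives_per_node % drives_per_osd:
        raise ValueError("drives_per_node must be a multiple of drives_per_osd")
    osds = drives_per_node // drives_per_osd
    data_drives = drives_per_osd - PARITY_DRIVES[level]
    return {
        "osds": osds,
        "usable_tb_per_osd": data_drives * drive_tb,
        "usable_tb_per_node": osds * data_drives * drive_tb,
        # Rule of thumb from earlier in the thread: 1GB RAM and 1GHz CPU per OSD.
        "ram_gb": osds * 1,
        "cpu_ghz": osds * 1,
    }


for level in ("raid5", "raid6"):
    print(level, raid_osd_layout(72, 6, level))
```

With these assumptions a 72-drive node becomes 12 OSDs either way, with
about 60TB usable on raid-5 and about 48TB on raid-6.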

-- 
Mark Nelson
Performance Engineer
Inktank
