Re: What would a good OSD node hardware configuration look like?


From: Dennis Jacobfeuerborn <dennisml@conversis.de>
To: Josh Durgin <josh.durgin@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: What would a good OSD node hardware configuration look like?
Date: Tue, 06 Nov 2012 03:49:29 +0100
Message-ID: <50987AB9.9030905@conversis.de>
In-Reply-To: <50985677.6090708@inktank.com>

On 11/06/2012 01:14 AM, Josh Durgin wrote:
> On 11/05/2012 09:13 AM, Dennis Jacobfeuerborn wrote:
>> Hi,
>> I'm thinking about building a ceph cluster and I'm wondering what a good
>> configuration would look like for 4-8 (and maybe more) 2HU 8-disk or 3HU
>> 16-disk systems.
>> Would it make sense to make each disk an individual OSD or should I perhaps
>> create several raid-0 and create OSDs from those?
> 
> This mainly depends on your ratio of disks to cpu/ram. Generally we
> recommend 1GB of RAM and 1GHz of CPU per OSD. If you've got enough cpu/ram,
> running 1 OSD/disk is pretty common. It makes recovering from a
> single disk failure faster.

So basically a 2GHz quad-core CPU and 8GB RAM would be sufficient for 8 OSDs?
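
As a sanity check on that arithmetic, here is a minimal Python sketch of the
rule of thumb quoted above (1GB RAM and 1GHz per OSD). The function name is
hypothetical, and treating CPU capacity as cores times clock speed is a
simplifying assumption, not something ceph defines.

```python
def max_osds(cores, ghz_per_core, ram_gb, ghz_per_osd=1.0, ram_gb_per_osd=1.0):
    """Return how many OSDs a node can host under the quoted rule of thumb.

    Assumption: aggregate clock (cores * GHz) is what counts for the CPU side.
    """
    by_cpu = int(cores * ghz_per_core / ghz_per_osd)
    by_ram = int(ram_gb / ram_gb_per_osd)
    # The scarcer resource sets the limit.
    return min(by_cpu, by_ram)


# The box from the question: quad-core at 2GHz with 8GB RAM.
print(max_osds(cores=4, ghz_per_core=2.0, ram_gb=8))
```

Under these assumptions the quad-core 2GHz box with 8GB RAM comes out at
exactly 8 OSDs, with no headroom on either CPU or RAM.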

>> Also what is the best setup for the journal? If I understand it correctly
>> then each OSD needs its own journal and that should be a separate disk but
>> that would be quite wasteful it seems. Would it make sense to put in two
>> small SSD disks in a raid-1 configuration and create a filesystem for each
>> OSD journal on it?
> 
> This is certainly possible. It's a bit less overhead if you give each
> osd its own partition of the ssd(s) instead of going through another
> filesystem.
> 
> I suspect it would be better to not use raid-1, since these ssds will be
> receiving all the data the osds write as well. If they're in raid-1 instead
> of being used independently, their lifetimes might be much
> shorter.

My primary concern here is fault tolerance. What happens when the journal
disk dies? Can ceph cope with that and write directly to the OSDs, or does
a single journal disk shared by all OSDs mean that its failure takes the
entire system effectively offline for ceph?
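
To make the trade-off between the wear argument quoted above and the fault
tolerance concern concrete, here is a rough Python sketch. It assumes journal
traffic is spread evenly across OSDs and that raid-1 mirrors every write to
both SSDs. The function name and the 100 MB/s per OSD figure are hypothetical
and only for illustration.

```python
import math


def ssd_journal_load(num_osds, mb_s_per_osd, num_ssds, raid1):
    """Return (MB/s written to each SSD, OSD journals stored on each SSD)."""
    total_mb_s = num_osds * mb_s_per_osd
    if raid1:
        # Mirrored: every SSD receives the full journal stream and holds
        # a copy of every OSD's journal.
        return total_mb_s, num_osds
    # Independent SSDs: journals are partitioned evenly across the devices.
    return total_mb_s / num_ssds, math.ceil(num_osds / num_ssds)


for raid1 in (True, False):
    print("raid-1" if raid1 else "independent",
          ssd_journal_load(num_osds=8, mb_s_per_osd=100, num_ssds=2, raid1=raid1))
```

In this model each SSD in the raid-1 pair absorbs twice the writes of an
independent SSD, but every journal keeps a surviving copy if one SSD dies.
With independent SSDs, one failure hits the only copy of the journals for
half of the OSDs. How ceph reacts to a lost journal is the open question.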

>> How does the number of OSDs/Nodes affect the performance of say a single dd
>> operation? Will blocks be distributed over the cluster and written/read in
>> parallel or does the number only improve concurrency rather than benefit
>> single threaded workloads?
> 
> In cephfs and rbd, objects are distributed over the cluster, but the
> OSDs/node ratio doesn't really affect the performance. It's more
> dependent on the workload and striping policy. For example, with
> a small stripe size, small sequential writes will benefit from more
> osds, but the number per node isn't particularly important.

By OSDs/Nodes I really meant "OSDs or nodes" and not the ratio. What I'm
trying to understand is if a) the number of nodes plays a significant role
when it comes to performance (e.g. a 4 node cluster with large disks vs. a
16 node cluster with smaller disks) and b) how much of an impact the number
of OSDs has on the cluster e.g. an 8 node cluster with each node being a
single OSD (with all disks as raid-0) vs. an 8 node cluster with say 64
OSDs (each node with 8 disks as individual OSDs).
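
One part of this comparison can be put into numbers without knowing anything
about ceph internals: how much data has to be recovered after a single disk
failure. The Python sketch below assumes that losing one member of a raid-0
set loses the whole array, and uses hypothetical figures (2TB disks, half
full). It says nothing about throughput, which is the actual question here.

```python
def osd_count(nodes, disks_per_node, disks_per_osd):
    """Total number of OSDs in the cluster for a given disk grouping."""
    return nodes * (disks_per_node // disks_per_osd)


def tb_to_recover(disks_per_osd, disk_tb, fill_ratio):
    """TB that must be re-replicated when one disk in an OSD fails.

    Assumption: a raid-0 OSD is lost entirely when any member disk fails.
    """
    return disks_per_osd * disk_tb * fill_ratio


# 8 nodes with 8 disks each, grouped as one raid-0 OSD or as 8 single-disk OSDs.
for disks_per_osd in (8, 1):
    print(osd_count(8, 8, disks_per_osd), "OSDs,",
          tb_to_recover(disks_per_osd, disk_tb=2.0, fill_ratio=0.5), "TB to recover")
```

With these figures the raid-0 layout gives 8 OSDs and 8TB to recover per disk
failure, while one OSD per disk gives 64 OSDs and 1TB to recover.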

What I'm trying to find is a good baseline hardware configuration that
works well with the algorithms and assumptions made by ceph's design, i.e.
if ceph works better with many smaller OSDs rather than a few larger ones
then that would obviously influence the overall design of the box.

Regards,
  Dennis
