Re: Ideal hardware spec? - Wido den Hollander

From: Wido den Hollander <wido@widodh.nl>
To: Stephen Perkins <perkins@netmass.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Ideal hardware spec?
Date: Sat, 25 Aug 2012 13:48:25 +0200
Message-ID: <5038BB89.9020405@widodh.nl>
In-Reply-To: <00ae01cd823e$84e2ed20$8ea8c760$@netmass.com>

(CC back to the list)

On 08/24/2012 11:22 PM, Stephen Perkins wrote:
> Hi Wido,
>
> Why 4 x 1TB?  I get the 4 (many boards seem to have 4 SATA connectors so
> you don't need a separate controller).  However... why not 2TB or 3TB
> drives?  Is recovery time too large?
>

Yes, mainly due to recovery time. With 4x 1TB I'd lose at most about 3.2TB
of data (at 85% full) when a node fails, which the cluster can recover.

If I increased that to 2TB or 3TB disks, recovery would indeed put a
heavier load on CPU and memory.
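A back-of-the-envelope way to see the trade-off: the most a dead node can
take with it is disks x disk size x fill level. This is only a sketch (the
function name is hypothetical, nothing Ceph itself computes) and it uses
nominal decimal TB, so 4x 1TB at 85% comes out at 3.4 TB rather than the
"about 3.2TB" above, which presumably allows for formatted capacity.

```python
def data_to_recover_tb(disks: int, disk_tb: float, fill: float) -> float:
    """Upper bound on data lost with one node (hypothetical helper).

    Assumes every disk on the node is filled to `fill` and uses nominal
    decimal TB, before any filesystem overhead.
    """
    if not 0.0 <= fill <= 1.0:
        raise ValueError("fill must be between 0 and 1")
    return disks * disk_tb * fill


if __name__ == "__main__":
    # Same 4-disk node, growing only the disk size.
    for size in (1, 2, 3):
        tb = data_to_recover_tb(disks=4, disk_tb=size, fill=0.85)
        print(f"4x {size}TB at 85% full: {tb:.1f} TB to recover")
```

Going from 1TB to 3TB disks triples what the cluster has to re-replicate
after a single node failure, on the same CPU and memory.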

With bigger disks I could use fewer nodes for the same amount of storage,
but this way I also get more IOPS since I have more spindles running.
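To make the spindle argument concrete, here is a rough sketch for a fixed
raw capacity. The 100 IOPS per disk figure is purely an assumption for
illustration (real SATA disks vary), and 800 TB raw is simply what ~200
nodes of 4x 1TB add up to.

```python
import math

# Assumed per-spindle figure, for illustration only.
ASSUMED_IOPS_PER_DISK = 100


def layout(target_tb: float, disk_tb: float, disks_per_node: int = 4) -> dict:
    """Nodes, spindles and a rough aggregate IOPS estimate for a raw capacity."""
    spindles = math.ceil(target_tb / disk_tb)
    nodes = math.ceil(spindles / disks_per_node)
    return {
        "nodes": nodes,
        "spindles": spindles,
        "approx_iops": spindles * ASSUMED_IOPS_PER_DISK,
    }


if __name__ == "__main__":
    for size in (1, 2, 3):
        print(f"{size}TB disks:", layout(target_tb=800, disk_tb=size))
```

Fewer, bigger disks shrink the node count, but the aggregate random I/O
scales with the number of spindles, not with capacity.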

> I'm guessing no RAID and one OSD process per disk?
>

Correct. RAID is expensive and the Ceph replication already provides the 
data redundancy here.

> I'm still evaluating your "looking at things differently" to see about a
> bunch of cheap 1Us.
>
> Would your 1Us have redundant power and be redundantly Ethernet connected?
> Or... cheaper single power and single Ethernet (reduced cabling)?
>
> ECC memory?
>

No redundant power, no redundant Ethernet (or switches) and no ECC memory.

I'm quoting here from the CRUSH publication Sage wrote [0]:

"Data safety is of critical importance in large storage systems,
where the large number of devices makes hardware failure
the rule rather than the exception." (4.4 Reliability)

I've been designing by that rule.

I'm relying on CRUSH to do all the redundancy work for me. By
strategically placing nodes on different power feeds and different
switches I can mitigate hardware failure. You just have to make sure
that your CRUSH map reflects the physical layout of your cluster.

Make sure that two copies of your data never end up in the same rack or 
on the same switch.
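The failure-domain rule can be illustrated with a toy placement function.
To be clear, this is not CRUSH: it swaps in plain rendezvous
(highest-random-weight) hashing, and the Node fields, place() and the rack
layout are all hypothetical. It only shows the constraint: walk candidates
in a deterministic per-object order and skip any node whose rack or switch
already holds a copy.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    switch: str
    power_feed: str


def _score(obj: str, node: Node) -> int:
    """Rendezvous hashing score: deterministic per (object, node) pair."""
    digest = hashlib.sha256(f"{obj}:{node.name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: list[Node], replicas: int,
          domains: tuple[str, ...] = ("rack", "switch")) -> list[Node]:
    """Pick `replicas` nodes for `obj`; no two may share any of `domains`."""
    chosen: list[Node] = []
    used: dict[str, set[str]] = {d: set() for d in domains}
    # Highest score first; skip nodes that would put a second copy in an
    # already-used failure domain (same rack or same switch).
    for node in sorted(nodes, key=lambda n: _score(obj, n), reverse=True):
        if any(getattr(node, d) in used[d] for d in domains):
            continue
        chosen.append(node)
        for d in domains:
            used[d].add(getattr(node, d))
        if len(chosen) == replicas:
            return chosen
    raise ValueError(f"cannot place {replicas} replicas in distinct {domains}")


if __name__ == "__main__":
    # Hypothetical layout: 3 racks, one switch per rack, 4 nodes per rack.
    cluster = [
        Node(f"node{r}{i}", rack=f"rack{r}", switch=f"sw{r}", power_feed=f"feed{r % 2}")
        for r in range(3)
        for i in range(4)
    ]
    for n in place("object-42", cluster, replicas=3):
        print(n.name, n.rack, n.switch)
```

Passing domains=("rack", "switch", "power_feed") would also keep copies on
separate power feeds, provided there are enough distinct feeds.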

Wido

[0]: http://ceph.newdream.net/papers/weil-crush-sc06.pdf

> - Steve
>
>
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wido den Hollander
> Sent: Friday, August 24, 2012 1:12 PM
> To: Mark Nelson
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Ideal hardware spec?
>
>
>
> On 08/24/2012 05:05 PM, Mark Nelson wrote:
>>>>
>>>> I'm running Atom D525 (SuperMicro X7SPA-HF) nodes with 4GB of RAM
>>>> and 4 2TB disks and an 80GB SSD (old X25-M) for journaling.
>>>>
>>>> That works, but what I notice is that under heavy recovery the Atoms
>>>> can't cope with it.
>>>>
>>>> I'm thinking about building a couple of nodes with the AMD Brazos
>>>> mainboard, something like an Asus E35M1-I.
>>>>
>>>> That is not a server board, but it would just be a reference to see
>>>> what it does.
>>>>
>>>> One of the problems with the Atoms is the 4GB memory limitation; with
>>>> the AMD Brazos you can use 8GB.
>>>>
>>>> I'm trying to figure out a way to have a really large number of small
>>>> nodes for a low price to have a massive cluster where the impact of
>>>> losing one node is very small.
>>>
>>> Given that "massive" is a relative term, I am as well... but I'm also
>>> trying to reduce the footprint (power and space) of that "massive"
>>> cluster. I also want to start small (1/2 rack) and scale as needed.
>>
>> If you do end up testing Brazos processors, please post your results!
>> I think it really depends on what kind of performance you are aiming for.
>> Our stock 2U test boxes have 6-core Opterons, and our SC847a has
>> dual 6-core low power Xeon E5s.  At 10GbE+ these are probably going to
>> be pushed pretty hard, especially during recovery.
>>
>
> I'm aiming for a Ceph cluster of a couple of hundred TB consisting of 5
> or 6 racks full of 1U machines, each with 4x 1TB.
>
> That's about 200 of these nodes, none of them doing much work.
>
> If one fails I'd lose 0.5% of my cluster and recovery shouldn't be that
> hard. That assumes the node goes down due to a hardware failure, not due
> to some Ceph or BTRFS bug hitting the whole cluster :)
>
> Wido
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
>
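The "0.5% per node" figure and its effect on recovery can be checked with a
short sketch. It assumes identical nodes and that re-replication spreads
evenly over all surviving nodes, which pseudo-random placement only
approximates; failure_impact() is a hypothetical helper, not part of Ceph.

```python
def failure_impact(nodes: int, disks_per_node: int, disk_tb: float,
                   fill: float) -> tuple[float, float]:
    """Return (fraction of the cluster on one node, TB each survivor re-replicates).

    Assumes identical nodes and an even spread of recovery work over all
    surviving nodes.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to recover onto")
    lost_tb = disks_per_node * disk_tb * fill
    return 1 / nodes, lost_tb / (nodes - 1)


if __name__ == "__main__":
    # Few big nodes vs. many small ones, each with 4x 1TB at 85% full.
    for n in (20, 200):
        frac, per_node = failure_impact(n, disks_per_node=4, disk_tb=1.0, fill=0.85)
        print(f"{n} nodes: {frac:.1%} of the cluster, ~{per_node * 1000:.0f} GB per survivor")
```

With ~200 nodes each survivor only has to absorb a few tens of GB after a
single failure, which is why low-power CPUs may still keep up.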

Thread overview: 22+ messages
2012-08-22 13:55 Ideal hardware spec? Jonathan Proulx
2012-08-22 14:17 ` Wido den Hollander
2012-08-22 14:39   ` Stephen Perkins
2012-08-23  8:24     ` Wido den Hollander
2012-08-24 14:17       ` Stephen Perkins
2012-08-24 14:41         ` Joe Landman
2012-08-24 15:05         ` Mark Nelson
2012-08-24 16:30           ` Sławomir Skowron
2012-08-24 18:12           ` Wido den Hollander
2012-08-24 18:23             ` Mark Nelson
2012-08-27 18:05               ` Stephen Perkins
2012-08-27 22:33                 ` Wido den Hollander
     [not found]             ` <00ae01cd823e$84e2ed20$8ea8c760$@netmass.com>
2012-08-25 11:48               ` Wido den Hollander [this message]
2012-08-24 16:12         ` Tommi Virtanen
2012-08-24 18:09         ` Wido den Hollander
2012-08-22 15:46   ` Jonathan Proulx
2012-08-23  9:59     ` Wido den Hollander
     [not found]       ` <CABYiri_-73UyTKHcHWDZdjqb=rozjraVzxd166NZV2ir53tduA@mail.gmail.com>
2012-08-26 11:15         ` Wido den Hollander
2012-08-26 13:29           ` Mark Nelson
2012-08-22 14:41 ` Mark Nelson
2012-08-28  0:02   ` Curtis C.
2012-08-28  1:18     ` Mark Nelson
