From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Nelson
Subject: Re: Ideal hardware spec?
Date: Fri, 24 Aug 2012 13:23:30 -0500
Message-ID: <5037C6A2.4050403@inktank.com>
References: <20120822135530.GB10015@csail.mit.edu>
 <5034E9F3.10001@widodh.nl>
 <00d301cd8073$faa0f7e0$efe2e7a0$@netmass.com>
 <5035E8AB.8090006@widodh.nl>
 <005b01cd8203$43f6e860$cbe4b920$@netmass.com>
 <50379830.4000000@inktank.com>
 <5037C3FB.200@widodh.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-iy0-f174.google.com ([209.85.210.174]:47572 "EHLO
 mail-iy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753582Ab2HXSXg (ORCPT ); Fri, 24 Aug 2012 14:23:36 -0400
Received: by ialo24 with SMTP id o24so3923823ial.19 for ;
 Fri, 24 Aug 2012 11:23:32 -0700 (PDT)
In-Reply-To: <5037C3FB.200@widodh.nl>
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Wido den Hollander
Cc: ceph-devel@vger.kernel.org

On 08/24/2012 01:12 PM, Wido den Hollander wrote:
>
>
> On 08/24/2012 05:05 PM, Mark Nelson wrote:
>>>>
>>>> I'm running Atom D525 (SuperMicro X7SPA-HF) nodes with 4GB of RAM,
>>>> four 2TB disks and an 80GB SSD (old X25-M) for journaling.
>>>>
>>>> That works, but what I notice is that under heavy recovery the
>>>> Atoms can't cope with it.
>>>>
>>>> I'm thinking about building a couple of nodes with the AMD Brazos
>>>> mainboard, something like an Asus E35M1-I.
>>>>
>>>> That is not a server board, but it would just be a reference to
>>>> see what it does.
>>>>
>>>> One of the problems with the Atoms is the 4GB memory limitation;
>>>> with the AMD Brazos you can use 8GB.
>>>>
>>>> I'm trying to figure out a way to have a really large number of
>>>> small nodes for a low price, to have a massive cluster where the
>>>> impact of losing one node is very small.
>>>
>>> Given that "massive" is a relative term, I am as well... but I'm
>>> also trying to reduce the footprint (power and space) of that
>>> "massive" cluster. I also want to start small (1/2 rack) and scale
>>> as needed.
>>
>> If you do end up testing Brazos processors, please post your results!
>> I think it really depends on what kind of performance you are aiming
>> for. Our stock 2U test boxes have 6-core Opterons, and our SC847a has
>> dual 6-core low-power Xeon E5s. At 10GbE+ these are probably going to
>> be pushed pretty hard, especially during recovery.
>>
>
> I'm aiming for a Ceph cluster of a couple of hundred TB consisting of
> 5 or 6 racks full of 1U machines, each with 4x 1TB.
>
> That is about 200 of these nodes, none of them doing that much work.
>
> If one fails I'd lose 0.5% of my cluster and recovery shouldn't be that
> hard. Assuming here that the node crashes due to hardware failure, not
> being plagued by some Ceph or BTRFS bug cluster-wide :)
>
> Wido

Just based on past experience, I figure the most common causes of
failure are going to be drive "failure" and controller failure. Your
solution mitigates that by just going with tons of 1U nodes with few
drives. I'm hoping we can also mitigate it by skipping expanders and
doing no more than 8 drives per controller. It does mean you top out at
roughly 40-48 drives per node on most server boards.

Mark
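
For anyone following the numbers in this thread, here is a rough sketch of the arithmetic (Python). It assumes equal-size nodes; the function names are hypothetical, and the replica count is purely an assumption, since nobody in the thread states one.

```python
def node_share(nodes: int) -> float:
    """Fraction of the cluster held by one node, assuming equal-size nodes."""
    return 1.0 / nodes


def raw_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw (pre-replication) capacity of the whole cluster in TB."""
    return nodes * drives_per_node * drive_tb


def usable_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity for a given replica count (assumed, not from the thread)."""
    return raw_tb / replicas


def controllers_needed(drives: int, max_per_controller: int = 8) -> int:
    """Controllers required when capping drives per controller (ceiling division)."""
    return -(-drives // max_per_controller)


if __name__ == "__main__":
    # Wido's layout: about 200 1U nodes, each with 4x 1TB.
    print(f"share lost per node: {node_share(200):.1%}")
    print(f"raw capacity: {raw_capacity_tb(200, 4, 1.0):.0f} TB")
    # Mark's layout: no expanders, at most 8 drives per controller.
    for drives in (40, 48):
        print(f"{drives} drives -> {controllers_needed(drives)} controllers")
```

With 200 nodes, one node is 0.5% of the cluster, which matches the figure above. The 40-48 drive ceiling corresponds to 5-6 controllers at 8 drives each.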