From: Wido den Hollander
Subject: Re: Large numbers of OSD per node
Date: Mon, 05 Nov 2012 12:01:55 +0100
Message-ID: <50979CA3.3060005@widodh.nl>
References: <5097676B.5020200@gmail.com>
In-Reply-To: <5097676B.5020200@gmail.com>
To: Andrew Thrift
Cc: "ceph-devel@vger.kernel.org"

Hi,

On 05-11-12 08:14, Andrew Thrift wrote:
> Hi,
>
> We are evaluating CEPH for deployment.
>
> I was wondering if there are any current "best practices" around the
> number of OSD's per node ?
>
> e.g. We are looking at deploying 3 nodes, each with 72x SAS disks, and
> 2x 10gigabit Ethernet bonded.
>
> Would this best be configured as 72 OSD's per node.
>
> Or would we be better to using raid5 to have 18 OSD's per node ?
>

You should be aware that losing a node in a 3-node cluster causes a large amount of data movement. I myself am a fan of going with a lot of smaller nodes instead of building big ones.

With 3 such nodes you'd probably be going with 2x replication? Otherwise you can never recover when one of the 3 nodes completely burns down to the ground: with 3x replication spread across hosts, the two surviving nodes have no third host on which to place the missing replica.

If you have 72 1TB disks in such a node, you could in theory be moving 72TB. That would put a lot of stress on the other two nodes, and you would need a lot of memory and CPU power.

You might be better off going for 27 nodes with 8 disks each, or 18 nodes with 12 disks each (the same 216 disks in total). When a node fails, recovery will be much easier on your cluster. You can also take a node out for maintenance when needed.

Another thing you should be aware of is status "D".
What if a filesystem inside one of your big machines hangs and one of the OSDs gets stuck in status "D" (uninterruptible sleep), waiting for I/O which will never come? You'd be forced to reboot that node, and that would again take 72TB of data offline.

I am not aware of anybody using such big nodes in production. It could work, but you will need a lot of memory and a lot of CPU. The recommendation is 1GB of RAM and 1GHz of CPU per OSD, so you'd be looking at at least 72GB of memory and 72GHz of CPU power per node.

Wido

> Regards,
>
> Andrew
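The sizing arithmetic above can be sketched as a small calculator that compares the three layouts from the thread (3 x 72, 18 x 12, 27 x 8 disks). It is a rough sketch under stated assumptions: 1TB disks as in the example, disks completely full (so a failed node's raw capacity is the worst-case amount to re-replicate), recovery spread evenly over the surviving nodes, and the 1GB RAM / 1GHz CPU per OSD rule of thumb. The `Layout` class and its property names are hypothetical, written only for this illustration.

```python
from dataclasses import dataclass

# Rule of thumb quoted in the thread: about 1 GB of RAM and 1 GHz of CPU per OSD.
RAM_GB_PER_OSD = 1
CPU_GHZ_PER_OSD = 1


@dataclass
class Layout:
    """Hypothetical helper: one OSD per disk, identical nodes."""
    nodes: int
    disks_per_node: int
    disk_tb: float = 1.0  # assumption: 1 TB disks, as in the example above

    @property
    def node_tb(self) -> float:
        """Worst-case data to re-replicate when one node fails (full disks)."""
        return self.disks_per_node * self.disk_tb

    @property
    def recovery_tb_per_survivor(self) -> float:
        """Share each surviving node absorbs, assuming an even spread (idealized)."""
        return self.node_tb / (self.nodes - 1)

    @property
    def min_ram_gb(self) -> int:
        return self.disks_per_node * RAM_GB_PER_OSD

    @property
    def min_cpu_ghz(self) -> int:
        return self.disks_per_node * CPU_GHZ_PER_OSD


for layout in (Layout(3, 72), Layout(18, 12), Layout(27, 8)):
    print(
        f"{layout.nodes:>2} nodes x {layout.disks_per_node:>2} disks: "
        f"{layout.node_tb:.0f} TB per node, "
        f"{layout.recovery_tb_per_survivor:.1f} TB per survivor on failure, "
        f">= {layout.min_ram_gb} GB RAM and {layout.min_cpu_ghz} GHz CPU per node"
    )
```

All three layouts hold the same 216 disks, but the big-node layout asks each survivor to absorb 36TB after a failure, against well under 1TB per survivor for the 18- and 27-node layouts.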
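To check whether an OSD (or anything else) on a node is sitting in status "D", one option on Linux is to read the state field from `/proc/<pid>/stat`. The sketch below assumes a Linux host with `/proc` mounted; the function names `proc_state` and `d_state_processes` are hypothetical. Short visits to "D" are normal during disk I/O, so only a process that stays in "D" across repeated samples points to the kind of hang described above.

```python
import os


def proc_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the *last* ')' before reading field 3.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, command) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if proc_state(line) == "D":
            comm = line[line.index("(") + 1 : line.rindex(")")]
            stuck.append((int(entry), comm))
    return stuck


if __name__ == "__main__" and os.path.isdir("/proc"):
    # One sample only; run this repeatedly before concluding a process is hung.
    for pid, comm in d_state_processes():
        print(pid, comm)
```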