Re: Large numbers of OSD per node

From: Stefan Kleijkers <stefan@unilogicnetworks.net>
To: Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com>
Cc: Wido den Hollander <wido@widodh.nl>,
	Andrew Thrift <andyonfire@gmail.com>,
	ceph-devel@vger.kernel.org, mark.nelson@inktank.com
Subject: Re: Large numbers of OSD per node
Date: Tue, 06 Nov 2012 12:51:11 +0100
Message-ID: <5098F9AF.7070905@unilogicnetworks.net>
In-Reply-To: <CAJH6TXgHs7BVAwrkE0tRvxAq_LgFuHJ0PGb085ErQ=N8FSKoKw@mail.gmail.com>

On 11/06/2012 12:31 PM, Gandalf Corvotempesta wrote:
> 2012/11/6 Stefan Kleijkers <stefan@unilogicnetworks.net>:
>> Well you have to keep in mind that when a node fails the PGs that resided
>> on that node have to be redistributed over all the other nodes. So you begin
>> moving about 1% of the data between all the remaining nodes/OSDs (coming
>> from an OSD that has the remaining replica of the PG to the new OSD that
>> will get a replica). So you move from and to all the remaining OSDs and
>> that will give you a lot of bandwidth and therefore fast recovery to a
>> consistent state.
> Ok, but in this case, 1% is still 36TB of data.
> There is no difference between 3 nodes with 36TB of data each and 90
> nodes with 36TB of data each.
> In case of a node failure, you always have to move 36TB of data, no
> matter how many nodes you have.
>
True, but it's a huge difference whether you have to redistribute the 36TB
between 2 remaining nodes or between 89 remaining nodes. And with so few
nodes you will probably hit a couple of other bottlenecks, like CPU power
per node, network bandwidth per node, etc. I found this out the
hard way with 3 nodes and 24 disks/OSDs per node.
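
To put rough numbers on the difference, here is a back-of-the-envelope
sketch. It assumes the 36TB is re-replicated evenly across the remaining
nodes and that each node's network link is the only bottleneck. The
function name and the 10 Gbit/s per-node link speed are hypothetical
values picked for illustration, not measurements from a real cluster.

```python
def recovery_hours(data_tb, remaining_nodes, node_gbit_per_s=10.0):
    """Idealised lower bound on recovery time after one node fails.

    Assumptions (not from a real cluster): the lost data is spread
    evenly over the remaining nodes, and each node's network link
    (default 10 Gbit/s, a made-up figure) is the only bottleneck.
    CPU and disk limits are ignored, so real recovery will be slower.
    """
    if remaining_nodes < 1:
        raise ValueError("need at least one remaining node")
    bytes_per_node = data_tb * 1e12 / remaining_nodes
    node_bytes_per_s = node_gbit_per_s * 1e9 / 8
    return bytes_per_node / node_bytes_per_s / 3600


if __name__ == "__main__":
    for nodes in (2, 89):
        hours = recovery_hours(36, nodes)
        print(f"{nodes:2d} remaining nodes: about {hours:.2f} hours")
```

Under these assumptions the same 36TB takes roughly 4 hours with 2
remaining nodes and a few minutes with 89, because the per-node share
shrinks with the number of nodes that take part in recovery.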

Stefan

Thread overview: 13+ messages
2012-11-05  7:14 Large numbers of OSD per node Andrew Thrift
2012-11-05 11:01 ` Wido den Hollander
2012-11-05 12:45   ` Mark Nelson
2012-11-06  2:05     ` Andrew Thrift
2012-11-06  9:10       ` Wido den Hollander
2012-11-06  9:36         ` Gandalf Corvotempesta
2012-11-06  9:46           ` Wido den Hollander
2012-11-06 10:20             ` Gandalf Corvotempesta
2012-11-06 10:24             ` Gandalf Corvotempesta
2012-11-06 11:05               ` Stefan Kleijkers
2012-11-06 11:31                 ` Gandalf Corvotempesta
2012-11-06 11:51                   ` Stefan Kleijkers [this message]
2012-11-06 12:51                     ` Gandalf Corvotempesta
