From: Stefan Kleijkers <stefan@unilogicnetworks.net>
To: Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com>
Cc: Wido den Hollander <wido@widodh.nl>,
Andrew Thrift <andyonfire@gmail.com>,
ceph-devel@vger.kernel.org, mark.nelson@inktank.com
Subject: Re: Large numbers of OSD per node
Date: Tue, 06 Nov 2012 12:51:11 +0100
Message-ID: <5098F9AF.7070905@unilogicnetworks.net>
In-Reply-To: <CAJH6TXgHs7BVAwrkE0tRvxAq_LgFuHJ0PGb085ErQ=N8FSKoKw@mail.gmail.com>
On 11/06/2012 12:31 PM, Gandalf Corvotempesta wrote:
> 2012/11/6 Stefan Kleijkers <stefan@unilogicnetworks.net>:
>> Well, you have to keep in mind that when a node fails, the PGs that resided
>> on that node have to be redistributed over all the other nodes. So you begin
>> moving about 1% of the data between all the remaining nodes/OSDs (coming
>> from an OSD that has the remaining replica of the PG to the new OSD that
>> will get a replica). So you move from and to all the remaining OSDs, and
>> that will give you a lot of bandwidth and therefore fast recovery to a
>> consistent state.
> Ok, but in this case, 1% is still 36TB of data.
> There is no difference between 3 nodes with 36TB of data each and 90
> nodes with 36TB of data each.
> In case of a node failure, you always have to move 36TB of data, no
> matter how many nodes you have.
>
True, but it makes a huge difference whether you have to redistribute the
36TB between 2 remaining nodes or between 89 remaining nodes. And with so
few nodes you will probably hit a couple of other bottlenecks, like CPU power
per node, network bandwidth per node, etc. I found this out the
hard way with 3 nodes and 24 disks/OSDs per node.
Stefan
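
To put rough numbers on the 2-versus-89 comparison above, here is a
back-of-the-envelope sketch. It is not how Ceph schedules recovery. It
assumes the 36TB is spread evenly over the remaining nodes and that each
node can absorb recovery traffic at a fixed rate. The 125 MB/s figure
(roughly one 1 Gbit/s link) and the function name recovery_estimate are
illustrative assumptions, not values from this thread.

```python
def recovery_estimate(data_tb, remaining_nodes, node_bandwidth_mb_s):
    """Return (TB each remaining node must absorb, hours needed to absorb it).

    Simplified model: data is split evenly across the remaining nodes and
    every node receives its share in parallel at node_bandwidth_mb_s.
    """
    share_tb = data_tb / remaining_nodes
    # 1 TB = 1,000,000 MB (decimal units)
    seconds = share_tb * 1_000_000 / node_bandwidth_mb_s
    return share_tb, seconds / 3600


if __name__ == "__main__":
    # 36 TB to re-replicate; 125 MB/s per node is an assumed figure.
    for nodes in (2, 89):
        share, hours = recovery_estimate(36, nodes, 125)
        print(f"{nodes:>2} remaining nodes: {share:.2f} TB per node, "
              f"about {hours:.1f} h")
```

Under these assumptions, 2 remaining nodes each take 18 TB (about 40 hours),
while 89 remaining nodes each take about 0.4 TB (under an hour). The total
moved is the same 36TB in both cases; what changes is how much each node
has to carry, which is where the per-node CPU and network limits show up.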
Thread overview: 13+ messages
2012-11-05 7:14 Large numbers of OSD per node Andrew Thrift
2012-11-05 11:01 ` Wido den Hollander
2012-11-05 12:45 ` Mark Nelson
2012-11-06 2:05 ` Andrew Thrift
2012-11-06 9:10 ` Wido den Hollander
2012-11-06 9:36 ` Gandalf Corvotempesta
2012-11-06 9:46 ` Wido den Hollander
2012-11-06 10:20 ` Gandalf Corvotempesta
2012-11-06 10:24 ` Gandalf Corvotempesta
2012-11-06 11:05 ` Stefan Kleijkers
2012-11-06 11:31 ` Gandalf Corvotempesta
2012-11-06 11:51 ` Stefan Kleijkers [this message]
2012-11-06 12:51 ` Gandalf Corvotempesta