From: Brian Candler <B.Candler@pobox.com>
To: Joe Landman <landman@scalableinformatics.com>
Cc: xfs@oss.sgi.com
Subject: Re: Performance problem - reads slower than writes
Date: Mon, 6 Feb 2012 10:40:24 +0000
Message-ID: <20120206104024.GA4975@nsrc.org>
In-Reply-To: <4F2D98A9.4090709@scalableinformatics.com>
On Sat, Feb 04, 2012 at 03:44:25PM -0500, Joe Landman wrote:
> >Sure it can. A gluster volume consists of "bricks". Each brick is served by
> >a glusterd process listening on a different TCP port. Those bricks can be on
> >the same server or on different servers.
>
> I seem to remember that the Gluster folks abandoned this model
> (using their code versus MD raid) on single servers due to
> performance issues. We did play with this a few times, and the
> performance wasn't that good. Basically limited by single disk
> seek/write speed.
I did raise the same question on the gluster-users list recently and there
seemed to be no clear-cut answer; some people were using Gluster to
aggregate RAID nodes, and some were using it to mirror individual disks
between nodes.
I do like the idea of having individual filesystems per disk, making data
recovery much more straightforward and allowing for efficient
parallelisation.
However I also like the idea of low-level RAID which lets you pop out and
replace a disk invisibly to the higher levels, and is perhaps better
battle-tested than gluster file-level replication.
> RAID in this case can protect you from some of these issues (single
> disk failure issues, being replaced by RAID issues), but unless you
> are building mirror pairs of bricks on separate units, this magical
> "automatic" isn't quite so.
That was the idea: having mirror bricks on different nodes.
server1:/brick1 <-> server2:/brick1
server1:/brick2 <-> server2:/brick2 etc
> Moreover, last I checked, Gluster made no guarantees as to the
> ordering of the layout for mirrors. So if you have more than one
> brick per node, and build mirror pairs with the "replicate" option,
> you have to check the actual hashing to make sure it did what you
> expect. Or build up the mirror pairs more carefully.
AFAICS it does guarantee the ordering:
http://download.gluster.com/pub/gluster/glusterfs/3.2/Documentation/AG/html/sect-Administration_Guide--Setting_Volumes-Distributed_Replicated.html
"Note: The number of bricks should be a multiple of the replica count for a
distributed replicated volume. Also, the order in which bricks are specified
has a great effect on data protection. Each replica_count consecutive bricks
in the list you give will form a replica set, with all replica sets combined
into a volume-wide distribute set. To make sure that replica-set members are
not placed on the same node, list the first brick on every server, then the
second brick on every server in the same order, and so on."
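To illustrate the ordering rule quoted above, here is a minimal Python sketch. It is an illustration only, not anything Gluster ships. The function names (brick_order, replica_sets, sets_span_servers) and the server and brick names are hypothetical. It builds the brick list in the recommended order, groups each replica_count consecutive bricks into a replica set, and checks that no set has two bricks on the same server.

```python
def brick_order(servers, bricks):
    """List the first brick on every server, then the second, and so on."""
    return [f"{server}:{brick}" for brick in bricks for server in servers]


def replica_sets(brick_list, replica_count):
    """Group each replica_count consecutive bricks into one replica set."""
    if len(brick_list) % replica_count != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [
        brick_list[i:i + replica_count]
        for i in range(0, len(brick_list), replica_count)
    ]


def sets_span_servers(sets):
    """Return True if no replica set has two bricks on the same server."""
    return all(
        len({brick.split(":", 1)[0] for brick in replica_set}) == len(replica_set)
        for replica_set in sets
    )


# Hypothetical two-server, two-brick layout with replica count 2.
servers = ["server1", "server2"]
bricks = ["/brick1", "/brick2"]

order = brick_order(servers, bricks)
sets = replica_sets(order, 2)

for replica_set in sets:
    print(" <-> ".join(replica_set))
print("mirrors on separate servers:", sets_span_servers(sets))
```

Listing the bricks server by server instead (server1:/brick1, server1:/brick2, server2:/brick1, server2:/brick2) would fail the same check, because each mirror pair would then sit on a single node.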
> At this point, it sounds like there is a gluster side of this
> discussion that I'd recommend you take to the gluster list. There
> is an xfs portion as well which is fine here.
Understood. Whatever the final solution looks like, I'm totally sold on XFS.
> Disclosure: we build/sell/support gluster (and other) based systems
> atop xfs based RAID units (both hardware and software RAID;
> 1,10,6,60,...) so we have inherent biases.
You also have inherent experience, and that is extremely valuable as I try
to pick the best storage model which will work for us going forward.
Regards,
Brian.