public inbox for linux-xfs@vger.kernel.org
From: Brian Candler <B.Candler@pobox.com>
To: Joe Landman <landman@scalableinformatics.com>
Cc: xfs@oss.sgi.com
Subject: Re: Performance problem - reads slower than writes
Date: Mon, 6 Feb 2012 10:40:24 +0000
Message-ID: <20120206104024.GA4975@nsrc.org>
In-Reply-To: <4F2D98A9.4090709@scalableinformatics.com>

On Sat, Feb 04, 2012 at 03:44:25PM -0500, Joe Landman wrote:
> >Sure it can. A gluster volume consists of "bricks". Each brick is served by
> >a glusterd process listening on a different TCP port. Those bricks can be on
> >the same server or on different servers.
> 
> I seem to remember that the Gluster folks abandoned this model
> (using their code versus MD raid) on single servers due to
> performance issues.  We did play with this a few times, and the
> performance wasn't that good.  Basically limited by single disk
> seek/write speed.

I did raise the same question on the gluster-users list recently and there
seemed to be no clear-cut answer; some people were using Gluster to
aggregate RAID nodes, and some were using it to mirror individual disks
between nodes.

I do like the idea of having individual filesystems per disk, making data
recovery much more straightforward and allowing for efficient
parallelisation.

However, I also like the idea of low-level RAID, which lets you pop out and
replace a disk invisibly to the higher levels, and which is perhaps better
battle-tested than Gluster's file-level replication.

> RAID in this case can protect you from some of these issues (single
> disk failure issues, being replaced by RAID issues), but unless you
> are building mirror pairs of bricks on separate units, this magical
> "automatic" isn't quite so.

That was the idea: having mirror bricks on different nodes.

server1:/brick1 <-> server2:/brick1
server1:/brick2 <-> server2:/brick2 etc
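
For reference, with the 3.2 command line I'd expect the volume creation to
look roughly like this (the volume name and brick paths are placeholders,
and the exact syntax is from memory):

    # replica 2: each consecutive pair of bricks becomes a mirror pair,
    # so alternating the servers in the list puts every pair on two nodes
    gluster volume create myvol replica 2 transport tcp \
        server1:/brick1 server2:/brick1 \
        server1:/brick2 server2:/brick2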

> Moreover, last I checked, Gluster made no guarantees as to the
> ordering of the layout for mirrors.  So if you have more than one
> brick per node, and build mirror pairs with the "replicate" option,
> you have to check the actual hashing to make sure it did what you
> expect.  Or build up the mirror pairs more carefully.

AFAICS it does guarantee the ordering:
http://download.gluster.com/pub/gluster/glusterfs/3.2/Documentation/AG/html/sect-Administration_Guide--Setting_Volumes-Distributed_Replicated.html

"Note: The number of bricks should be a multiple of the replica count for a
distributed replicated volume. Also, the order in which bricks are specified
has a great effect on data protection. Each replica_count consecutive bricks
in the list you give will form a replica set, with all replica sets combined
into a volume-wide distribute set. To make sure that replica-set members are
not placed on the same node, list the first brick on every server, then the
second brick on every server in the same order, and so on."
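
If I've got that right, the resulting layout can be sanity-checked with
"gluster volume info", which lists the bricks in the order they were given;
in a replica-2 volume each consecutive pair should form a mirror pair.
Something like this for the hypothetical "myvol" above (output from memory,
so details may differ):

    $ gluster volume info myvol
    Volume Name: myvol
    Type: Distributed-Replicate
    Number of Bricks: 2 x 2 = 4
    Transport-type: tcp
    Bricks:
    Brick1: server1:/brick1
    Brick2: server2:/brick1    <- replica set 1
    Brick3: server1:/brick2
    Brick4: server2:/brick2    <- replica set 2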

> At this point, it sounds like there is a gluster side of this
> discussion that I'd recommend you take to the gluster list.  There
> is an xfs portion as well which is fine here.

Understood. Whatever the final solution looks like, I'm totally sold on XFS.

> Disclosure:  we build/sell/support gluster (and other) based systems
> atop xfs based RAID units (both hardware and software RAID;
> 1,10,6,60,...) so we have inherent biases.

You also have inherent experience, and that is extremely valuable as I try
to pick the best storage model that will work for us going forward.

Regards,

Brian.

