From: Jeff Darcy <jdarcy@redhat.com>
To: Samuel Just <sam.just@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>,
	ceph-devel <ceph-devel@vger.kernel.org>,
	gluster-devel@gluster.org
Subject: Re: RADOS translator for GlusterFS
Date: Mon, 5 May 2014 14:07:44 -0400 (EDT)
Message-ID: <324933830.809209.1399313264579.JavaMail.zimbra@redhat.com>
In-Reply-To: <CA+4uBUavv2d63qn0zY=yfS7KV4c45JH6Dr0PupKa+zTOh2GC1Q@mail.gmail.com>

> It's very important, several kinds of blocking are done at object
> granularity.  Off the top of my head, large objects would cause deep
> scrub and recovery to stall requests for longer.  Elephant objects
> would also be able to skew data distribution.

There are some definite parallels here to discussions we've had in
Gluster-land, which we might as well go through because people from
either "parent" project won't have heard the other's history.  The
data distribution issue has turned out to be a practical non-issue
for GlusterFS
users.  Sure, if you have very few "elephant objects" on very few
small-ish bricks (our equivalent of OSDs) then you can get skewed
distribution.  On the other hand, that problem *very* quickly
solves itself for even moderate object and brick counts, to the
point that almost no users have found it useful to enable striping.
Has your experience been different, or do you not know because
striping is mandatory instead of optional?
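
To make the "solves itself" claim concrete, here is a rough sketch,
not a measurement of either system.  It assumes equal-sized elephant
objects placed uniformly at random (a stand-in for hash-based
placement) and reports the fullest brick's load relative to the
average.  The function name, trial count, and object/brick counts
are all made up for illustration.

```python
import random
from collections import Counter


def mean_imbalance(n_objects, n_bricks, trials=50, seed=0):
    """Average (fullest brick load / mean brick load) over several trials.

    Assumption: every object has the same size and lands on a brick
    chosen uniformly at random, standing in for hash-based placement.
    """
    rng = random.Random(seed)
    mean_load = n_objects / n_bricks
    total = 0.0
    for _ in range(trials):
        # Count how many objects each brick received in this trial.
        load = Counter(rng.randrange(n_bricks) for _ in range(n_objects))
        total += max(load.values()) / mean_load
    return total / trials


if __name__ == "__main__":
    # Hypothetical configurations: very few objects on very few bricks,
    # versus moderate object and brick counts.
    print("8 objects, 4 bricks:     ", mean_imbalance(8, 4))
    print("10000 objects, 40 bricks:", mean_imbalance(10_000, 40))
```

A ratio of 1.0 means perfectly even placement.  Under these
assumptions the ratio should fall toward 1.0 as the number of
objects per brick grows, without any striping.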

The "deep scrub and recovery" point brings up a whole different
set of memories.  We used to have a problem in GlusterFS where
self-heal would lock an entire file while it ran, so other access
to that file would be blocked for a long time.  This would cause
VMs to hang, for example.  In either 3.3 or 3.4 (can't remember)
we added "granular self-heal" which would only lock the portion
of the file that was currently under repair, in a sort of rolling
fashion.  From your comment, it sounds like RADOS still locks the
entire object.  Is that correct?  If so, I posit that it's
something we wouldn't need to solve in a prototype.  If/when that
starts turning into something real, then we'd have two options.
One is to do striping as you suggest, which means solving all of
the associated coordination problems.  Another would be to do
something like what GlusterFS did, with locking at the sub-object
level.  That does make repair less atomic, which some would
consider a consistency problem, but we do have some evidence that
it's a violation users don't seem to care about.
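
For anyone who hasn't seen the rolling-lock idea, here is a minimal
sketch.  It is not the GlusterFS self-heal code and not a RADOS API;
RangeLock, granular_heal, the chunk size, and the on_chunk hook are
hypothetical names invented for this example.  Repair locks only
the chunk it is currently copying, so I/O to other ranges of the
same file is not blocked.

```python
import threading


class RangeLock:
    """Advisory byte-range lock; overlapping half-open ranges exclude each other."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # list of (start, end) ranges currently locked

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self._held)

    def try_acquire(self, start, end):
        """Non-blocking attempt; returns True if the range was locked."""
        with self._cond:
            if self._overlaps(start, end):
                return False
            self._held.append((start, end))
            return True

    def acquire(self, start, end):
        """Block until no held range overlaps [start, end), then lock it."""
        with self._cond:
            while self._overlaps(start, end):
                self._cond.wait()
            self._held.append((start, end))

    def release(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()


def granular_heal(good, stale, lock, chunk=4, on_chunk=None):
    """Copy good into stale one chunk at a time, locking only that chunk.

    Assumption: both replicas are bytearrays of equal length.  on_chunk
    is a hypothetical hook called while the chunk lock is still held.
    """
    for off in range(0, len(good), chunk):
        end = min(off + chunk, len(good))
        lock.acquire(off, end)
        try:
            stale[off:end] = good[off:end]
            if on_chunk is not None:
                on_chunk(off, end)
        finally:
            lock.release(off, end)
```

The trade-off is the one described above: a reader can see a file
that is partly repaired, because only the chunk in flight is ever
protected.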
