From: Loic Dachary <loic@dachary.org>
To: Samuel Just <sam.just@inktank.com>, Gregory Farnum <greg@inktank.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Locally repairable code description revisited (was Pyramid ...)
Date: Mon, 09 Jun 2014 23:40:30 +0200
Message-ID: <539629CE.8090409@dachary.org>
In-Reply-To: <CA+4uBUbc3PtmLU3GetzQCm5r=SXA8cxq=bsayZOnvLFR2Sja8A@mail.gmail.com>

Sam, Greg,

A simpler proposal is documented at:

    https://github.com/dachary/ceph/commit/ff11902bdc26aa35c70dd2f4d9de31f4cd207519#diff-5518964bc98a094a784ce2d17a5b0cc1R1

which is part of the proposed implementation for locally repairable code

    https://github.com/ceph/ceph/pull/1921

Hopefully it makes sense ;-)

Cheers

On 09/06/2014 22:38, Samuel Just wrote:
> I'm finding that I don't really understand how the LRC specification
> works.  Is there a doc somewhere I can read?
> -Sam
> 
> On Mon, Jun 9, 2014 at 1:18 PM, Gregory Farnum <greg@inktank.com> wrote:
>> On Fri, Jun 6, 2014 at 7:30 AM, Loic Dachary <loic@dachary.org> wrote:
>>> Hi Andreas,
>>>
>>> On 06/06/2014 13:46, Andreas Joachim Peters wrote:
>>>> Hi Loic,
>>>> the basic implementation looks very clean.
>>>>
>>>> I have a few comments/ideas:
>>>>
>>>> - the reconstruction strategy using the three levels is certainly efficient enough for standard cases, but it does not always guarantee minimal decoding when one layer is not enough to reconstruct, since your third algorithm just brute-forces reconstruction through all layers until we have what we need ...
>>>
>>> The third strategy is indeed brute force. Do you think it is worth changing it to be minimal? It would be nice to quantify the percentage of cases it addresses. Do you know how to do that? It looks like a very small percentage, but there is no proof it is small ;-)
>>>
>>>> - the whole LRC configuration actually does not describe the placement - it still looks disconnected from the placement strategy/crush rules ... wouldn't it make sense to have the crush rule implicit in the description or a function to derive it automatically based on the LRC configuration? Maybe you have this already done in another way and I didn't see it ...
>>>
>>> Good catch.
>>>
>>> What about this:
>>>
>>>       "  [ \"_aAAA_aAA_\", \"set choose datacenter 2\","
>>>       "    \"_aXXX_aXX_\" ],"
>>>       "  [ \"b_BBB_____\", \"set choose host 5\","
>>>       "    \"baXXX_____\" ],"
>>>       "  [ \"_____cCCC_\", \"\","
>>>       "    \"baXXXcaXX_\" ],"
>>>       "  [ \"_____DDDDd\", \"\","
>>>       "    \"baXXXcaXXd\" ],"
>>>
>>> Which translates into
>>>
>>> take root
>>> set choose datacenter 2
>>> set choose host 5
>>>
>>> In other words, the ruleset is created by concatenating the strings from the description, without any kind of smart computation. It is up to the person who creates the description to put each ruleset step next to the layer description where it makes sense. There is going to be minimal checking to make sure the ruleset can actually be used to get the required number of chunks.
>>>
>>> It probably is very difficult and very confusing to automate the generation of the ruleset. If it is implicit rather than explicit as above, the operator will have to somehow understand and learn how it is computed to make sure it does what is desired. With an explicit set of crush rules loosely coupled to chunk mapping, the operator can read the crush documentation instead of guessing.
>>
>> I think I'm missing some context for this discussion (maybe I haven't
>> been reading other threads closely enough); can you discuss this in
>> more detail?
>> Matching up CRUSH rulesets and the EC plugin formulas is very
>> important and demonstrated to be difficult, but I don't really
>> understand what you're suggesting here, which makes me think it's not
>> quite the right idea. ;)
>>
>>>
>>>> - should the plug-in be able to select reconstruction based on proximity, or should it be up to the higher layer to provide chunks in such a way that reconstruction selects the 'closest' layer? You will understand the relevance of this question better in the next point ...
>>>>
>>>> - I remember we had this 3 data centre example with (8,4) where you can reconstruct every object if 2 data centres are up. Another appealing example, which avoids remote access when reading an object, is 2 data centres each holding a replica of e.g. (4,2) encoded objects. Can your LRC configuration language describe storing the same chunk twice, like __ABCCBA__?
>>>
>>> Unless I'm mistaken, that would require the caller of the plugin to support duplicate data chunks and provide a kind of proximity check. Since this is not currently supported by the OSD logic, it is difficult to figure out how an erasure code plugin could provide support for this use case.
>>
>> I haven't looked at the EC plugin interface at all, but I thought the
>> OSD told the plugin what chunks it could access, and the plugin tells
>> it which ones to fetch. So couldn't the plugin simply output duplicate
>> chunks, and not have the OSD retrieve both of them?
>> -Greg
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

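To illustrate the concatenation described in the quoted text above, here is a minimal Python sketch. It assumes the description is a JSON list with three strings per layer (chunk mapping, crush step, resulting mapping), as in the quoted example. The function name ruleset_from_description and the default bucket name "root" are hypothetical and only for illustration; they are not part of the plugin.

```python
import json

# Layer description copied from the quoted example. Each layer is assumed to be
# [chunk mapping, crush rule step, resulting mapping]; the step may be empty.
DESCRIPTION = """
[
  [ "_aAAA_aAA_", "set choose datacenter 2", "_aXXX_aXX_" ],
  [ "b_BBB_____", "set choose host 5",       "baXXX_____" ],
  [ "_____cCCC_", "",                        "baXXXcaXX_" ],
  [ "_____DDDDd", "",                        "baXXXcaXXd" ]
]
"""


def ruleset_from_description(description, root="root"):
    """Concatenate the non-empty rule steps of each layer, in order.

    Hypothetical helper: no smart computation, just string concatenation
    after an initial "take <root>" step.
    """
    layers = json.loads(description)
    steps = ["take " + root]
    for layer in layers:
        step = layer[1]  # second element holds the crush rule step
        if step:         # layers with an empty step contribute nothing
            steps.append(step)
    return steps


ruleset = ruleset_from_description(DESCRIPTION)
print("\n".join(ruleset))
```

Layers with an empty second string add no step, which is why only three lines come out of four layers. The sketch does not check that the ruleset can provide the required number of chunks.
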
-- 
Loïc Dachary, Artisan Logiciel Libre
