From: Loic Dachary <loic@dachary.org>
To: Sage Weil <sage@inktank.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Review request : Erasure Code plugin loader implementation
Date: Mon, 19 Aug 2013 17:06:59 +0200
Message-ID: <52123493.80803@dachary.org>
In-Reply-To: <alpine.DEB.2.00.1308181654560.1479@cobra.newdream.net>

On 19/08/2013 02:01, Sage Weil wrote:
> On Sun, 18 Aug 2013, Loic Dachary wrote:
>> Hi Sage,
>>
>> Unless I misunderstood something ( which is still possible at this stage ;-) decode() is used both for recovery of missing chunks and retrieval of the original buffer. Decoding the M data chunks is a special case of decoding N <= M chunks out of the M+K chunks that were produced by encode(). It can be used to recover parity chunks as well as data chunks.
>>
>> https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
>>
>>     map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)
>>
>>     decode chunks to read the content of the want_to_read chunks and return a map associating the chunk number with its decoded content. For instance, in the simplest case M=2,K=1 for an encoded payload of data A and B with parity Z, calling
>>
>>     decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
>>     => { 1 => 'A', 2 => 'B' }
>>
>>     If, however, chunk B is to be read but is missing, the call becomes:
>>
>>     decode([2], { 1 => 'A', 3 => 'Z' })
>>     => { 2 => 'B' }
> 
> Ah, I guess this works when some of the chunks contain the original 
> data (as with a parity code).  There are codes that don't work that way, 
> although I suspect we won't use them.
> 
> Regardless, I wonder if we should generalize slightly and have some 
> methods work in terms of (offset,length) of the original stripe to 
> generalize that bit.  Then we would have something like
> 
>      map<int, buffer> transcode(const set<int> &want_to_read, const map<int, 
>             buffer>& chunks);
> 
> to go from chunks -> chunks (as we would want to do with, say, a LRC-like 
> code where we can rebuild some shards from a subset of the other shards).  
> And then also have
> 
>      int decode(const map<int, buffer>& chunks, unsigned offset, 
>          unsigned len, bufferlist *out);

This function would be implemented more or less as:

  set<int> want_to_read = range_to_chunks(offset, len); // compute which chunks must be retrieved
  set<int> available = <the up set>;
  set<int> minimum = minimum_to_decode(want_to_read, available);
  map<int, buffer> available_chunks = retrieve_chunks_from_osds(minimum);
  map<int, buffer> chunks = transcode(want_to_read, available_chunks); // repairs if necessary
  *out = bufferptr(concat_chunks(chunks), offset - <offset of the first chunk>, len);

Or do you have something else in mind?
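
A minimal, self-contained sketch of the steps above, under loud assumptions: std::string stands in for Ceph's buffer types, the code is the simplest M=2,K=1 XOR parity, chunks are numbered from 0, and the "OSDs" are just a map of whatever chunks are reachable. The helper names come from the pseudocode; their bodies (and the encode() helper) are illustrative only, not the proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

using buffer = std::string;       // stand-in for Ceph's buffer types (assumption)
const int M = 2;                  // data chunks 0 and 1; chunk 2 is the XOR parity
const std::size_t CHUNK = 4;      // bytes per chunk, chosen for illustration

// Split M*CHUNK bytes into data chunks and append one XOR parity chunk.
std::map<int, buffer> encode(const buffer& data) {
  assert(data.size() == M * CHUNK);
  std::map<int, buffer> chunks;
  buffer parity(CHUNK, '\0');
  for (int c = 0; c < M; ++c) {
    chunks[c] = data.substr(c * CHUNK, CHUNK);
    for (std::size_t i = 0; i < CHUNK; ++i) parity[i] ^= chunks[c][i];
  }
  chunks[M] = parity;
  return chunks;
}

// Data chunks overlapping the byte range [offset, offset + len); requires len > 0.
std::set<int> range_to_chunks(std::size_t offset, std::size_t len) {
  std::set<int> want;
  for (std::size_t c = offset / CHUNK; c <= (offset + len - 1) / CHUNK; ++c)
    want.insert(static_cast<int>(c));
  return want;
}

// If every wanted chunk is available, read exactly those. Otherwise single
// parity needs M other chunks; an empty result means "cannot decode".
std::set<int> minimum_to_decode(const std::set<int>& want,
                                const std::set<int>& available) {
  bool missing = false;
  for (int c : want) if (!available.count(c)) missing = true;
  if (!missing) return want;
  if (available.size() < static_cast<std::size_t>(M)) return {};
  return available;
}

// chunks -> chunks: copy wanted chunks that are present, rebuild a missing one
// as the XOR of the M chunks that were fetched.
std::map<int, buffer> transcode(const std::set<int>& want,
                                const std::map<int, buffer>& chunks) {
  std::map<int, buffer> out;
  for (int c : want) {
    auto it = chunks.find(c);
    if (it != chunks.end()) { out[c] = it->second; continue; }
    buffer rebuilt(CHUNK, '\0');
    for (const auto& kv : chunks)
      for (std::size_t i = 0; i < CHUNK; ++i) rebuilt[i] ^= kv.second[i];
    out[c] = rebuilt;
  }
  return out;
}

// Read len bytes at offset of the original stripe; returns 0 or -1.
int decode(const std::map<int, buffer>& osds, std::size_t offset,
           std::size_t len, buffer* out) {
  if (len == 0 || offset + len > M * CHUNK) return -1;
  std::set<int> want = range_to_chunks(offset, len);
  std::set<int> available;                        // "the up set"
  for (const auto& kv : osds) available.insert(kv.first);
  std::set<int> minimum = minimum_to_decode(want, available);
  if (minimum.empty()) return -1;
  std::map<int, buffer> fetched;                  // retrieve_chunks_from_osds
  for (int c : minimum) fetched[c] = osds.at(c);
  std::map<int, buffer> chunks = transcode(want, fetched);
  buffer joined;                                  // std::map iterates in chunk order
  for (const auto& kv : chunks) joined += kv.second;
  std::size_t first = static_cast<std::size_t>(*want.begin()) * CHUNK;
  *out = joined.substr(offset - first, len);
  return 0;
}
```

With chunks = encode("ABCDEFGH"), decode(chunks, 2, 4, &out) yields "CDEF" both when all chunks are present and when chunk 1 has been dropped, since transcode() rebuilds it from chunk 0 and the parity. The range-to-chunk mapping only makes sense because the data chunks hold the original bytes in order.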

> 
> that recovers the original data.
> 
> In our case, the read path would use decode, and for recovery we would use 
> transcode.  
> 
> We'd also want to have alternate minimum_to_decode* methods, like
> 
>     virtual set<int> minimum_to_decode(unsigned offset, unsigned len, const 
>          set<int> &available_chunks) = 0;

I also have a convenience wrapper in mind for this but I feel I'm missing something.
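
One possible shape for such a wrapper, as a sketch only: the (offset, len) overload maps the byte range to chunk numbers and forwards to the chunk-based minimum_to_decode. The class names, the chunk_size member and the 0-based chunk numbering are all hypothetical, and the mapping assumes data chunk i holds bytes [i * chunk_size, (i + 1) * chunk_size) of the stripe, which does not hold for codes whose chunks do not contain the original data.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical base class; not the actual ErasureCodeInterface.
class ErasureCodeSketch {
public:
  explicit ErasureCodeSketch(std::size_t chunk_size) : chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }
  virtual ~ErasureCodeSketch() = default;

  // Chunk-based query that each plugin would implement.
  virtual std::set<int> minimum_to_decode(
      const std::set<int>& want_to_read,
      const std::set<int>& available_chunks) const = 0;

  // Convenience wrapper: translate the byte range into the data chunks it
  // overlaps, then forward to the chunk-based query.
  std::set<int> minimum_to_decode(std::size_t offset, std::size_t len,
                                  const std::set<int>& available_chunks) const {
    std::set<int> want;
    if (len > 0)
      for (std::size_t c = offset / chunk_size_;
           c <= (offset + len - 1) / chunk_size_; ++c)
        want.insert(static_cast<int>(c));
    return minimum_to_decode(want, available_chunks);
  }

private:
  std::size_t chunk_size_;
};

// Trivial plugin for illustration: it can only serve chunks that are present,
// and returns an empty set when a wanted chunk is unavailable.
class Passthrough : public ErasureCodeSketch {
public:
  using ErasureCodeSketch::ErasureCodeSketch;
  using ErasureCodeSketch::minimum_to_decode;   // keep the range overload visible

  std::set<int> minimum_to_decode(
      const std::set<int>& want_to_read,
      const std::set<int>& available_chunks) const override {
    for (int c : want_to_read)
      if (!available_chunks.count(c)) return {};
    return want_to_read;
  }
};
```

The using-declaration in the derived class matters: overriding one overload of minimum_to_decode would otherwise hide the range-based one when calling through the derived type.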

Cheers

> 
> What do you think?
> 
> sage
> 
> 
> 
> 
>>
>> Cheers
>>
>> On 18/08/2013 19:34, Sage Weil wrote:
>>> On Sun, 18 Aug 2013, Loic Dachary wrote:
>>>> Hi Ceph,
>>>>
>>>> I've implemented a draft of the Erasure Code plugin loader in the context of http://tracker.ceph.com/issues/5878. It has a trivial unit test and an example plugin. It would be great if someone could do a quick review. The general idea is that the erasure code pool calls something like:
>>>>
>>>> ErasureCodePlugin::factory(&erasure_code, "example", parameters)
>>>>
>>>> as shown at
>>>>
>>>> https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/test/osd/TestErasureCode.cc#L28
>>>>
>>>> to get an object implementing the interface
>>>>
>>>> https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/osd/ErasureCodeInterface.h
>>>>
>>>> which matches the proposal described at
>>>>
>>>> https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
>>>>
>>>> The draft is at
>>>>
>>>> https://github.com/ceph/ceph/commit/5a2b1d66ae17b78addc14fee68c73985412f3c8c
>>>>
>>>> Thanks in advance :-)
>>>
>>> I haven't been following this discussion too closely, but taking a look 
>>> now, the first 3 make sense, but
>>>
>>>     virtual map<int, bufferptr> decode(const set<int> &want_to_read, const 
>>> map<int, bufferptr> &chunks) = 0;
>>>
>>> it seems like this one should be more like
>>>
>>>     virtual int decode(const map<int, bufferptr> &chunks, bufferlist *out);
>>>
>>> As in, you'd decode the chunks you have to get the actual data.  If you 
>>> want to get (missing) chunks for recovery, you'd do
>>>
>>>   minimum_to_decode(...);  // see what we need
>>>   <fetch those chunks from other nodes>
>>>   decode(...);   // reconstruct original buffer
>>>   encode(...);   // encode missing chunks from original data
>>>
>>> sage
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>> All that is necessary for the triumph of evil is that good people do nothing.
>>
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.

