* Review request : Erasure Code plugin loader implementation
@ 2013-08-18 16:19 Loic Dachary
2013-08-18 17:34 ` Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Loic Dachary @ 2013-08-18 16:19 UTC (permalink / raw)
To: Ceph Development
Hi Ceph,
I've implemented a draft of the Erasure Code plugin loader in the context of http://tracker.ceph.com/issues/5878. It has a trivial unit test and an example plugin. It would be great if someone could do a quick review. The general idea is that the erasure code pool calls something like:
ErasureCodePlugin::factory(&erasure_code, "example", parameters)
as shown at
https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/test/osd/TestErasureCode.cc#L28
to get an object implementing the interface
https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/osd/ErasureCodeInterface.h
which matches the proposal described at
https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
The draft is at
https://github.com/ceph/ceph/commit/5a2b1d66ae17b78addc14fee68c73985412f3c8c
Thanks in advance :-)
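A minimal sketch of the factory call shape described above, under stated assumptions: the names `ErasureCodeSketch`, `PluginRegistrySketch`, `Parameters` and `ExampleCode` are hypothetical stand-ins, and plugins are registered in-process through a callback rather than loaded from a shared library as the draft presumably does.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the interface in ErasureCodeInterface.h.
struct ErasureCodeSketch {
  virtual ~ErasureCodeSketch() = default;
  virtual std::string name() const = 0;
};

using Parameters = std::map<std::string, std::string>;
using Maker =
    std::function<std::unique_ptr<ErasureCodeSketch>(const Parameters &)>;

// Hypothetical registry mapping a plugin name to a constructor callback.
class PluginRegistrySketch {
public:
  void add(const std::string &plugin, Maker maker) {
    makers_[plugin] = std::move(maker);
  }

  // Same shape as the call in the message: the object is returned through
  // the first argument and the return value is an error code (0 on success).
  int factory(std::unique_ptr<ErasureCodeSketch> *out,
              const std::string &plugin,
              const Parameters &parameters) const {
    auto it = makers_.find(plugin);
    if (it == makers_.end())
      return -1;  // unknown plugin name
    *out = it->second(parameters);
    return *out ? 0 : -1;
  }

private:
  std::map<std::string, Maker> makers_;
};

// Hypothetical trivial plugin, playing the role of the "example" plugin.
struct ExampleCode : ErasureCodeSketch {
  std::string name() const override { return "example"; }
};
```

A caller would register `ExampleCode` under the name "example" and then call `factory(&erasure_code, "example", parameters)`, checking the return code before using the object.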
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
* Re: Review request : Erasure Code plugin loader implementation
2013-08-18 16:19 Review request : Erasure Code plugin loader implementation Loic Dachary
@ 2013-08-18 17:34 ` Sage Weil
2013-08-18 20:05 ` Loic Dachary
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2013-08-18 17:34 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
On Sun, 18 Aug 2013, Loic Dachary wrote:
> Hi Ceph,
>
> I've implemented a draft of the Erasure Code plugin loader in the context of http://tracker.ceph.com/issues/5878. It has a trivial unit test and an example plugin. It would be great if someone could do a quick review. The general idea is that the erasure code pool calls something like:
>
> ErasureCodePlugin::factory(&erasure_code, "example", parameters)
>
> as shown at
>
> https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/test/osd/TestErasureCode.cc#L28
>
> to get an object implementing the interface
>
> https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/osd/ErasureCodeInterface.h
>
> which matches the proposal described at
>
> https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
>
> The draft is at
>
> https://github.com/ceph/ceph/commit/5a2b1d66ae17b78addc14fee68c73985412f3c8c
>
> Thanks in advance :-)
I haven't been following this discussion too closely, but taking a look
now, the first 3 make sense, but
virtual map<int, bufferptr> decode(const set<int> &want_to_read, const
map<int, bufferptr> &chunks) = 0;
it seems like this one should be more like
virtual int decode(const map<int, bufferptr> &chunks, bufferlist *out);
As in, you'd decode the chunks you have to get the actual data. If you
want to get (missing) chunks for recovery, you'd do
minimum_to_decode(...); // see what we need
<fetch those chunks from other nodes>
decode(...); // reconstruct original buffer
encode(...); // encode missing chunks from original data
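The sequence above can be sketched with a toy code, assuming M=2 data chunks plus one XOR parity chunk, chunk ids 1 to 3, and an even-length buffer. `std::string` stands in for `bufferptr`/`bufferlist`, and `recover_chunk` is a hypothetical helper, not part of the proposed interface.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

using Chunks = std::map<int, std::string>;

// Byte-wise XOR of two equally sized strings.
std::string xor_strings(const std::string &a, const std::string &b) {
  std::string out(a.size(), '\0');
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = static_cast<char>(a[i] ^ b[i]);
  return out;
}

// Chunks 1 and 2 are the two halves of the data, chunk 3 is their XOR.
Chunks encode(const std::string &data) {
  const std::size_t half = data.size() / 2;
  const std::string a = data.substr(0, half), b = data.substr(half);
  return {{1, a}, {2, b}, {3, xor_strings(a, b)}};
}

// Any two chunks suffice; fewer than two entries means "not decodable".
std::set<int> minimum_to_decode(const std::set<int> &available) {
  std::set<int> minimum;
  for (int id : available)
    if (minimum.size() < 2) minimum.insert(id);
  return minimum;
}

// Reconstruct the original buffer from any two of the three chunks.
int decode(const Chunks &chunks, std::string *out) {
  if (chunks.size() < 2) return -1;
  const bool has1 = chunks.count(1) > 0, has2 = chunks.count(2) > 0;
  std::string a, b;
  if (has1 && has2) { a = chunks.at(1); b = chunks.at(2); }
  else if (has1)    { a = chunks.at(1); b = xor_strings(a, chunks.at(3)); }
  else              { b = chunks.at(2); a = xor_strings(b, chunks.at(3)); }
  *out = a + b;
  return 0;
}

// Recovery as described: find what is needed, fetch it, decode the
// original buffer, then re-encode to obtain the lost chunk.
int recover_chunk(int lost, const Chunks &surviving, std::string *rebuilt) {
  std::set<int> available;
  for (const auto &kv : surviving) available.insert(kv.first);
  Chunks fetched;  // stands in for fetching chunks from other nodes
  for (int id : minimum_to_decode(available)) fetched[id] = surviving.at(id);
  std::string original;
  if (decode(fetched, &original) != 0) return -1;
  *rebuilt = encode(original).at(lost);
  return 0;
}
```

With this shape, recovery always pays for a full decode followed by a full encode, even when only one chunk is missing.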
sage
* Re: Review request : Erasure Code plugin loader implementation
2013-08-18 17:34 ` Sage Weil
@ 2013-08-18 20:05 ` Loic Dachary
2013-08-19 0:01 ` Sage Weil
2013-08-19 0:24 ` Sage Weil
0 siblings, 2 replies; 9+ messages in thread
From: Loic Dachary @ 2013-08-18 20:05 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
Hi Sage,
Unless I misunderstood something (which is still possible at this stage ;-), decode() is used both for recovering missing chunks and for retrieving the original buffer. Decoding the M data chunks is a special case of decoding N <= M chunks out of the M+K chunks that were produced by encode(). It can be used to recover parity chunks as well as data chunks.
https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)
decode chunks to read the content of the want_to_read chunks and return a map associating the chunk number with its decoded content. For instance, in the simplest case M=2,K=1 for an encoded payload of data A and B with parity Z, calling
decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
=> { 1 => 'A', 2 => 'B' }
If, however, chunk B is to be read but is missing, the call will be:
decode([2], { 1 => 'A', 3 => 'Z' })
=> { 2 => 'B' }
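The two calls above can be reproduced with a small sketch, assuming the parity Z is the byte-wise XOR of A and B and that chunk ids are 1, 2 and 3. `std::string` replaces the buffer type, and `xor_chunks` is a hypothetical helper.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

using Chunks = std::map<int, std::string>;

// With M=2, K=1 and XOR parity, any chunk is the XOR of the other two.
std::string xor_chunks(const std::string &x, const std::string &y) {
  std::string out(x.size(), '\0');
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i] = static_cast<char>(x[i] ^ y[i]);
  return out;
}

// Return the content of every chunk in want_to_read, rebuilding a missing
// chunk from the two others. An empty map signals that decoding failed.
Chunks decode(const std::set<int> &want_to_read, const Chunks &chunks) {
  Chunks out;
  for (int id : want_to_read) {
    if (id < 1 || id > 3) return Chunks();
    auto it = chunks.find(id);
    if (it != chunks.end()) {
      out[id] = it->second;  // chunk is available, no repair needed
      continue;
    }
    // The two other chunk ids among {1, 2, 3}.
    const int other1 = (id == 1) ? 2 : 1;
    const int other2 = (id == 3) ? 2 : 3;
    auto a = chunks.find(other1), b = chunks.find(other2);
    if (a == chunks.end() || b == chunks.end()) return Chunks();
    out[id] = xor_chunks(a->second, b->second);
  }
  return out;
}
```

Reading the whole object is then the case where want_to_read holds every data chunk, and the same call rebuilds a parity chunk when its id is requested.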
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
* Re: Review request : Erasure Code plugin loader implementation
2013-08-18 20:05 ` Loic Dachary
@ 2013-08-19 0:01 ` Sage Weil
2013-08-19 15:06 ` Loic Dachary
2013-08-19 0:24 ` Sage Weil
1 sibling, 1 reply; 9+ messages in thread
From: Sage Weil @ 2013-08-19 0:01 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
On Sun, 18 Aug 2013, Loic Dachary wrote:
> Hi Sage,
>
> Unless I misunderstood something ( which is still possible at this stage ;-) decode() is used both for recovery of missing chunks and retrieval of the original buffer. Decoding the M data chunks is a special case of decoding N <= M chunks out of the M+K chunks that were produced by encode(). It can be used to recover parity chunks as well as data chunks.
>
> https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
>
> map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)
>
> decode chunks to read the content of the want_to_read chunks and return a map associating the chunk number with its decoded content. For instance, in the simplest case M=2,K=1 for an encoded payload of data A and B with parity Z, calling
>
> decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
> => { 1 => 'A', 2 => 'B' }
>
> If however, the chunk B is to be read but is missing it will be:
>
> decode([2], { 1 => 'A', 3 => 'Z' })
> => { 2 => 'B' }
Ah, I guess this works when some of the chunks contain the original
data (as with a parity code). There are codes that don't work that way,
although I suspect we won't use them.
Regardless, I wonder if we should generalize slightly and have some
methods work in terms of (offset,length) of the original stripe to
generalize that bit. Then we would have something like
map<int, buffer> transcode(const set<int> &want_to_read, const map<int,
buffer>& chunks);
to go from chunks -> chunks (as we would want to do with, say, a LRC-like
code where we can rebuild some shards from a subset of the other shards).
And then also have
int decode(const map<int, buffer>& chunks, unsigned offset,
unsigned len, bufferlist *out);
that recovers the original data.
In our case, the read path would use decode, and for recovery we would use
transcode.
We'd also want to have alternate minimum_to_decode* methods, like
virtual set<int> minimum_to_decode(unsigned offset, unsigned len, const
set<int> &available_chunks) = 0;
What do you think?
sage
* Re: Review request : Erasure Code plugin loader implementation
2013-08-18 20:05 ` Loic Dachary
2013-08-19 0:01 ` Sage Weil
@ 2013-08-19 0:24 ` Sage Weil
2013-08-19 13:27 ` Loic Dachary
1 sibling, 1 reply; 9+ messages in thread
From: Sage Weil @ 2013-08-19 0:24 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
Hi Loic,
One other thought on http://tracker.ceph.com/issues/5878:
The user interface there would let you adjust various parameters of the
pool's erasure coding scheme after the pool is created. As a practical
matter, I suspect that many/most of these fields will be specified exactly
once (at pool creation time) and will be immutable properties of the pool
after that. The m/k at a minimum need to match up with what we are
requesting out of crush. And once there is data stored, I don't think it
will make sense to be able to change the encoding scheme for new objects
and still be able to deal with old objects. (Or maybe it will be, if the
code metadata is in the object_info_t.)
Even if we do support changing some of these on the fly, though, I suspect
the most important interface, and the first we implement, will be
something like
ceph osd pool create <name> [key=value ...]
with the various parameters listed, like EC algorithm, m, k, and pg_num. We
can probably generalize the mon command interface to have a key/value list
type that will make this easy to plumb from the CLI (and trivial via
ceph-rest-api).
sage
* Re: Review request : Erasure Code plugin loader implementation
2013-08-19 0:24 ` Sage Weil
@ 2013-08-19 13:27 ` Loic Dachary
0 siblings, 0 replies; 9+ messages in thread
From: Loic Dachary @ 2013-08-19 13:27 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
Hi Sage,
This makes a lot more sense indeed. I updated the http://tracker.ceph.com/issues/5878 description accordingly.
ceph osd pool create poolname erasure-code-dir=/var/lib/ceph/erasure-code
erasure-code-plugin=jerasure erasure-code-m=10 erasure-code-k=3 erasure-code-algorithm=Reed-Solomon
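A sketch of how such a key=value list might be turned into the parameter map handed to the plugin. `parse_key_values` is a hypothetical helper and makes no claim about how the mon command interface actually parses its arguments.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: split each "key=value" argument at the first '='.
// Returns false if an argument has no '=' or has an empty key.
bool parse_key_values(const std::vector<std::string> &args,
                      std::map<std::string, std::string> *out) {
  for (const std::string &arg : args) {
    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos || eq == 0)
      return false;
    (*out)[arg.substr(0, eq)] = arg.substr(eq + 1);
  }
  return true;
}
```

Values stay as strings here; converting erasure-code-m or erasure-code-k to integers and validating them would be left to whoever consumes the map.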
Thanks :-)
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
* Re: Review request : Erasure Code plugin loader implementation
2013-08-19 0:01 ` Sage Weil
@ 2013-08-19 15:06 ` Loic Dachary
2013-08-19 16:19 ` Sage Weil
2013-08-20 11:32 ` Loic Dachary
0 siblings, 2 replies; 9+ messages in thread
From: Loic Dachary @ 2013-08-19 15:06 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
On 19/08/2013 02:01, Sage Weil wrote:
> On Sun, 18 Aug 2013, Loic Dachary wrote:
>> Hi Sage,
>>
>> Unless I misunderstood something ( which is still possible at this stage ;-) decode() is used both for recovery of missing chunks and retrieval of the original buffer. Decoding the M data chunks is a special case of decoding N <= M chunks out of the M+K chunks that were produced by encode(). It can be used to recover parity chunks as well as data chunks.
>>
>> https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
>>
>> map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)
>>
>> decode chunks to read the content of the want_to_read chunks and return a map associating the chunk number with its decoded content. For instance, in the simplest case M=2,K=1 for an encoded payload of data A and B with parity Z, calling
>>
>> decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
>> => { 1 => 'A', 2 => 'B' }
>>
>> If however, the chunk B is to be read but is missing it will be:
>>
>> decode([2], { 1 => 'A', 3 => 'Z' })
>> => { 2 => 'B' }
>
> Ah, I guess this works when some of the chunks contain the original
> data (as with a parity code). There are codes that don't work that way,
> although I suspect we won't use them.
>
> Regardless, I wonder if we should generalize slightly and have some
> methods work in terms of (offset,length) of the original stripe to
> generalize that bit. Then we would have something like
>
> map<int, buffer> transcode(const set<int> &want_to_read, const map<int,
> buffer>& chunks);
>
> to go from chunks -> chunks (as we would want to do with, say, a LRC-like
> code where we can rebuild some shards from a subset of the other shards).
> And then also have
>
> int decode(const map<int, buffer>& chunks, unsigned offset,
> unsigned len, bufferlist *out);
This function would be implemented more or less as:
set<int> want_to_read = range_to_chunks(offset, len) // compute what chunks must be retrieved
set<int> available = the up set
set<int> minimum = minimum_to_decode(want_to_read, available);
map<int, buffer> available_chunks = retrieve_chunks_from_osds(minimum);
map<int, buffer> chunks = transcode(want_to_read, available_chunks); // repairs if necessary
out = bufferptr(concat_chunks(chunks), offset - offset of the first chunk, len)
or do you have something else in mind ?
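The first and last steps of the sequence above can be sketched as follows, assuming fixed-size data chunks laid out contiguously and numbered from 0 (the earlier example numbers them from 1). The `chunk_size` parameter and the `extract_range` helper are hypothetical; fetching and repairing chunks is left out.

```cpp
#include <map>
#include <set>
#include <string>

// Which data chunks cover the byte range [offset, offset + len)?
std::set<int> range_to_chunks(unsigned offset, unsigned len,
                              unsigned chunk_size) {
  std::set<int> wanted;
  if (len == 0 || chunk_size == 0) return wanted;
  const unsigned first = offset / chunk_size;
  const unsigned last = (offset + len - 1) / chunk_size;
  for (unsigned c = first; c <= last; ++c)
    wanted.insert(static_cast<int>(c));
  return wanted;
}

// Concatenate the (already decoded) wanted chunks and trim the result to
// the requested range. Assumes the chunks are contiguous and cover it.
std::string extract_range(const std::map<int, std::string> &chunks,
                          unsigned offset, unsigned len,
                          unsigned chunk_size) {
  if (chunks.empty()) return std::string();
  std::string joined;
  for (const auto &kv : chunks)
    joined += kv.second;  // std::map iterates in ascending chunk order
  const unsigned first_chunk_offset =
      static_cast<unsigned>(chunks.begin()->first) * chunk_size;
  return joined.substr(offset - first_chunk_offset, len);
}
```

For a 4-byte chunk size, reading 4 bytes at offset 5 touches chunks 1 and 2, and the trim step drops the first byte of chunk 1 and the last three bytes of chunk 2.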
>
> that recovers the original data.
>
> In our case, the read path would use decode, and for recovery we would use
> transcode.
>
> We'd also want to have alternate minimum_to_decode* methods, like
>
> virtual set<int> minimum_to_decode(unsigned offset, unsigned len, const
> set<int> &available_chunks) = 0;
I also have a convenience wrapper in mind for this but I feel I'm missing something.
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
* Re: Review request : Erasure Code plugin loader implementation
2013-08-19 15:06 ` Loic Dachary
@ 2013-08-19 16:19 ` Sage Weil
2013-08-20 11:32 ` Loic Dachary
1 sibling, 0 replies; 9+ messages in thread
From: Sage Weil @ 2013-08-19 16:19 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
On Mon, 19 Aug 2013, Loic Dachary wrote:
>
>
> On 19/08/2013 02:01, Sage Weil wrote:
> > On Sun, 18 Aug 2013, Loic Dachary wrote:
> >> Hi Sage,
> >>
> >> Unless I misunderstood something ( which is still possible at this stage ;-) decode() is used both for recovery of missing chunks and retrieval of the original buffer. Decoding the M data chunks is a special case of decoding N <= M chunks out of the M+K chunks that were produced by encode(). It can be used to recover parity chunks as well as data chunks.
> >>
> >> https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
> >>
> >> map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)
> >>
> >> decode chunks to read the content of the want_to_read chunks and return a map associating the chunk number with its decoded content. For instance, in the simplest case M=2,K=1 for an encoded payload of data A and B with parity Z, calling
> >>
> >> decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
> >> => { 1 => 'A', 2 => 'B' }
> >>
> >> If however, the chunk B is to be read but is missing it will be:
> >>
> >> decode([2], { 1 => 'A', 3 => 'Z' })
> >> => { 2 => 'B' }
> >
> > Ah, I guess this works when some of the chunks contain the original
> > data (as with a parity code). There are codes that don't work that way,
> > although I suspect we won't use them.
> >
> > Regardless, I wonder if we should generalize slightly and have some
> > methods work in terms of (offset,length) of the original stripe to
> > generalize that bit. Then we would have something like
> >
> > map<int, buffer> transcode(const set<int> &want_to_read, const map<int,
> > buffer>& chunks);
> >
> > to go from chunks -> chunks (as we would want to do with, say, a LRC-like
> > code where we can rebuild some shards from a subset of the other shards).
> > And then also have
> >
> > int decode(const map<int, buffer>& chunks, unsigned offset,
> > unsigned len, bufferlist *out);
>
> This function would be implemented more or less as:
>
> set<int> want_to_read = range_to_chunks(offset, len) // compute what chunks must be retrieved
> set<int> available = the up set
> set<int> minimum = minimum_to_decode(want_to_read, available);
> map<int, buffer> available_chunks = retrieve_chunks_from_osds(minimum);
> map<int, buffer> chunks = transcode(want_to_read, available_chunks); // repairs if necessary
> out = bufferptr(concat_chunks(chunks), offset - offset of the first chunk, len)
>
> or do you have something else in mind ?
This makes sense. I am still wondering if it is worth generalizing this a
bit further to codes without a nice mapping of a range -> want_to_read
(i.e. that require decoding the entire stripe to get any part of it).
For those codes, we would want to choose the N cheapest/available chunks
and the sequence above would be a bit different. I guess in reality,
though, we probably don't care to implement any such codes (I'm not sure
what their advantages would be, if any)!
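For a code that needs any N chunks to decode the whole stripe, picking the N cheapest available chunks might look like the sketch below. The cost values and the `cheapest_chunks` name are hypothetical; how an OSD would actually rank chunks is not specified in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Pick the n available chunks with the lowest cost. 'cost' maps each
// available chunk id to an abstract retrieval cost. Returns an empty set
// when fewer than n chunks are available.
std::set<int> cheapest_chunks(const std::map<int, unsigned> &cost,
                              std::size_t n) {
  std::vector<std::pair<unsigned, int>> by_cost;
  for (const auto &kv : cost)
    by_cost.emplace_back(kv.second, kv.first);
  std::sort(by_cost.begin(), by_cost.end());  // cheapest first, ties by id
  std::set<int> chosen;
  if (by_cost.size() < n) return chosen;
  for (std::size_t i = 0; i < n; ++i)
    chosen.insert(by_cost[i].second);
  return chosen;
}
```

The read path for such a code would fetch the chosen chunks, decode the entire stripe, and only then cut out the requested (offset, length) range.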
sage
* Re: Review request : Erasure Code plugin loader implementation
2013-08-19 15:06 ` Loic Dachary
2013-08-19 16:19 ` Sage Weil
@ 2013-08-20 11:32 ` Loic Dachary
1 sibling, 0 replies; 9+ messages in thread
From: Loic Dachary @ 2013-08-20 11:32 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
Hi Sage,
I created "erasure code : convenience functions to code / decode" http://tracker.ceph.com/issues/6064 to implement the suggested functions. Please let me know if this should be merged with another task.
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
Thread overview: 9+ messages
2013-08-18 16:19 Review request : Erasure Code plugin loader implementation Loic Dachary
2013-08-18 17:34 ` Sage Weil
2013-08-18 20:05 ` Loic Dachary
2013-08-19 0:01 ` Sage Weil
2013-08-19 15:06 ` Loic Dachary
2013-08-19 16:19 ` Sage Weil
2013-08-20 11:32 ` Loic Dachary
2013-08-19 0:24 ` Sage Weil
2013-08-19 13:27 ` Loic Dachary