* Review request : Erasure Code plugin loader implementation
From: Loic Dachary @ 2013-08-18 16:19 UTC
To: Ceph Development

Hi Ceph,

I've implemented a draft of the Erasure Code plugin loader in the context of http://tracker.ceph.com/issues/5878. It has a trivial unit test and an example plugin. It would be great if someone could do a quick review. The general idea is that the erasure code pool calls something like:

    ErasureCodePlugin::factory(&erasure_code, "example", parameters)

as shown at

https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/test/osd/TestErasureCode.cc#L28

to get an object implementing the interface

https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/osd/ErasureCodeInterface.h

which matches the proposal described at

https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api

The draft is at

https://github.com/ceph/ceph/commit/5a2b1d66ae17b78addc14fee68c73985412f3c8c

Thanks in advance :-)

--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
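The factory call described in this message can be pictured with a small in-process registry. This is only a sketch under stated assumptions: `PluginRegistrySketch`, `ErasureCodeSketch`, `Parameters` and `ExampleCode` are hypothetical names, and the code does not try to reproduce how the draft actually locates or loads plugins. It only shows the shape of "look up a plugin by name, hand back an object behind a common interface, return an error code otherwise".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the erasure code interface.
struct ErasureCodeSketch {
  virtual ~ErasureCodeSketch() = default;
  virtual std::string name() const = 0;
};

using Parameters = std::map<std::string, std::string>;
using Maker =
    std::function<std::unique_ptr<ErasureCodeSketch>(const Parameters &)>;

// Hypothetical registry: maps a plugin name to a function that builds it.
class PluginRegistrySketch {
 public:
  void add(const std::string &plugin, Maker maker) {
    makers_[plugin] = std::move(maker);
  }

  // Mirrors the call shape in the message: output pointer, plugin name,
  // parameters. Returns 0 on success and -1 if the plugin is unknown.
  int factory(std::unique_ptr<ErasureCodeSketch> *out,
              const std::string &plugin, const Parameters &parameters) const {
    auto it = makers_.find(plugin);
    if (it == makers_.end()) return -1;
    *out = it->second(parameters);
    return 0;
  }

 private:
  std::map<std::string, Maker> makers_;
};

// Hypothetical plugin that ignores its parameters.
struct ExampleCode : ErasureCodeSketch {
  std::string name() const override { return "example"; }
};
```

A caller would register `ExampleCode` under the name "example" once, then ask the registry for it by name; an unknown name yields the error code rather than an object.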
* Re: Review request : Erasure Code plugin loader implementation
From: Sage Weil @ 2013-08-18 17:34 UTC
To: Loic Dachary; +Cc: Ceph Development

On Sun, 18 Aug 2013, Loic Dachary wrote:
> I've implemented a draft of the Erasure Code plugin loader in the context of http://tracker.ceph.com/issues/5878.
> [...]
> to get an object implementing the interface
>
> https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/osd/ErasureCodeInterface.h

I haven't been following this discussion too closely, but taking a look now, the first 3 make sense, but

    virtual map<int, bufferptr> decode(const set<int> &want_to_read, const map<int, bufferptr> &chunks) = 0;

it seems like this one should be more like

    virtual int decode(const map<int, bufferptr> &chunks, bufferlist *out);

As in, you'd decode the chunks you have to get the actual data. If you want to get (missing) chunks for recovery, you'd do

    minimum_to_decode(...);   // see what we need
    <fetch those chunks from other nodes>
    decode(...);              // reconstruct original buffer
    encode(...);              // encode missing chunks from original data

sage
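The recovery sequence above (minimum_to_decode, fetch, decode, encode) can be sketched with a toy code. The example below assumes a 2+1 XOR parity code with chunks numbered 0, 1 (data) and 2 (parity), and uses `std::string` in place of Ceph buffers; `Chunks`, `xor_bytes` and `recover_chunk` are hypothetical names, and the "fetch" step is just a lookup in a local map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

using Chunks = std::map<int, std::string>;

static std::string xor_bytes(const std::string &a, const std::string &b) {
  assert(a.size() == b.size());
  std::string out(a.size(), '\0');
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = static_cast<char>(a[i] ^ b[i]);
  return out;
}

// encode: split an even-length buffer in two halves and add a parity chunk.
Chunks encode(const std::string &data) {
  assert(data.size() % 2 == 0);
  std::string a = data.substr(0, data.size() / 2);
  std::string b = data.substr(data.size() / 2);
  return {{0, a}, {1, b}, {2, xor_bytes(a, b)}};
}

// minimum_to_decode: with this toy code any two chunks are enough.
std::set<int> minimum_to_decode(const std::set<int> &available) {
  assert(available.size() >= 2);
  auto it = available.begin();
  int first = *it++;
  return {first, *it};
}

// decode: rebuild the original buffer from any two chunks.
int decode(const Chunks &chunks, std::string *out) {
  if (chunks.size() < 2) return -1;
  std::string a, b;
  if (chunks.count(0) && chunks.count(1)) {
    a = chunks.at(0);
    b = chunks.at(1);
  } else if (chunks.count(0)) {
    a = chunks.at(0);
    b = xor_bytes(a, chunks.at(2));
  } else {
    b = chunks.at(1);
    a = xor_bytes(b, chunks.at(2));
  }
  *out = a + b;
  return 0;
}

// The four-step sequence: pick chunks, fetch them, decode, re-encode.
std::string recover_chunk(int missing, const Chunks &stored) {
  std::set<int> available;
  for (const auto &kv : stored) available.insert(kv.first);
  std::set<int> need = minimum_to_decode(available);  // see what we need
  Chunks fetched;
  for (int id : need) fetched[id] = stored.at(id);    // stand-in for a fetch
  std::string original;
  int r = decode(fetched, &original);                 // reconstruct the buffer
  assert(r == 0);
  (void)r;
  return encode(original).at(missing);                // re-encode the lost chunk
}
```

The cost this makes visible is that recovery always goes through the full original buffer, even when only one chunk is wanted.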
* Re: Review request : Erasure Code plugin loader implementation
From: Loic Dachary @ 2013-08-18 20:05 UTC
To: Sage Weil; +Cc: Ceph Development

Hi Sage,

Unless I misunderstood something (which is still possible at this stage ;-) decode() is used both for recovery of missing chunks and retrieval of the original buffer. Decoding the M data chunks is a special case of decoding N <= M chunks out of the M+K chunks that were produced by encode(). It can be used to recover parity chunks as well as data chunks.

https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api

    map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)

decode chunks to read the content of the want_to_read chunks and return a map associating the chunk number with its decoded content. For instance, in the simplest case M=2,K=1 for an encoded payload of data A and B with parity Z, calling

    decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
    => { 1 => 'A', 2 => 'B' }

If however, the chunk B is to be read but is missing it will be:

    decode([2], { 1 => 'A', 3 => 'Z' })
    => { 2 => 'B' }

Cheers
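The M=2, K=1 example in this message can be made concrete with XOR parity. The sketch below assumes the parity chunk Z is the byte-wise XOR of A and B (the message does not say which code is used), keeps the 1, 2, 3 chunk numbering from the example, and uses `std::string` in place of Ceph buffers; `ChunkMap` and `xor_bytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using ChunkMap = std::map<int, std::string>;

static std::string xor_bytes(const std::string &a, const std::string &b) {
  if (a.size() != b.size()) throw std::invalid_argument("chunk sizes differ");
  std::string out(a.size(), '\0');
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = static_cast<char>(a[i] ^ b[i]);
  return out;
}

// Return the content of every chunk in want_to_read. A chunk that is present
// is copied; a missing one is rebuilt as the XOR of the other two, which
// works for data chunks (1, 2) and for the parity chunk (3) alike.
ChunkMap decode(const std::set<int> &want_to_read, const ChunkMap &chunks) {
  ChunkMap out;
  for (int id : want_to_read) {
    if (id < 1 || id > 3) throw std::out_of_range("chunk id must be 1, 2 or 3");
    auto found = chunks.find(id);
    if (found != chunks.end()) {
      out[id] = found->second;
      continue;
    }
    const int a = (id == 1) ? 2 : 1;  // the two chunks other than id
    const int b = (id == 3) ? 2 : 3;
    auto ia = chunks.find(a);
    auto ib = chunks.find(b);
    if (ia == chunks.end() || ib == chunks.end())
      throw std::runtime_error("not enough chunks to rebuild the missing one");
    out[id] = xor_bytes(ia->second, ib->second);
  }
  return out;
}
```

With this shape a read of intact data chunks is a plain copy, and recovery of a single chunk never needs the whole original buffer.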
* Re: Review request : Erasure Code plugin loader implementation
From: Sage Weil @ 2013-08-19 0:01 UTC
To: Loic Dachary; +Cc: Ceph Development

On Sun, 18 Aug 2013, Loic Dachary wrote:
> [...]
> decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
> => { 1 => 'A', 2 => 'B' }
>
> If however, the chunk B is to be read but is missing it will be:
>
> decode([2], { 1 => 'A', 3 => 'Z' })
> => { 2 => 'B' }

Ah, I guess this works when some of the chunks contain the original data (as with a parity code). There are codes that don't work that way, although I suspect we won't use them.

Regardless, I wonder if we should generalize slightly and have some methods work in terms of (offset,length) of the original stripe to generalize that bit. Then we would have something like

    map<int, buffer> transcode(const set<int> &want_to_read, const map<int, buffer>& chunks);

to go from chunks -> chunks (as we would want to do with, say, a LRC-like code where we can rebuild some shards from a subset of the other shards). And then also have

    int decode(const map<int, buffer>& chunks, unsigned offset, unsigned len, bufferlist *out);

that recovers the original data.

In our case, the read path would use decode, and for recovery we would use transcode.

We'd also want to have alternate minimum_to_decode* methods, like

    virtual set<int> minimum_to_decode(unsigned offset, unsigned len, const set<int> &available_chunks) = 0;

What do you think?

sage
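One way to read the (offset, len) variant of minimum_to_decode proposed above is sketched below. It rests on assumptions the message does not state: a stripe made of m data chunks of equal size numbered 0..m-1 followed by k coding chunks, and a code where any m chunks are enough to rebuild everything. `Layout` and `range_to_chunks` are hypothetical helpers, and the fallback simply takes the lowest-numbered available chunks rather than the cheapest ones.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical stripe layout: m data chunks, k coding chunks, fixed chunk size.
struct Layout {
  unsigned m;
  unsigned k;
  unsigned chunk_size;
};

// Data chunks that overlap the byte range [offset, offset + len).
std::set<int> range_to_chunks(const Layout &l, unsigned offset, unsigned len) {
  assert(len > 0 && offset + len <= l.m * l.chunk_size);
  std::set<int> out;
  const unsigned last = (offset + len - 1) / l.chunk_size;
  for (unsigned c = offset / l.chunk_size; c <= last; ++c)
    out.insert(static_cast<int>(c));
  return out;
}

// If every data chunk covering the range is available, read only those.
// Otherwise fall back to any m available chunks so the range can be rebuilt.
std::set<int> minimum_to_decode(const Layout &l, unsigned offset, unsigned len,
                                const std::set<int> &available_chunks) {
  const std::set<int> want = range_to_chunks(l, offset, len);
  bool all_present = true;
  for (int c : want)
    if (!available_chunks.count(c)) all_present = false;
  if (all_present) return want;

  if (available_chunks.size() < l.m)
    throw std::runtime_error("not enough chunks to decode");
  std::set<int> out;
  for (int c : available_chunks) {
    if (out.size() == l.m) break;
    out.insert(c);
  }
  return out;
}
```

For a layout of four data chunks of 10 bytes, a read of bytes 5 to 14 needs only chunks 0 and 1 when both are present, and four chunks when one of them is missing.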
* Re: Review request : Erasure Code plugin loader implementation
From: Loic Dachary @ 2013-08-19 15:06 UTC
To: Sage Weil; +Cc: Ceph Development

On 19/08/2013 02:01, Sage Weil wrote:
> [...]
> Regardless, I wonder if we should generalize slightly and have some
> methods work in terms of (offset,length) of the original stripe to
> generalize that bit. Then we would have something like
>
>     map<int, buffer> transcode(const set<int> &want_to_read, const map<int, buffer>& chunks);
>
> to go from chunks -> chunks (as we would want to do with, say, a LRC-like
> code where we can rebuild some shards from a subset of the other shards).
> And then also have
>
>     int decode(const map<int, buffer>& chunks, unsigned offset, unsigned len, bufferlist *out);

This function would be implemented more or less as:

    set<int> want_to_read = range_to_chunks(offset, len) // compute what chunks must be retrieved
    set<int> available = the up set
    set<int> minimum = minimum_to_decode(want_to_read, available);
    map<int, buffer> available_chunks = retrieve_chunks_from_osds(minimum);
    map<int, buffer> chunks = transcode(want_to_read, available_chunks); // repairs if necessary
    out = bufferptr(concat_chunks(chunks), offset - offset of the first chunk, len)

or do you have something else in mind ?

> that recovers the original data.
>
> In our case, the read path would use decode, and for recovery we would use
> transcode.
>
> We'd also want to have alternate minimum_to_decode* methods, like
>
>     virtual set<int> minimum_to_decode(unsigned offset, unsigned len, const set<int> &available_chunks) = 0;

I also have a convenience wrapper in mind for this but I feel I'm missing something.

Cheers
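The last step of the pseudocode in this message (concatenate the decoded chunks, then cut out the requested range relative to the first chunk) can be sketched as follows. The assumptions are that data chunks are numbered from 0, all have the same size, and the supplied chunks are contiguous; `read_range` is a hypothetical name and `std::string` stands in for Ceph buffers.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Join contiguous decoded data chunks and return the bytes of the stripe in
// [offset, offset + len). The offset is given relative to the whole stripe,
// so it is shifted by the position of the first supplied chunk.
std::string read_range(const std::map<int, std::string> &chunks,
                       unsigned chunk_size, unsigned offset, unsigned len) {
  if (chunks.empty()) throw std::invalid_argument("no chunks");
  const int first = chunks.begin()->first;
  std::string joined;
  int expected = first;
  for (const auto &kv : chunks) {
    if (kv.first != expected || kv.second.size() != chunk_size)
      throw std::invalid_argument("chunks must be contiguous and full-sized");
    joined += kv.second;
    ++expected;
  }
  const unsigned first_offset = static_cast<unsigned>(first) * chunk_size;
  if (offset < first_offset || offset - first_offset + len > joined.size())
    throw std::out_of_range("range not covered by the supplied chunks");
  return joined.substr(offset - first_offset, len);
}
```

For a stripe "ABCDEFGHIJKL" cut into 4-byte chunks, reading 4 bytes at offset 6 needs chunks 1 and 2 only, and the slice starts 2 bytes into their concatenation.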
* Re: Review request : Erasure Code plugin loader implementation
From: Sage Weil @ 2013-08-19 16:19 UTC
To: Loic Dachary; +Cc: Ceph Development

On Mon, 19 Aug 2013, Loic Dachary wrote:
> On 19/08/2013 02:01, Sage Weil wrote:
> > [...]
> > int decode(const map<int, buffer>& chunks, unsigned offset,
> > unsigned len, bufferlist *out);
>
> This function would be implemented more or less as:
>
> set<int> want_to_read = range_to_chunks(offset, len) // compute what chunks must be retrieved
> set<int> available = the up set
> set<int> minimum = minimum_to_decode(want_to_read, available);
> map<int, buffer> available_chunks = retrieve_chunks_from_osds(minimum);
> map<int, buffer> chunks = transcode(want_to_read, available_chunks); // repairs if necessary
> out = bufferptr(concat_chunks(chunks), offset - offset of the first chunk, len)
>
> or do you have something else in mind ?

This makes sense.

I am still wondering if it is worth generalizing this a bit further to codes without a nice mapping of a range -> want_to_read (i.e. that require decoding the entire stripe to get any part of it). For those codes, we would want to choose the N cheapest/available chunks and the sequence above would be a bit different.

I guess in reality, though, we probably don't care to implement any such codes (I'm not sure what their advantages would be, if any)!

sage
* Re: Review request : Erasure Code plugin loader implementation
From: Loic Dachary @ 2013-08-20 11:32 UTC
To: Sage Weil; +Cc: Ceph Development

Hi Sage,

I created "erasure code : convenience functions to code / decode" http://tracker.ceph.com/issues/6064 to implement the suggested functions. Please let me know if this should be merged with another task.

Cheers
* Re: Review request : Erasure Code plugin loader implementation
From: Sage Weil @ 2013-08-19 0:24 UTC
To: Loic Dachary; +Cc: Ceph Development

Hi Loic,

One other thought on http://tracker.ceph.com/issues/5878:

The user interface there would let you adjust various parameters of the pool's erasure coding scheme after the pool is created. As a practical matter, I suspect that many/most of these fields will be specified exactly once (at pool creation time) and will be immutable properties of the pool after that. The m/k at a minimum need to match up with what we are requesting out of crush. And once there is data stored, I don't think it will make sense to be able to change the encoding scheme for new objects and still be able to deal with old objects. (Or maybe it will be, if the code metadata is in the object_info_t.)

Even if we do support changing some of these on the fly, though, I suspect the most important interface, and the first we implement, will be something like

    ceph osd pool create <name> [key=value ...]

with the various parameters listed, like EC algorithm, m, k, and pg_num. We can probably generalize the mon command interface to have a key/value list type that will make this easy to plumb from the CLI (and trivial via ceph-rest-api).

sage
* Re: Review request : Erasure Code plugin loader implementation
From: Loic Dachary @ 2013-08-19 13:27 UTC
To: Sage Weil; +Cc: Ceph Development

Hi Sage,

This makes a lot more sense indeed. I updated the http://tracker.ceph.com/issues/5878 description accordingly.

    ceph osd pool create poolname erasure-code-dir=/var/lib/ceph/erasure-code erasure-code-plugin=jerasure erasure-code-m=10 erasure-code-k=3 erasure-code-algorithm=Reed-Solomon

Thanks :-)
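The key/value list discussed in this exchange could be turned into a parameter map along the following lines. This is a minimal sketch and not the mon command parser: `parse_key_values` is a hypothetical helper that takes the trailing `key=value` words of a command, such as the ones in the example command above, and rejects anything without a key.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Split each "key=value" word at the first '=' and collect the pairs.
// Values may be empty or contain further '=' characters; keys may not be empty.
std::map<std::string, std::string> parse_key_values(
    const std::vector<std::string> &args) {
  std::map<std::string, std::string> out;
  for (const std::string &arg : args) {
    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos || eq == 0)
      throw std::invalid_argument("expected key=value, got: " + arg);
    out[arg.substr(0, eq)] = arg.substr(eq + 1);
  }
  return out;
}
```

The resulting map has the same shape as the `parameters` argument passed to the plugin factory at the start of the thread, which is what makes a generic key/value list attractive here.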