* Hitchhiker erasure code
From: Loic Dachary @ 2015-03-20 10:32 UTC (permalink / raw)
To: Andreas-Joachim Peters; +Cc: Ceph Development
Hi Andreas,
Today I learnt about Hitchhiker, as described in http://www.eecs.berkeley.edu/~rashmikv/papers/Hitchhiker_SIGCOMM14.pdf. For information, see also the Hadoop work at
https://issues.apache.org/jira/browse/HDFS-7715
https://issues.apache.org/jira/browse/HDFS-7285
Do you have an opinion about it?
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
* RE: Hitchhiker erasure code
From: Andreas Joachim Peters @ 2015-03-20 12:37 UTC (permalink / raw)
To: Loic Dachary; +Cc: ceph-devel@vger.kernel.org
Hi Loic,
I looked at that some time ago.
Table 1 in the paper says it all:
If you care about decoding and reconstruction of data it gives a good improvement.
If you care mainly about encoding speed, it is not the optimal choice (encoding time +72.1%).
The algorithm optimizes the reconstruction of data units. This is relevant if your read size is typically smaller than the block size, e.g. you encode 4 MB objects and read 4 KB pages. With normal EC you get a read amplification of K*4 KB if a data stripe is down, while with Hitchhiker you get only 2/3 of that traffic in the (10,4) case.
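To put numbers on that, here is a minimal sketch of the arithmetic only. It assumes K=10, a 4 KB page, and the 2/3 traffic ratio quoted above for (10,4); the function name and parameters are made up for illustration and are not part of Ceph or any EC library.

```python
# Back-of-the-envelope degraded-read traffic. The 2/3 factor is the figure
# quoted in this mail for a (10,4) code, not something computed here.

def degraded_read_bytes(k, read_size, traffic_ratio=1.0):
    """Bytes fetched to rebuild one unavailable data unit of read_size bytes.

    Plain EC needs the matching unit from k surviving chunks;
    traffic_ratio scales that down (hypothetical parameter).
    """
    return k * read_size * traffic_ratio

K = 10           # data chunks in a (10,4) code
PAGE = 4 * 1024  # 4 KB read

plain_ec = degraded_read_bytes(K, PAGE)
hitchhiker = degraded_read_bytes(K, PAGE, traffic_ratio=2 / 3)

print(f"plain EC:   {plain_ec / 1024:.1f} KB")
print(f"Hitchhiker: {hitchhiker / 1024:.1f} KB")
```

So a degraded 4 KB read costs 40 KB of traffic with plain EC and roughly 27 KB under the quoted ratio.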
The most interesting variant to implement is probably Hitchhiker-XOR+, which you have to combine with a Vandermonde matrix: it requires that the first parity is just the XOR of all data chunks.
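The "first parity is just the XOR of all data chunks" requirement can be illustrated with a toy sketch. This shows only the plain XOR-parity property (any single lost data chunk is the XOR of the survivors and that parity); it does not implement the Hitchhiker piggybacking itself, and the chunk contents and helper name are invented for the example.

```python
from functools import reduce

def xor_chunks(chunks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# K = 10 toy data chunks of 8 bytes each (made-up contents).
data = [bytes([i] * 8) for i in range(1, 11)]

# First parity chunk: XOR of all data chunks.
p1 = xor_chunks(data)

# Lose one data chunk and rebuild it from the survivors plus p1.
lost = 3
survivors = [c for i, c in enumerate(data) if i != lost]
rebuilt = xor_chunks(survivors + [p1])

print(rebuilt == data[lost])
```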
So, yes, there is certainly a benefit in implementing that compared to other approaches (Xorbas, LRC), since it does not involve a space overhead and opens the door to using larger K values and saving space!
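The space argument is simple arithmetic: a (K,M) code stores (K+M)/K bytes per byte of data, so a larger K at fixed M lowers the overhead. A small sketch, where the (20,4) profile is a hypothetical example and not a configuration discussed in this thread:

```python
def storage_overhead(k, m):
    """Raw bytes stored per byte of user data for a (k, m) erasure code."""
    return (k + m) / k

# (8,4) and (10,4) appear in this thread; (20,4) is a hypothetical larger K.
for k, m in [(8, 4), (10, 4), (20, 4)]:
    print(f"({k},{m}): {storage_overhead(k, m):.2f}x")
```

Codes that add extra local parity chunks store more than K+M chunks per stripe, which is the space overhead referred to above.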
Cheers Andreas.
* Re: Hitchhiker erasure code
From: Loic Dachary @ 2015-03-20 12:42 UTC (permalink / raw)
To: Andreas Joachim Peters; +Cc: ceph-devel@vger.kernel.org
On 20/03/2015 13:37, Andreas Joachim Peters wrote:
> Hi Loic,
> I looked at that some time ago.
>
> Table 1 in the paper says it all:
>
> If you care about decoding and reconstruction of data it gives a good improvement.
> If you care mainly about encoding speed, it is not the optimal choice (+72.1%).
>
> The algorithm optimizes the reconstruction of data units. This is relevant if your read-size is typically smaller than the block-size e.g. you encode 4 MB objects and you read 4kb pages. With normal EC you get a read amplification of K*4k if a data stripe is down, while with hitchhiker you get only 2/3 of that traffic in case of (10,4).
>
> The most interesting to implement is probably Hitchhiker-XOR+, which you have to combine with a Vandermonde matrix, it requires that the first parity is just the xor of all data chunks.
>
> So, yes, there is certainly a benefit in implementing that compared to other approaches (Xorbas,LRC) since it does not involve a space overhead and opens the door to use larger K values and save space!
>
That sounds appealing :-) Do you think it would be more relevant to implement this as an additional Ceph plugin? Or as a new jerasure technique?
--
Loïc Dachary, Artisan Logiciel Libre
* RE: Hitchhiker erasure code
From: Andreas Joachim Peters @ 2015-03-20 13:11 UTC (permalink / raw)
To: Loic Dachary; +Cc: ceph-devel@vger.kernel.org
Hi Loic,
XOR+ is more generic with jerasure because the ISA library does not create an invertible matrix for M>4 in all cases. I measured a tiny advantage for the Intel library in encoding/decoding speed for given K/M values and given hardware, but Dan did some measurements in a real cluster, and you cannot see any difference on a global scale between a jerasure and an isa (8,4) configuration with our hardware. The EC computation does not define the baseline performance in that case.
So, this is more a general question ... but I am sure it is simpler to implement that within a plug-in rather than on top of N plug-ins ...
Cheers Andreas.