* Hitchhiker erasure code
From: Loic Dachary @ 2015-03-20 10:32 UTC
  To: Andreas-Joachim Peters; +Cc: Ceph Development

Hi Andreas,

Today I learnt about Hitchhiker, as described in http://www.eecs.berkeley.edu/~rashmikv/papers/Hitchhiker_SIGCOMM14.pdf. For information, see also the Hadoop work at

    https://issues.apache.org/jira/browse/HDFS-7715
    https://issues.apache.org/jira/browse/HDFS-7285

Do you have an opinion about it?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



* RE: Hitchhiker erasure code
From: Andreas Joachim Peters @ 2015-03-20 12:37 UTC
  To: Loic Dachary; +Cc: ceph-devel@vger.kernel.org

Hi Loic, 
I looked at that some time ago.

Table 1 in the paper says it all:

If you care about decoding and reconstruction of data, it gives a good improvement.
If you care mainly about encoding speed, it is not the optimal choice (+72.1%).

The algorithm optimizes the reconstruction of data units. This is relevant if your read size is typically smaller than the block size, e.g. you encode 4 MB objects and read 4 KB pages. With normal EC you get a read amplification of K*4 KB if a data stripe is down, while with Hitchhiker you get only 2/3 of that traffic in the (10,4) case.
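
To make the arithmetic in the paragraph above concrete, here is a minimal sketch. The helper name `degraded_read_bytes` is hypothetical, and the 2/3 factor is simply the figure quoted in this message for (10,4); it is not derived from the paper here.

```python
from fractions import Fraction

def degraded_read_bytes(k, read_size, traffic_fraction=Fraction(1)):
    """Bytes fetched to serve one read of `read_size` bytes while the
    chunk holding that data is unavailable and k chunks must be read."""
    return k * read_size * traffic_fraction

K = 10            # data chunks in the (10,4) example
PAGE = 4 * 1024   # one 4 KB page

# Plain erasure code: K pages are read to rebuild one page.
plain_ec = degraded_read_bytes(K, PAGE)

# Hitchhiker: 2/3 of that traffic, using the figure quoted in the message.
hitchhiker = degraded_read_bytes(K, PAGE, Fraction(2, 3))

print(plain_ec, float(hitchhiker))
```

Fractions are used only to keep the 2/3 factor exact; real traffic depends on chunk alignment and on the actual code parameters.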

The most interesting variant to implement is probably Hitchhiker-XOR+, which you have to combine with a Vandermonde matrix: it requires that the first parity is just the XOR of all data chunks.
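
The condition on the first parity can be illustrated with a small sketch. This shows only the XOR-parity property itself, on toy data; it is not an implementation of Hitchhiker-XOR+, and the names `xor_chunks`, `data` and `p1` are made up for the example.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# k = 10 toy data chunks of 8 bytes each (assumption: any equal size works).
data = [bytes([i] * 8) for i in range(1, 11)]

# First parity: plain XOR of all data chunks.
p1 = xor_chunks(data)

# A single lost data chunk is rebuilt by XORing the survivors with p1.
lost = 3
survivors = [c for i, c in enumerate(data) if i != lost]
rebuilt = xor_chunks(survivors + [p1])
assert rebuilt == data[lost]
```

A parity matrix whose first row is all ones produces exactly this parity, which is why the choice of matrix matters for the XOR+ variant.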

So, yes, there is certainly a benefit in implementing it compared to other approaches (Xorbas, LRC), since it does not involve a space overhead and opens the door to using larger K values to save space!
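
The space argument can be checked with simple arithmetic. The sketch below assumes raw storage overhead is (K + M + extra parities) / K; the two extra local parities in the last line are a hypothetical illustration of a locally repairable layout, not parameters taken from this thread.

```python
def storage_overhead(k, m, extra_parities=0):
    """Raw bytes stored per byte of user data."""
    return (k + m + extra_parities) / k

rs_10_4 = storage_overhead(10, 4)   # (10,4): no extra chunks needed
rs_20_4 = storage_overhead(20, 4)   # larger K with the same M stores less

# Hypothetical layout that adds 2 local parities on top of (10,4).
with_local = storage_overhead(10, 4, extra_parities=2)

print(rs_10_4, rs_20_4, with_local)
```

With M fixed, the overhead shrinks as K grows, which is the saving the message refers to; the cost of a larger K is more chunks read per repair, which is what cheaper reconstruction helps with.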

Cheers Andreas.

* Re: Hitchhiker erasure code
From: Loic Dachary @ 2015-03-20 12:42 UTC
  To: Andreas Joachim Peters; +Cc: ceph-devel@vger.kernel.org

On 20/03/2015 13:37, Andreas Joachim Peters wrote:
> Hi Loic, 
> I looked at that some time ago.
> 
> Table 1 in the paper says it all:
> 
> If you care about decoding and reconstruction of data it gives a good improvement.
> If you care mainly about encoding speed, it is not the optimal choice (+72.1%).
> 
> The algorithm optimizes the reconstruction of data units. This is relevant if your read-size is typically smaller than the block-size e.g. you encode 4 MB objects and you read 4kb pages. With normal EC you get a read amplification of K*4k if a data stripe is down, while with hitchhiker you get only 2/3 of that traffic in case of (10,4).
> 
> The most interesting to implement is probably Hitchhiker-XOR+, which you have to combine with a Vandermonde matrix, it requires that the first parity is just the xor of all data chunks.
> 
> So, yes, there is certainly a benefit in implementing that compared to other approaches (Xorbas,LRC) since it does not involve a space overhead and opens the door to use larger K values and save space!
> 

That sounds appealing :-) Do you think it would be more relevant to implement this as an additional Ceph plugin? Or as a new jerasure technique?

-- 
Loïc Dachary, Artisan Logiciel Libre



* RE: Hitchhiker erasure code
From: Andreas Joachim Peters @ 2015-03-20 13:11 UTC
  To: Loic Dachary; +Cc: ceph-devel@vger.kernel.org

Hi Loic, 

XOR+ is more generic with Jerasure because the ISA library does not create an invertible matrix for M>4 in all cases. I measured a tiny advantage for the Intel library in encoding/decoding speed for given K/M values and given hardware, but Dan did some measurements in a real cluster, and you cannot see any difference on a global scale between a jerasure and an isa configuration (8,4) with our hardware. The EC computation does not define the baseline performance in that case.

So, this is more of a general question ... but I am sure it is simpler to implement it within one plug-in rather than on top of N plug-ins ...

Cheers Andreas.


