From: Loic Dachary <loic@dachary.org>
To: Paul Von-Stamwitz <PVonStamwitz@us.fujitsu.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
Harvey Skinner <hpmpec2a@gmail.com>
Subject: Re: Comments on Ceph distributed parity implementation
Date: Sat, 22 Jun 2013 10:26:44 +0200
Message-ID: <51C55FC4.7050205@dachary.org>
In-Reply-To: <622F4407872BA447A16110F65453358C01A4D8066F49@FMSAMAIL.fmsa.local>
>> The first, simplest implementation is likely to be fit to use with RGW and
>> probably too slow to use with RBD. Do you think we should try to optimize
>> for RBD right now ?
>
> Yes, RGW is the obvious best candidate for the first implementation. We don't need to implement for RBD and CephFS now, but we should consider how the design would handle other applications in the future. The alternative is to optimize purely for RGW and provide an API/plug-in capability suggested by Harvey Skinner to make way for optimized solutions for other applications.
>
I agree that the design should make room to plug in optimizations in the future. I've tried to figure out where the API/plug-in should fit.
a) pluggable placement group
b) pluggable erasure code library
The pluggable placement group capability is what I'm working on right now. It requires re-architecting some of the current code, and the API is starting to emerge. The implementation should eventually live in a separate shared library ( say ErasureCodePG ) loaded at run time and selected with a configuration option when creating a pool. I suspect that experimenting with new optimization strategies is going to be done by hacking ErasureCodePG and creating new pools that use it.
Let's say we find a way to optimize for RBD and implement it in the RBDErasureCodePG placement group. We then configure the RBD pool to use this placement group backend while keeping the ErasureCodePG placement group backend for RGW. Later on it may make sense to merge the two, or to make sure they share similar code for maintenance purposes. But that probably leaves all the room we need to experiment until a general solution is found.
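To make the idea concrete, here is a minimal sketch of selecting a placement group backend by name when a pool is created. It is only an illustration: the Python classes, the BACKENDS registry, register_backend and the pg_backend option are hypothetical names, not Ceph code, and a plain dictionary stands in for loading a shared library at run time.

```python
# Hypothetical sketch: none of these names are Ceph APIs.
from typing import Dict, Type


class PGBackend:
    """Base class for a placement group backend."""
    name = "base"

    def describe(self) -> str:
        return f"{self.name} backend"


# Registry standing in for "shared library loaded at run time".
BACKENDS: Dict[str, Type[PGBackend]] = {}


def register_backend(cls: Type[PGBackend]) -> Type[PGBackend]:
    """Make a backend selectable by its name."""
    BACKENDS[cls.name] = cls
    return cls


@register_backend
class ErasureCodePG(PGBackend):
    name = "ErasureCodePG"


@register_backend
class RBDErasureCodePG(ErasureCodePG):
    # Inherits from ErasureCodePG so the two share code; an RBD-specific
    # optimization would override only the methods that need it.
    name = "RBDErasureCodePG"


class Pool:
    """A pool picks its backend from a configuration option at creation."""

    def __init__(self, name: str, pg_backend: str) -> None:
        if pg_backend not in BACKENDS:
            raise ValueError(f"unknown placement group backend: {pg_backend}")
        self.name = name
        self.backend = BACKENDS[pg_backend]()


rgw_pool = Pool("rgw", pg_backend="ErasureCodePG")
rbd_pool = Pool("rbd", pg_backend="RBDErasureCodePG")
```

Both pools coexist, each with its own backend, which is what would let an experimental backend be tried on new pools without touching existing ones.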
The pluggable erasure code library API will be something like what is described in http://pad.ceph.com/p/Erasure_encoding_as_a_storage_backend
context(k, m, reed-solomon|...) => context* c
encode(context* c, void* data) => void* chunks[k+m]
decode(context* c, void* chunk[k+m], int* indices_of_erased_chunks) => void* data // erased chunks are not used
repair(context* c, void* chunk[k+m], int* indices_of_erased_chunks) => void* chunks[k+m] // erased chunks are rebuilt
It won't be enough for hierarchical codes but they don't seem to be considered attractive at the moment. It should be enough for LRC ( http://anrg.usc.edu/~maheswaran/Xorbas.pdf ) since it only requires an additional argument to the context ( the number of chunks required to do a local repair ).
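The four calls listed above could be exercised with a sketch like the following. To stay short and self-contained it swaps Reed-Solomon for a single XOR parity chunk ( m = 1 only ), so it is not the proposed library, just the same call shapes. The Context class, the "xor" technique name and the use of None to mark an erased chunk are assumptions made for this example.

```python
# Sketch of the context/encode/decode/repair call shapes.
# Assumption: single XOR parity (m = 1) stands in for Reed-Solomon.
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Context:
    k: int          # number of data chunks
    m: int          # number of coding chunks
    technique: str  # only "xor" is supported in this sketch


def context(k: int, m: int, technique: str = "xor") -> Context:
    if technique != "xor" or m != 1 or k < 1:
        raise ValueError("this sketch only supports technique='xor', m=1")
    return Context(k, m, technique)


def _xor(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode(c: Context, data: bytes) -> List[bytes]:
    """Split data into k chunks and append one parity chunk."""
    if len(data) % c.k:
        raise ValueError("data length must be a multiple of k")
    size = len(data) // c.k
    chunks = [data[i * size:(i + 1) * size] for i in range(c.k)]
    return chunks + [_xor(chunks)]


def repair(c: Context, chunks: Sequence[Optional[bytes]],
           erased: Sequence[int]) -> List[bytes]:
    """Return all k+m chunks, rebuilding the ones listed in erased."""
    if len(erased) > c.m:
        raise ValueError("more erasures than coding chunks")
    rebuilt = list(chunks)
    for i in erased:
        # The missing chunk is the XOR of every other chunk.
        rebuilt[i] = _xor([ch for j, ch in enumerate(chunks) if j != i])
    return rebuilt


def decode(c: Context, chunks: Sequence[Optional[bytes]],
           erased: Sequence[int]) -> bytes:
    """Return the original data; erased entries are never read."""
    return b"".join(repair(c, chunks, erased)[:c.k])


c = context(4, 1)
chunks = encode(c, b"abcdefgh")
damaged: List[Optional[bytes]] = list(chunks)
damaged[2] = None  # pretend chunk 2 was lost
recovered = decode(c, damaged, [2])
```

An LRC variant would presumably add one more field to Context ( the number of chunks needed for a local repair ) without changing the other three signatures.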
The need for another API ( in addition to pluggable placement groups and the pluggable erasure code library ) may appear in the future, but I can't see it right now. I'm trying to refrain from over-engineering while making sure we won't need to re-architect because something obvious was overlooked. This discussion is helping a lot :-)
What do you think ?
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.