From: Loic Dachary <loic@dachary.org>
To: Sage Weil <sage@inktank.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Erasure coding implementation : high level description
Date: Sat, 29 Jun 2013 18:56:08 +0200
Message-ID: <51CF11A8.2070208@dachary.org>
In-Reply-To: <51C9D65F.8000507@dachary.org>
Hi Sage,
Sketching the path for implementing erasure coding requires a level of understanding of ReplicatedPG/PG/OSD that is beyond me at the moment. A few hours of browsing showed that a number of important areas are still unknown to me. A meaningful example is probably the logic associated with
struct AccessMode {
https://github.com/ceph/ceph/blob/962b64a83037ff79855c5261325de0cd1541f582/src/osd/ReplicatedPG.h#L114
I suspect it has a number of similarities with what erasure coding needs in order to ensure that a stripe is fully written to disk ( i.e. probably in relation to the "ondisk" acknowledgment ) before the previous version of the same stripe is removed from all OSDs supporting it.
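To make the ordering concern concrete, here is a toy model of it: write every chunk of the new version, wait for the acknowledgments, record the switch, and only then drop the previous version. It is a sketch under stated assumptions, not Ceph code. ChunkStore, Stripe, write_version and the in-memory log list are hypothetical names standing in for OSDs, a stripe and pg_log events. The writes are sequential here rather than parallel, and the sketch requires an acknowledgment from every chunk rather than just K out of K+M.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkStore:
    """Hypothetical stand-in for one OSD holding one chunk per stripe version."""
    chunks: dict = field(default_factory=dict)  # version -> chunk bytes
    fail_writes: bool = False  # set to True to simulate a failed write

    def write(self, version, data):
        """Store a chunk; returning True models an "ondisk" acknowledgment."""
        if self.fail_writes:
            return False
        self.chunks[version] = data
        return True

    def remove(self, version):
        self.chunks.pop(version, None)


@dataclass
class Stripe:
    """Hypothetical model of one stripe spread over several chunk stores."""
    stores: list
    current: int = 0  # version that readers should use
    log: list = field(default_factory=list)  # stand-in for pg_log events

    def write_version(self, chunks):
        new = self.current + 1
        # Phase 1: write every chunk of the new version next to the old one.
        acks = [s.write(new, c) for s, c in zip(self.stores, chunks)]
        if not all(acks):
            # Roll back the partial write; the old version stays readable.
            for s in self.stores:
                s.remove(new)
            return False
        # Phase 2: record the switch, then discard the previous version.
        self.log.append(("switch", new))
        old, self.current = self.current, new
        for s in self.stores:
            s.remove(old)
        return True
```

With three stores, a successful write_version leaves only the new version on every store. A write that fails on any store leaves the previous version untouched everywhere, so the stripe never ends up with chunks from two versions and no way to repair either.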
The time spent on this exploration was not wasted: I learnt a few things that will be useful :-) But I think it would be more useful for me to work on a more modest task that moves in the direction of the erasure coding implementation.
Cheers
On 06/25/2013 07:41 PM, Loic Dachary wrote:
> Hi Sage,
>
> Paraphrasing what you suggested today :
>
> The logic for writing a stripe ( i.e. all the chunks created by the erasure encoding function for a given object, or for part of that object if it exceeds the maximum size of a stripe ) for a single object is going to differ from what we currently have for replicated objects. The object is consistent when all chunks ( or at least K out of K+M ) are committed to disk. It may make sense to start writing all the chunks in parallel and, when they are acknowledged, send a pg_log event that says : now switch to this new version of the object. This avoids ending up with some chunks belonging to one version of the object and other chunks belonging to another version, in which case we could not repair any of them.
>
> I will try to sketch the path for implementing the erasure coding ( including the above ) by adding to https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst
>
> Cheers
>
--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.