From: Loic Dachary <loic@dachary.org>
To: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Erasure encoding as a storage backend
Date: Sat, 04 May 2013 19:16:59 +0200
Message-ID: <5185428B.6070109@dachary.org>

Hi,

Here is an updated description of the proposed "Erasure encoding as a storage backend" implementation, which will be discussed during the Ceph summit (http://wiki.ceph.com/01Planning/Developer_Summit#Schedule). The "strip" and "stripe" terms are illustrated at http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/Erasure_encoding_as_a_storage_backend#Proposed_model .

I am well aware of the shortcomings of this proposal and it would be great to get feedback before the ceph summit to address the most prominent issues.

Cheers

http://pad.ceph.com/p/Erasure_encoding_as_a_storage_backend

	* PG and ReplicatedPG are reworked so that PG can be used as a base class for ErasureEncodedPG
		* Tests are written for ReplicatedPG to cover 100% of the LOC and most of the expected functionality.
		* Code is reworked in PG and ReplicatedPG: code that is not specific to replication moves from ReplicatedPG to PG, and code that is not generic enough to serve as a base for ErasureEncodedPG moves from PG to ReplicatedPG.
	* To isolate Ceph from the actual library being used (zfec, fecpp, ...), a wrapper around the erasure encoding library is implemented. Each block is encoded into k data blocks and m parity blocks.
		* encode(void* data, k, m) => void* data[k], void* parity[m]
		* decode(void* data[k], void* parity[m]) => void* data
		* repair(void* data[k], void* parity[m], indices_of_damaged_blocks[]) => void* data
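
A minimal sketch of what such a wrapper could look like, written in Python for brevity. Everything here is an assumption made for illustration: the ErasureCodec and XorCodec names are hypothetical, and the toy backend uses single-parity XOR (m = 1) rather than zfec or fecpp. Damaged block indices are assumed to run over the data blocks first (0 .. k-1) and the parity blocks after.

```python
from abc import ABC, abstractmethod
from functools import reduce
from typing import List, Sequence, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class ErasureCodec(ABC):
    """Hypothetical wrapper interface hiding the actual encoding library."""

    @abstractmethod
    def encode(self, data: bytes, k: int, m: int) -> Tuple[List[bytes], List[bytes]]:
        """Split data into k data blocks and compute m parity blocks."""

    @abstractmethod
    def repair(self, data: Sequence[bytes], parity: Sequence[bytes],
               damaged: Sequence[int]) -> bytes:
        """Rebuild the damaged blocks and return the original data."""

    def decode(self, data: Sequence[bytes], parity: Sequence[bytes]) -> bytes:
        """With every data block intact, decoding is plain concatenation."""
        return b"".join(data)


class XorCodec(ErasureCodec):
    """Toy backend (an assumption, not zfec/fecpp): one XOR parity block."""

    def encode(self, data, k, m):
        if m != 1:
            raise ValueError("the XOR sketch supports m == 1 only")
        if len(data) % k:
            raise ValueError("data length must be a multiple of k")
        size = len(data) // k
        blocks = [data[i * size:(i + 1) * size] for i in range(k)]
        return blocks, [reduce(xor_bytes, blocks)]

    def repair(self, data, parity, damaged):
        if len(damaged) > 1:
            raise ValueError("single parity can repair at most one block")
        blocks = list(data)
        # An index >= k points at the parity block: the data is already intact.
        if damaged and damaged[0] < len(blocks):
            lost = damaged[0]
            survivors = [b for i, b in enumerate(blocks) if i != lost]
            blocks[lost] = reduce(xor_bytes, survivors + [parity[0]])
        return b"".join(blocks)
```

A caller would only ever see encode, decode and repair, so swapping the XOR toy for a real Reed-Solomon library should not change the calling code.
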
	* The ErasureEncodedPG configuration is set to encode each object into k data objects and m parity objects.
		* It uses the parity ('INDEP') CRUSH mode so that placement is intelligent. The indep placement avoids moving a shard between ranks: a mapping of [0,1,2,3,4] will change to [0,6,2,3,4] (or something similar) if osd.1 fails, so the shards on 2,3,4 won't need to be copied around.
		* The ErasureEncodedPG uses k + m OSDs, numbered D0 ... Dk-1 and C0 ... Cm-1
		* Each object is a strip
		* Each stripe has a fixed size of B bytes
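
The benefit of the indep placement can be illustrated with a small Python sketch. It is not the CRUSH algorithm: the function names are hypothetical, and the replacement OSD (6) is simply passed in. The second function mimics a placement that closes the gap left by the failed OSD and appends the replacement at the end, which is what the indep mode is meant to avoid.

```python
def remap_indep(mapping, failed, replacement):
    """Replace the failed OSD in place: every other rank keeps its OSD."""
    return [replacement if osd == failed else osd for osd in mapping]


def remap_shifting(mapping, failed, replacement):
    """Drop the failed OSD and append the replacement: later ranks shift."""
    return [osd for osd in mapping if osd != failed] + [replacement]


def moved_ranks(before, after):
    """Ranks whose OSD changed, i.e. shards that must be copied or rebuilt."""
    return [rank for rank, (a, b) in enumerate(zip(before, after)) if a != b]


before = [0, 1, 2, 3, 4]
indep = remap_indep(before, failed=1, replacement=6)        # [0, 6, 2, 3, 4]
shifted = remap_shifting(before, failed=1, replacement=6)   # [0, 2, 3, 4, 6]
```

With the indep result only rank 1 changes, so only the shard that lived on osd.1 has to be rebuilt. With the shifting result ranks 1 to 4 all change, which matters here because each rank holds a different shard.
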
	* ErasureEncodedPG implementation
		* Write offset, length
			* read the stripes containing offset, length
			* for each stripe, decode(void* data[k], void* parity[m]) => void* data and append to a bufferlist
			* modify the bufferlist with the write request
			* encode(void* data, k, m) => void* data[k], void* parity[m]
			* write data[0] to D0, data[1] to D1 ... data[k-1] to Dk-1 and parity[0] to C0 ... parity[m-1] to Cm-1
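
The write steps above amount to a read-modify-write of every stripe touched by the request. Here is a hedged Python sketch of that sequence. The assumptions are mine, not part of the proposal: each OSD is modelled as a dict mapping a stripe index to a strip, k = 4, m = 1 with XOR parity standing in for the real library, B = 8 bytes, and stripes that were never written read back as zeros.

```python
from functools import reduce

K, B = 4, 8  # hypothetical values: 4 data strips per stripe, 8-byte stripes


def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Toy encoder (m = 1): k data strips plus one XOR parity strip."""
    size = len(data) // k
    strips = [data[i * size:(i + 1) * size] for i in range(k)]
    return strips, [reduce(xor_bytes, strips)]


def decode(data_strips, parity):
    """All data strips are assumed intact here, so decoding concatenates."""
    return b"".join(data_strips)


def write(osds, offset, payload, k=K, stripe_size=B):
    """osds[0..k-1] play D0..Dk-1 and osds[k] plays C0."""
    if not payload:
        return
    if stripe_size % k:
        raise ValueError("stripe size must be a multiple of k")
    strip_size = stripe_size // k
    first = offset // stripe_size
    last = (offset + len(payload) - 1) // stripe_size

    # 1. read and decode the stripes containing offset, length
    buf = bytearray()
    for s in range(first, last + 1):
        data_strips = [osds[i].get(s, bytes(strip_size)) for i in range(k)]
        parity = [osds[k].get(s, bytes(strip_size))]
        buf += decode(data_strips, parity)

    # 2. modify the buffer with the write request
    start = offset - first * stripe_size
    buf[start:start + len(payload)] = payload

    # 3. re-encode each stripe and write one strip to each OSD
    for n, s in enumerate(range(first, last + 1)):
        chunk = bytes(buf[n * stripe_size:(n + 1) * stripe_size])
        data_strips, parity = encode(chunk, k)
        for i in range(k):
            osds[i][s] = data_strips[i]
        osds[k][s] = parity[0]
```

Even a two-byte write rewrites k + m strips per touched stripe, which is one of the costs the sketch makes visible.
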
		* Read offset, length
			* read the stripes containing offset, length
			* for each stripe, decode(void* data[k], void* parity[m]) => void* data and append to a bufferlist
		* Object attributes
			* duplicate the object attributes on each OSD
		* Scrubbing
			* for each object, read each stripe and write back if a repair was necessary
		* Repair
			* when an OSD is decommissioned and another OSD replaces it, for each object contained in an ErasureEncodedPG using this OSD, read the object, repair each strip and write back the strip that resides on the new OSD
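
As an illustration of the repair step, here is a small Python sketch that rebuilds all strips of a replaced OSD from the surviving ones. It rests on assumptions that are not in the proposal: OSDs are modelled as dicts mapping a stripe index to a strip, and the code is single-parity XOR (m = 1), in which any one strip of a stripe is the XOR of the k others. A real backend would call the repair function of the encoding library instead.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_osd(osds, rank):
    """Rebuild every strip of the OSD at `rank` from the other ranks.

    Only the strips that reside on the new OSD are written back; the
    surviving OSDs are read but left untouched.
    """
    survivors = [osd for i, osd in enumerate(osds) if i != rank]
    stripe_ids = set().union(*(osd.keys() for osd in survivors))
    rebuilt = {}
    for s in sorted(stripe_ids):
        rebuilt[s] = reduce(xor_bytes, [osd[s] for osd in survivors])
    osds[rank] = rebuilt
    return rebuilt
```

The same loop works whether the replaced OSD held a data strip or the parity strip, because XOR parity is symmetric. That property does not hold in general for k + m codes with m > 1.
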


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.

