* Started developing a deduplication feature
@ 2016-04-01 17:25 Marcel Lauhoff
  2016-04-01 21:31 ` Sage Weil
From: Marcel Lauhoff @ 2016-04-01 17:25 UTC
  To: ceph-devel


Hi Ceph,

Deduplication has been discussed on the list a couple of times.
Over the next few months I'll be working on a prototype.

In short: use a content-addressed storage pool, backed by a second pool
that acts as both storage and a distributed fingerprint index.



Two pools: (1) pool that does the content addressing, (2) storage / index pool.

OSDs in the first pool readdress and chunk/reassemble objects.
They then store the new objects/chunks in a second pool.
The first pool uses a new PG backend ("CAS Backend"),
while the second can use replication or erasure coding.
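
To illustrate the chunk/reassemble step, here is a minimal Python sketch.
It is not the planned OSD code. The fixed chunk size, the use of SHA-256
as the fingerprint, and the names chunk(), store_object() and reassemble()
are all assumptions made for illustration; a plain dict stands in for the
second pool, and the "recipe" is simply the ordered list of chunk
fingerprints.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, for illustration only


def chunk(data, size=CHUNK_SIZE):
    """Split data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_object(data, chunk_store):
    """Store each chunk under its fingerprint and return the recipe.

    chunk_store is a dict standing in for the second pool (hypothetical).
    The recipe is the ordered list of fingerprints needed to rebuild data.
    """
    recipe = []
    for piece in chunk(data):
        fingerprint = hashlib.sha256(piece).hexdigest()
        # A chunk that is already present is not stored a second time.
        chunk_store.setdefault(fingerprint, piece)
        recipe.append(fingerprint)
    return recipe


def reassemble(recipe, chunk_store):
    """Rebuild the original object by fetching chunks in recipe order."""
    return b"".join(chunk_store[fp] for fp in recipe)
```

With these definitions, store_object(b"abcdabcdxy", store) returns a
recipe with three entries while store holds only two chunks, because the
first two chunks are identical.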

The CAS backend computes fingerprints for incoming objects and
stores the fingerprint <-> original object name mapping.
It then forwards the data to a storage pool, addressing the objects by
fingerprint (the content-defined name).

The storage pool therefore serves as a distributed fingerprint index.
CRUSH selects the responsible OSDs. The OSDs know their objects.

Deduplication happens when two objects/chunks have the same
fingerprint.
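
The write path described above can be sketched roughly as follows. This
is a toy in-memory model, not Ceph code: the class names CasPool and
StoragePool are hypothetical, SHA-256 is an assumed fingerprint function,
and CRUSH placement is left out entirely (a single dict plays the role of
the whole storage pool).

```python
import hashlib


class StoragePool:
    """Stand-in for the second pool: objects are named by fingerprint."""

    def __init__(self):
        self.objects = {}  # fingerprint -> data

    def put(self, fingerprint, data):
        """Store data under its fingerprint; return True if it was new."""
        if fingerprint in self.objects:
            return False  # same fingerprint already stored: deduplicated
        self.objects[fingerprint] = data
        return True

    def get(self, fingerprint):
        return self.objects[fingerprint]


class CasPool:
    """Stand-in for the first pool: maps object names to fingerprints."""

    def __init__(self, storage):
        self.storage = storage
        self.names = {}  # original object name -> fingerprint

    def write(self, name, data):
        # Assumption: SHA-256 as the fingerprint function.
        fingerprint = hashlib.sha256(data).hexdigest()
        self.names[name] = fingerprint
        return self.storage.put(fingerprint, data)

    def read(self, name):
        return self.storage.get(self.names[name])
```

Writing the same content under two different names leaves a single
object in the storage pool, and both names still read back the same
data. The sketch never removes objects that are no longer referenced;
that is the job of the garbage collection step.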

My current milestones:
- Develop CAS backend, fingerprinting, recipe store
- Support a limited set of operations (like EC does)
- Support RBD (with/without cache) and evaluate
- Add chunking, garbage collection, ...

Currently I'm adding a new PG backend to the OSD code base. I'll
push the code to my GitHub clone as soon as it does "something" :)

~irq0

--
Marcel Lauhoff
Mail: lauhoff@uni-mainz.de
XMPP: mlauhoff@jabber.uni-mainz.de



Thread overview: 8+ messages
2016-04-01 17:25 Started developing a deduplication feature Marcel Lauhoff
2016-04-01 21:31 ` Sage Weil
2016-04-04 12:38   ` Marcel Lauhoff
2016-04-08 15:01     ` Marcel Lauhoff
2016-04-08 15:18       ` Sage Weil
2016-04-08 21:50       ` Shinobu Kinjo
2016-04-12  9:35         ` Marcel Lauhoff
2016-04-28 21:08   ` Allen Samuels
