* Re: Target and deduplication?
       [not found] <CAKdSmgsZe_dfhN_7R6P7tLfkciv-9dUJKKNj16xeWYXG8aTVVA@mail.gmail.com>
@ 2016-01-28  8:50 ` Christoph Hellwig
  2016-01-28 11:23   ` [dm-devel] " Joe Thornber
  2016-01-28 13:56   ` Mike Snitzer
  0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2016-01-28 8:50 UTC (permalink / raw)
To: Henrik Goldman; +Cc: target-devel, dm-devel

On Thu, Jan 28, 2016 at 12:44:25AM +0100, Henrik Goldman wrote:
> Hello,
>
> Has anyone (possibly except purestorage) managed to make target work
> with deduplication?

The iblock driver works perfectly fine on top of the dm-dedup driver,
which unfortunately still hasn't made it to mainline despite looking
rather solid.

I'm pretty sure you can use the file driver on top of btrfs (and XFS
soon!) and use the non-inline dedup feature, but I haven't tried it.

> This means making a filesystem and using fileio to map to a disk image
> on the filesystem.
> ZFS is obviously another way to get the same but comes at a cost of
> large memory consumption.

Never mind that you can't ship or distribute the resulting product, as
the ZFS licensing doesn't allow that.

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [dm-devel] Target and deduplication?
  2016-01-28  8:50 ` Target and deduplication? Christoph Hellwig
@ 2016-01-28 11:23 ` Joe Thornber
  2016-01-28 11:39   ` Nikolay Borisov
  2016-01-28 13:56 ` Mike Snitzer
  1 sibling, 1 reply; 6+ messages in thread
From: Joe Thornber @ 2016-01-28 11:23 UTC (permalink / raw)
To: device-mapper development; +Cc: Henrik Goldman, target-devel

On Thu, Jan 28, 2016 at 12:50:13AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 28, 2016 at 12:44:25AM +0100, Henrik Goldman wrote:
> > Hello,
> >
> > Has anyone (possibly except purestorage) managed to make target work
> > with deduplication?
>
> The iblock driver works perfectly fine on top of the dm-dedup driver,
> which unfortunately still hasn't made it to mainline despite looking
> rather solid.

I'm working on a userland dedup tool at the moment (thin_archive), and
I think there are serious issues with dm-dedup:

- To do dedup properly you need to use a variable, small chunk size.
  This chunk size depends on the contents of the data (google
  'content-based chunking algorithms').  I did some experiments
  comparing fixed to variable chunk sizes and the difference was huge.
  It also varied significantly depending on which file system was
  used.  I don't think a fixed-size chunk is going to identify nearly
  as many duplicates as people are expecting.

- Performance depends on being able to take a hash of a data block
  (eg, SHA1) and quickly look it up to see if that chunk has been seen
  before.  There are two plug-ins to dm-dedup that provide this look up:

  i) a RAM-based one.

     This will be fine on small systems, but as the number of chunks
     stored in the system increases, RAM consumption will go up
     significantly.  eg, a 4T disk split into 64k chunks (too big IMO)
     gives 2^26 chunks (let's ignore duplicates for the moment).  Each
     entry in the hash table needs to store the hash (say 20 bytes for
     SHA1), plus the physical chunk address (8 bytes), plus some
     overhead for the hash table itself (4 bytes).  That gives us
     32 bytes per entry, so our 4T disk is going to eat 2G of RAM, and
     I'm still sceptical that it will identify many duplicates.

     (I'm not sure how the RAM-based one recovers if there's a crash.)

  ii) one that uses the btrees from my persistent data library.

     On the face of it this should be better than the RAM version,
     since it'll just page in the metadata as it needs it.  But we're
     keying off hashes like SHA1, which are designed to be pseudo
     random, and will hit every page of metadata evenly.  So we'll be
     constantly trying to page in the whole tree.

Commercial systems use a couple of tricks to get round these problems:

i) Use a bloom filter to quickly determine that a chunk is _not_
   already present; this is the common case, and so determining it
   quickly is very important.

ii) Store the hashes on disk in stream order and page in big blocks of
    these hashes as required, the reasoning being that similar
    sequences of chunks are likely to be hit again.

- Joe
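[Editorial aside: the content-based chunking Joe describes can be illustrated with a small sketch. This is not dm-dedup or thin_archive code — it is a hypothetical FastCDC-style chunker using a "gear" rolling hash, where a boundary is cut wherever the low bits of the hash are zero, so boundaries follow the content rather than fixed offsets. All names and the chunk-size parameters below are made up for illustration.]

```python
import random

# Per-byte random "gear" table; seeded so the example is deterministic.
random.seed(0)
GEAR = [random.getrandbits(32) for _ in range(256)]

def cdc_chunks(data, mask=(1 << 11) - 1, min_size=512, max_size=8192):
    """Content-defined chunking sketch.

    The hash state is updated as (h << 1) + GEAR[byte], masked to 32 bits,
    so a byte's contribution shifts out after 32 steps -- the hash
    effectively depends only on the last 32 bytes.  A chunk boundary is
    declared when the low bits of the hash are all zero (and the chunk is
    at least min_size); max_size caps runaway chunks.
    """
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because the rolling state only "sees" the last 32 bytes, inserting data near the start of a stream perturbs only nearby boundaries; later cut points re-align with the content and the downstream chunks still hash identically — which is exactly why variable chunking finds duplicates that fixed 64k chunking misses.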
* Re: [dm-devel] Target and deduplication?
  2016-01-28 11:23 ` [dm-devel] Target and deduplication? Joe Thornber
@ 2016-01-28 11:39 ` Nikolay Borisov
  0 siblings, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2016-01-28 11:39 UTC (permalink / raw)
To: device-mapper development, Henrik Goldman, target-devel

On 01/28/2016 01:23 PM, Joe Thornber wrote:
> I'm working on a userland dedup tool at the moment (thin_archive), and
> I think there are serious issues with dm-dedup:
[...]
>   i) a RAM-based one.
>
>      This will be fine on small systems, but as the number of chunks
>      stored in the system increases, RAM consumption will go up
>      significantly.
[...]
>      (I'm not sure how the RAM-based one recovers if there's a crash.)

I did some email exchanges with the people who implemented this, and
they essentially said the RAM-based dedup wouldn't work in case of a
crash, since data is not serialised on-disk.  As far as I understood
it, it was done solely so that they could have a baseline when
comparing the other hashing backends (the btree one and an HDD one,
more on that later).

>   ii) one that uses the btrees from my persistent data library.
>
>      On the face of it this should be better than the RAM version,
>      since it'll just page in the metadata as it needs it.  But we're
>      keying off hashes like SHA1, which are designed to be pseudo
>      random, and will hit every page of metadata evenly.  So we'll be
>      constantly trying to page in the whole tree.

I did some performance tests and this was very slow; I don't know
whether that was due to the specific implementation or because of the
increased complexity in getting data to/from disk, essentially
amplifying I/O.

They also had a third backend which was based on RAM but saved data to
disk, using dm-bufio to do caching before actually writing it out.
The idea was to strike a balance between durability and speed.  The
bad thing there was that in case of a crash one could potentially
suffer some loss of block data if it hadn't yet been committed from
dm-bufio.

> Commercial systems use a couple of tricks to get round these problems:
>
> i) Use a bloom filter to quickly determine that a chunk is _not_
>    already present; this is the common case, and so determining it
>    quickly is very important.
>
> ii) Store the hashes on disk in stream order and page in big blocks of
>     these hashes as required, the reasoning being that similar
>     sequences of chunks are likely to be hit again.
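[Editorial aside: the bloom-filter trick mentioned above — cheaply ruling out chunks that are definitely new, since "not present" is the common case — looks roughly like the sketch below. This is a hypothetical illustration, not code from any of the backends discussed; the sizing parameters are arbitrary.]

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for 'definitely not seen' chunk-hash checks.

    A False answer is definitive (the chunk is new, skip the index lookup);
    a True answer means 'maybe present' and the real index must be consulted.
    """

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by slicing 4 bytes at a time out of
        # a single SHA-1 digest (20 bytes covers up to k=5).
        digest = hashlib.sha1(item).digest()
        for i in range(self.k):
            word = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(word, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Since most incoming chunks are unique, the filter answers the common case from a small in-memory bit array, and the expensive on-disk hash index is only paged in for chunks that might actually be duplicates.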
* Re: Target and deduplication?
  2016-01-28  8:50 ` Target and deduplication? Christoph Hellwig
  2016-01-28 11:23 ` [dm-devel] " Joe Thornber
@ 2016-01-28 13:56 ` Mike Snitzer
  2016-02-02  3:51   ` Alex Gorbachev
  2016-02-02 18:15   ` Christoph Hellwig
  1 sibling, 2 replies; 6+ messages in thread
From: Mike Snitzer @ 2016-01-28 13:56 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Henrik Goldman, target-devel, dm-devel

On Thu, Jan 28 2016 at 3:50am -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Jan 28, 2016 at 12:44:25AM +0100, Henrik Goldman wrote:
> > Hello,
> >
> > Has anyone (possibly except purestorage) managed to make target work
> > with deduplication?
>
> The iblock driver works perfectly fine on top of the dm-dedup driver,
> which unfortunately still hasn't made it to mainline despite looking
> rather solid.

The author(s) fell off the face of the earth.  And in testing it fell
over pretty easily.  So I really have no idea what your view of it
"looking rather solid" is based on.
* Re: Target and deduplication?
  2016-01-28 13:56 ` Mike Snitzer
@ 2016-02-02  3:51 ` Alex Gorbachev
  2016-02-02 18:15 ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Alex Gorbachev @ 2016-02-02 3:51 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Christoph Hellwig, Henrik Goldman, dm-devel, target-devel

>> On Thu, Jan 28, 2016 at 12:44:25AM +0100, Henrik Goldman wrote:
>> > Hello,
>> >
>> > Has anyone (possibly except purestorage) managed to make target work
>> > with deduplication?
>>
>> The iblock driver works perfectly fine on top of the dm-dedup driver,
>> which unfortunately still hasn't made it to mainline despite looking
>> rather solid.
>
> The author(s) fell off the face of the earth.  And in testing it fell
> over pretty easily.  So I really have no idea what your view of it
> "looking rather solid" is based on.

For archival use, rather than inline dedup, we have had good results
with https://github.com/borgbackup

Regards, Alex
* Re: Target and deduplication?
  2016-01-28 13:56 ` Mike Snitzer
  2016-02-02  3:51 ` Alex Gorbachev
@ 2016-02-02 18:15 ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2016-02-02 18:15 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Christoph Hellwig, Henrik Goldman, dm-devel, target-devel

On Thu, Jan 28, 2016 at 08:56:54AM -0500, Mike Snitzer wrote:
> The author(s) fell off the face of the earth.  And in testing it fell
> over pretty easily.  So I really have no idea what your view of it
> "looking rather solid" is based on.

I did a fair amount of testing when it first appeared about 2 years
ago, and it seemed to work fine.
end of thread, other threads: [~2016-02-02 18:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKdSmgsZe_dfhN_7R6P7tLfkciv-9dUJKKNj16xeWYXG8aTVVA@mail.gmail.com>
2016-01-28  8:50 ` Target and deduplication? Christoph Hellwig
2016-01-28 11:23   ` [dm-devel] " Joe Thornber
2016-01-28 11:39     ` Nikolay Borisov
2016-01-28 13:56   ` Mike Snitzer
2016-02-02  3:51     ` Alex Gorbachev
2016-02-02 18:15     ` Christoph Hellwig