linux-btrfs.vger.kernel.org archive mirror
* Re: Offline Deduplication for Btrfs
@ 2011-01-06  9:37 Tomasz Chmielewski
  2011-01-06  9:51 ` Mike Hommey
  2011-01-06 10:52 ` Gordan Bobic
  0 siblings, 2 replies; 50+ messages in thread
From: Tomasz Chmielewski @ 2011-01-06  9:37 UTC (permalink / raw)
  To: linux-btrfs

> I have been thinking a lot about de-duplication for a backup application
> I am writing. I wrote a little script to figure out how much it would
> save me. For my laptop home directory, about 100 GiB of data, it was a
> couple of percent, depending a bit on the size of the chunks. With 4 KiB
> chunks, I would save about two gigabytes. (That's assuming no MD5 hash
> collisions.) I don't have VM images, but I do have a fair bit of saved
> e-mail. So, for backups, I concluded it was worth it to provide an
> option to do this. I have no opinion on whether it is worthwhile to do
> in btrfs.

Online deduplication is very useful for backups of big, multi-gigabyte 
files which change constantly.
Some mail servers store mail this way; some MUAs store their mail like 
this; databases also commonly pack everything into big files which 
tend to change here and there almost all the time.

Multi-gigabyte files in which only a few megabytes have changed can't be 
hardlinked; simple maths shows that even compressing multiple files 
which differ only slightly will use more space than paying a few 
megabytes extra for each copy (because everything else is deduplicated).
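The arithmetic can be made concrete with a back-of-the-envelope example (all figures below are made up for illustration, and the 2:1 compression ratio is an optimistic assumption):

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

file_size = 10 * GiB      # one multi-gigabyte file (made-up figure)
daily_change = 10 * MiB   # only a few megabytes actually change per day
days = 7                  # keep a week of daily backups

full_copies = days * file_size                  # naive full backups
compressed = full_copies // 2                   # optimistic 2:1 compression
deduped = file_size + (days - 1) * daily_change # unchanged blocks shared

# dedup stores one full copy plus only the changed megabytes per day,
# which beats even well-compressed full copies by a wide margin
assert deduped < compressed < full_copies
```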

And I don't even want to think about the IO needed to offline-dedup 
multi-terabyte storage (1 TB disks and bigger are becoming standard 
nowadays), say daily, especially when the storage is already heavily 
loaded in IO terms.


Now, one popular tool which can deal with small changes in files is 
rsync. It can be used to copy files over the network, so that if you 
want to copy/update a multi-gigabyte file which has only a few changes, 
rsync needs to transfer just a few megabytes.

On disk, however, rsync creates a "temporary copy" of the original file, 
into which it packs the unchanged contents together with any changes 
made. For example, while it copies/updates a file, we will have:

original_file.bin
.temporary_random_name

Later, original_file.bin is removed, and .temporary_random_name is 
renamed to original_file.bin. At that point any deduplication we had so 
far is gone, and we have to start the IO all over again.
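The difference can be sketched in Python (the patch format and function names here are invented for illustration; rsync's real delta algorithm is more sophisticated):

```python
import os
import tempfile

def rsync_style_update(path, patches):
    """Mimic rsync's default behaviour: rebuild the whole file as a
    temporary copy, then rename it over the original.  Every block lands
    in a new inode, so any extent sharing (dedup, reflinks, snapshots)
    with the old file is lost, even for unchanged data."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".",
                               prefix=".temporary_")
    with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
        out.write(src.read())          # unchanged data is copied too
        for offset, data in patches:   # then the few changed blocks
            out.seek(offset)
            out.write(data)
    os.rename(tmp, path)               # atomic replace; old extents freed

def inplace_update(path, patches):
    """Write only the changed blocks into the existing file, leaving
    unchanged (possibly shared) extents alone."""
    with open(path, "r+b") as f:
        for offset, data in patches:
            f.seek(offset)
            f.write(data)
```

rsync's `--inplace` option behaves like the second function and avoids discarding shared extents, at the cost of the destination file being inconsistent while the transfer runs.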


-- 
Tomasz Chmielewski
http://wpkg.org

* Re: Offline Deduplication for Btrfs
@ 2011-01-16  0:18 Arjen Nienhuis
  0 siblings, 0 replies; 50+ messages in thread
From: Arjen Nienhuis @ 2011-01-16  0:18 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Hi,

I like your idea and implementation for offline deduplication a lot. I
think it will save me 50% of my backup storage!

Your code walks/scans the directory/file tree of the filesystem. Would
it be possible to walk/scan the disk extents sequentially, in disk
order, instead?

- This would be more I/O-efficient.
- This would save you from reading previously
deduped/snapshotted/hardlinked files more than once.
- Maybe this would make it possible to deduplicate directories as well.

Kind regards,
Arjen Nienhuis

P.S. The NTFS implementation on Windows has 'ioctls' to read the MFT
sequentially in disk order, and it's *fast*. It's used for things
like defrag.

* Offline Deduplication for Btrfs
@ 2011-01-05 16:36 Josef Bacik
  2011-01-05 17:42 ` Gordan Bobic
  2011-01-06  8:25 ` Yan, Zheng 
  0 siblings, 2 replies; 50+ messages in thread
From: Josef Bacik @ 2011-01-05 16:36 UTC (permalink / raw)
  To: linux-btrfs

Here are patches to do offline deduplication for Btrfs.  It works well for the
cases it's expected to; I'm looking for feedback on the ioctl interface and
such.  I'm well aware there are missing features in the userspace app (like
being able to set a different blocksize).  If this interface is acceptable I
will flesh out the userspace app a little more, but I believe the kernel side is
ready to go.

Basically I think online dedup is a huge waste of time and completely useless.
You are going to want to do different things with different data.  For example,
for a mail server you are going to want very small blocksizes, but for,
say, a virtualization image store you are going to want much larger blocksizes.
And let's not get into heterogeneous environments; those just get much too
complicated.  So my solution is batched dedup, where a user just runs this
command and it dedups everything at that point.  This avoids the very costly
overhead of having to hash and look up duplicate extents online, and lets us
be _much_ more flexible about what we want to deduplicate and how we want to do
it.
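The scan-and-hash step of such a batched approach can be sketched roughly as follows (a plain userspace illustration, not Josef's actual ioctl-based tool; the function name and the blocksize parameter are made up, though the blocksize is exactly the kind of knob discussed above):

```python
import hashlib

def find_duplicate_blocks(paths, blocksize=64 * 1024):
    """Hash every blocksize-sized chunk of the given files and report
    which chunks are identical.  A real dedup tool would then ask the
    kernel (via an ioctl) to share the matching extents; here we only
    collect (path, offset, first_path, first_offset) tuples."""
    seen = {}   # digest -> (path, offset) of first occurrence
    dups = []
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(blocksize)
                if not block:
                    break
                digest = hashlib.sha256(block).digest()
                if digest in seen:
                    dups.append((path, offset) + seen[digest])
                else:
                    seen[digest] = (path, offset)
                offset += len(block)
    return dups
```

Note how the choice of blocksize decides what counts as a duplicate: a mail store full of small identical attachments wants small blocks, while VM images dedup well at much larger ones.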

The userspace app currently only does 64k blocks, or whatever is the largest
area it can read out of a file.  I'm going to extend it to do the following
things in the near future:

1) Take the blocksize as an argument so we can have bigger/smaller blocks
2) Add an option to _only_ honor the blocksize, and not try to dedup smaller
blocks
3) Use fiemap to try to dedup extents as a whole, ignoring specific
blocksizes
4) Use fiemap to determine the most optimal blocksize for the data you want
to dedup

I've tested this out on my setup and it seems to work well.  I appreciate any
feedback you may have.  Thanks,

Josef


end of thread, other threads:[~2011-01-16  0:18 UTC | newest]

Thread overview: 50+ messages
-- links below jump to the message on this page --
2011-01-06  9:37 Offline Deduplication for Btrfs Tomasz Chmielewski
2011-01-06  9:51 ` Mike Hommey
2011-01-06 16:57   ` Hubert Kario
2011-01-06 10:52 ` Gordan Bobic
  -- strict thread matches above, loose matches on Subject: below --
2011-01-16  0:18 Arjen Nienhuis
2011-01-05 16:36 Josef Bacik
2011-01-05 17:42 ` Gordan Bobic
2011-01-05 18:41   ` Diego Calleja
2011-01-05 19:01     ` Ray Van Dolson
2011-01-05 20:27       ` Gordan Bobic
2011-01-05 20:28       ` Josef Bacik
2011-01-05 20:25     ` Gordan Bobic
2011-01-05 21:14       ` Diego Calleja
2011-01-05 21:21         ` Gordan Bobic
2011-01-05 19:46   ` Josef Bacik
2011-01-05 19:58     ` Lars Wirzenius
2011-01-05 20:15       ` Josef Bacik
2011-01-05 20:34         ` Freddie Cash
2011-01-05 21:07       ` Lars Wirzenius
2011-01-05 20:12     ` Freddie Cash
2011-01-05 20:46     ` Gordan Bobic
     [not found]       ` <4D250B3C.6010708@shiftmail.org>
2011-01-06  1:03         ` Gordan Bobic
2011-01-06  1:56           ` Spelic
2011-01-06 10:39             ` Gordan Bobic
2011-01-06  3:33           ` Freddie Cash
2011-01-06  1:19       ` Spelic
2011-01-06  3:58         ` Peter A
2011-01-06 10:48           ` Gordan Bobic
2011-01-06 13:33             ` Peter A
2011-01-06 14:00               ` Gordan Bobic
2011-01-06 14:52                 ` Peter A
2011-01-06 15:07                   ` Gordan Bobic
2011-01-06 16:11                     ` Peter A
2011-01-06 18:35           ` Chris Mason
2011-01-08  0:27             ` Peter A
2011-01-06 14:30         ` Tomasz Torcz
2011-01-06 14:49           ` Gordan Bobic
2011-01-06  1:29   ` Chris Mason
2011-01-06 10:33     ` Gordan Bobic
2011-01-10 15:28     ` Ric Wheeler
2011-01-10 15:37       ` Josef Bacik
2011-01-10 15:39         ` Chris Mason
2011-01-10 15:43           ` Josef Bacik
2011-01-06 12:18   ` Simon Farnsworth
2011-01-06 12:29     ` Gordan Bobic
2011-01-06 13:30       ` Simon Farnsworth
2011-01-06 14:20     ` Ondřej Bílka
2011-01-06 14:41       ` Gordan Bobic
2011-01-06 15:37         ` Ondřej Bílka
2011-01-06  8:25 ` Yan, Zheng 
