Subject: [PATCH 0/2 RFC] Online data deduplication
From: Liu Bo @ 2013-04-07 13:12 UTC
  To: linux-btrfs

This is a first attempt at online data deduplication.

NOTE: This introduces an on-disk FORMAT CHANGE, so DO NOT use it on real data!

Data deduplication is a specialized data compression technique for eliminating
duplicate copies of repeating data.[1]
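To make the definition concrete, here is a deliberately naive sketch of block-level deduplication in C. It is not taken from PATCH 1; it assumes fixed 4 KiB blocks and a whole buffer held in memory, and the names DEDUP_BLOCK and count_stored_blocks are hypothetical. Each block is compared byte-for-byte against every earlier block, and only the first copy of any content counts as physically stored; a real implementation would index blocks by a hash instead of scanning.

```c
#include <stddef.h>
#include <string.h>

#define DEDUP_BLOCK 4096 /* assumed dedup block size: 4 KiB */

/* Hypothetical helper: how many blocks of `buf` must be stored physically
 * if identical blocks are shared instead of copied.
 * `len` is assumed to be a multiple of DEDUP_BLOCK. */
static size_t count_stored_blocks(const unsigned char *buf, size_t len)
{
    size_t nblocks = len / DEDUP_BLOCK;
    size_t stored = 0;

    for (size_t i = 0; i < nblocks; i++) {
        const unsigned char *blk = buf + i * DEDUP_BLOCK;
        int duplicate = 0;

        /* Naive O(n^2) scan: compare against every earlier block. */
        for (size_t j = 0; j < i; j++) {
            if (memcmp(blk, buf + j * DEDUP_BLOCK, DEDUP_BLOCK) == 0) {
                duplicate = 1; /* same content already stored: share it */
                break;
            }
        }
        if (!duplicate)
            stored++; /* first copy of this content: must be written */
    }
    return stored;
}
```

For a zero-filled buffer of any number of blocks the function returns 1, since every block matches the first one.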

This patch set is also related to the "Content based storage" item on the btrfs wiki's Project ideas page[2].

For more implementation details, please refer to PATCH 1.

PATCH 2 fixes a hang that occurs when deduplication is enabled.

======
How to turn deduplication on:

Two steps are needed before deduplication takes effect:
1) mount with option "-o dedup"
2) then run 'btrfs filesystem sync /mnt_of_your_btrfs'
(The second step is needed because I hacked 'btrfs fi sync' to enable deduplication.)
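The two steps above can be sketched as system calls in C. This is an illustration under stated assumptions: the "dedup" mount option is only understood by a kernel carrying this patch set, and since the diffstat below touches fs/btrfs/ioctl.c, the sketch assumes the hack sits behind the kernel's handling of BTRFS_IOC_SYNC, the ioctl that 'btrfs filesystem sync' issues on the mount point. The helper names mount_with_dedup and btrfs_sync are hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <unistd.h>

/* Fallback matching the definition in the uapi header <linux/btrfs.h>. */
#ifndef BTRFS_IOC_SYNC
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_SYNC _IO(BTRFS_IOCTL_MAGIC, 8)
#endif

/* Step 1 (hypothetical helper): like "mount <dev> <dir> -o dedup".
 * Needs CAP_SYS_ADMIN and a kernel with this RFC. Returns 0 or -1. */
static int mount_with_dedup(const char *dev, const char *dir)
{
    return mount(dev, dir, "btrfs", 0, "dedup");
}

/* Step 2 (hypothetical helper): what "btrfs filesystem sync <dir>" does,
 * i.e. open the mount point and issue BTRFS_IOC_SYNC. Returns 0 or -1. */
static int btrfs_sync(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int ret = ioctl(fd, BTRFS_IOC_SYNC);
    close(fd);
    return ret < 0 ? -1 : 0;
}
```

Usage mirrors the example: mount_with_dedup("/dev/sdb1", "/mnt/btrfs") followed by btrfs_sync("/mnt/btrfs"). Both helpers return -1 and leave errno set on failure, for instance when the path does not exist.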

Here is an example:
1) mkfs.btrfs /dev/sdb1
2) mount /dev/sdb1 /mnt/btrfs -o dedup
3) btrfs filesystem sync /mnt/btrfs
4) btrfs fi df /mnt/btrfs
   Data: total=8.00MB, used=256.00KB
   System, DUP: total=8.00MB, used=4.00KB
   System: total=4.00MB, used=0.00
   Metadata, DUP: total=1.00GB, used=28.00KB
   Metadata: total=8.00MB, used=0.00

5) dd if=/dev/zero of=/mnt/btrfs/foo bs=4K count=1; sync
6) dd if=/dev/zero of=/mnt/btrfs/foo bs=1M count=10; sync
7) btrfs fi df /mnt/btrfs
   Data: total=1.01GB, used=260.00KB
   System, DUP: total=8.00MB, used=4.00KB
   System: total=4.00MB, used=0.00
   Metadata, DUP: total=1.00GB, used=432.00KB
   Metadata: total=8.00MB, used=0.00

So 4K+10M of zeroes has been written, yet Data used only grows from
256.00KB to 260.00KB: just 4KB of new data space is consumed!
(Metadata used grows from 28.00KB to 432.00KB.)
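The 4KB figure follows from simple arithmetic, assuming deduplication works on 4 KiB blocks (PAGE_SIZE on x86, which the TODO list below suggests is the current limit). Every block written by the two dd runs is all zeroes, so only one distinct block has to be stored:

```latex
\begin{aligned}
\text{written} &= 4\,\text{KiB} + 10\,\text{MiB} = 4096 + 10\,485\,760 = 10\,489\,856\ \text{bytes} \\
\text{blocks} &= \frac{10\,489\,856}{4096} = 1 + 2560 = 2561 \\
\text{distinct blocks} &= 1 \quad (\text{every block is all zeroes}) \\
\Delta\,\text{Data used} &= 1 \times 4\,\text{KiB} = 4\,\text{KiB} \quad (256.00\,\text{KB} \to 260.00\,\text{KB})
\end{aligned}
```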

=====================
TODO:
1) a bit-by-bit comparison callback, to verify that blocks with matching hashes really are identical.
2) support for a configurable dedup blocksize larger than PAGE_SIZE.
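TODO item 1 matters because a hash match only says that two blocks are probably identical; with any hash, different contents can collide, and sharing an extent on a false match would silently corrupt data. The sketch below shows the usual pattern of using the hash as a cheap filter and confirming with a full byte comparison. The names toy_hash, dedup_verify and can_share are hypothetical, and toy_hash is a deliberately weak byte sum chosen so that collisions are easy to demonstrate; a real implementation would use a strong hash, and the RFC's actual hash and callback interface are not described in this message.

```c
#include <stddef.h>
#include <string.h>

/* Toy checksum: a plain byte sum, deliberately weak so collisions are easy. */
static unsigned int toy_hash(const unsigned char *blk, size_t len)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += blk[i];
    return sum;
}

/* Hypothetical verification callback: only a byte-for-byte match proves
 * that the new block may share the existing block's extent. */
static int dedup_verify(const unsigned char *existing,
                        const unsigned char *candidate, size_t len)
{
    return memcmp(existing, candidate, len) == 0;
}

/* Returns 1 if `candidate` may be deduplicated against `existing`. */
static int can_share(const unsigned char *existing,
                     const unsigned char *candidate, size_t len)
{
    if (toy_hash(existing, len) != toy_hash(candidate, len))
        return 0; /* different hashes: contents certainly differ */
    return dedup_verify(existing, candidate, len); /* hash hit: confirm */
}
```

For example, the blocks {1, 2, 0, 0} and {2, 1, 0, 0} have the same byte sum, so toy_hash reports a match, but dedup_verify rejects them and can_share returns 0.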

So far I have only tested it with simple cases like the one above, not even with
xfstests; running xfstests is my next step.

Any comments are welcome!


[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

Liu Bo (2):
  Btrfs: online data deduplication
  Btrfs: skip merge part for delayed data refs

 fs/btrfs/ctree.h        |   53 ++++++++
 fs/btrfs/delayed-ref.c  |    7 +
 fs/btrfs/disk-io.c      |   33 +++++-
 fs/btrfs/extent-tree.c  |   22 +++-
 fs/btrfs/extent_io.c    |    8 +-
 fs/btrfs/extent_io.h    |   11 ++
 fs/btrfs/file-item.c    |  184 ++++++++++++++++++++++++++
 fs/btrfs/file.c         |    6 +-
 fs/btrfs/inode.c        |  327 +++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/ioctl.c        |   34 +++++-
 fs/btrfs/ordered-data.c |   25 +++-
 fs/btrfs/ordered-data.h |    9 ++
 fs/btrfs/print-tree.c   |    6 +-
 fs/btrfs/super.c        |    7 +-
 14 files changed, 687 insertions(+), 45 deletions(-)

-- 
1.7.7



Thread overview: 17+ messages
2013-04-07 13:12 [PATCH 0/2 RFC] Online data deduplication Liu Bo
2013-04-07 13:12 ` [PATCH 1/2] Btrfs: online " Liu Bo
2013-04-08 12:54   ` Josef Bacik
2013-04-08 14:16     ` Liu Bo
2013-04-08 20:37       ` Josef Bacik
2013-04-09  1:34         ` Liu Bo
2013-04-09  1:48           ` Josef Bacik
2013-04-10 14:21             ` Liu Bo
2013-04-09  1:40       ` Miao Xie
2013-04-08 13:47   ` David Sterba
2013-04-08 14:08     ` Liu Bo
2013-04-10 15:42       ` David Sterba
2013-04-09  1:52     ` Miao Xie
2013-04-10 15:52       ` David Sterba
2013-04-10 12:05   ` Marek Otahal
2013-04-10 14:14     ` Liu Bo
2013-04-07 13:12 ` [PATCH 2/2] Btrfs: skip merge part for delayed data refs Liu Bo
