linux-btrfs.vger.kernel.org archive mirror
From: "Lakshmipathi.G" <lakshmipathi.g@giis.co.in>
To: linux-btrfs@vger.kernel.org
Subject: dduper - Offline btrfs deduplication tool
Date: Fri, 24 Aug 2018 10:01:39 +0530
Message-ID: <20180824043139.GA8263@giis.co.in>

Hi -

dduper is an offline dedupe tool. Instead of reading whole file blocks and
computing checksums, it fetches the checksums from the btrfs csum tree. This
greatly improves performance.

dduper works like this:
	- Read the csums for the two given files.
	- Find the matching locations.
	- Pass the locations to ioctl_ficlonerange directly
	  instead of ioctl_fideduperange.
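
The steps above could be sketched roughly as follows. This is only an
illustration, not dduper's actual code: the names `matching_runs`,
`pack_clone_range` and `clone_range` are hypothetical, the csum lists are
assumed to hold one checksum per 4096-byte block, and the matching is a
naive first-hit lookup. The ioctl number and struct layout are those of
FICLONERANGE in <linux/fs.h>.

```python
import fcntl
import struct

# FICLONERANGE = _IOW(0x94, 13, struct file_clone_range), from <linux/fs.h>.
FICLONERANGE = 0x4020940D
BLOCK_SIZE = 4096  # assumption: one csum covers one 4096-byte block


def matching_runs(src_csums, dst_csums):
    """Yield (src_block, dst_block, n_blocks) for runs of equal checksums."""
    # Remember the first source block that carries each checksum.
    first_seen = {}
    for i, csum in enumerate(src_csums):
        first_seen.setdefault(csum, i)

    j = 0
    while j < len(dst_csums):
        i = first_seen.get(dst_csums[j])
        if i is None:
            j += 1
            continue
        # Extend the run while consecutive blocks keep matching.
        n = 0
        while (i + n < len(src_csums) and j + n < len(dst_csums)
               and src_csums[i + n] == dst_csums[j + n]):
            n += 1
        yield (i, j, n)
        j += n


def pack_clone_range(src_fd, src_block, dst_block, n_blocks):
    """Pack struct file_clone_range: s64 src_fd, u64 src_offset,
    u64 src_length, u64 dest_offset."""
    return struct.pack("qQQQ", src_fd, src_block * BLOCK_SIZE,
                       n_blocks * BLOCK_SIZE, dst_block * BLOCK_SIZE)


def clone_range(src_fd, dst_fd, src_block, dst_block, n_blocks):
    """Make the destination range share the source extents (needs a
    filesystem with reflink support, such as btrfs)."""
    arg = pack_clone_range(src_fd, src_block, dst_block, n_blocks)
    fcntl.ioctl(dst_fd, FICLONERANGE, arg)
```

Unlike FIDEDUPERANGE, FICLONERANGE does not compare the two ranges before
sharing them, so a sketch like this trusts the checksum match entirely.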

By default, dduper adds a safety check to the above steps: it creates a
backup reflink file and compares the md5sums after the dedupe.
If the backup file matches the new deduped file, the backup file is
removed. You can skip this check by passing the --skip option. Here is
sample CLI usage [1] and a quick demo [2].
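
A minimal sketch of the comparison half of that safety check, assuming the
backup reflink copy already exists (for example made beforehand with
`cp --reflink=always`). The names `md5sum` and `finish_safety_check` are
hypothetical and not taken from dduper itself.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def finish_safety_check(backup_path, deduped_path):
    """Remove the backup if it matches the deduped file.

    Returns True when the contents match (backup removed) and False
    otherwise, in which case the backup is kept for recovery.
    """
    if md5sum(backup_path) == md5sum(deduped_path):
        os.remove(backup_path)
        return True
    return False
```

Keeping the backup on a mismatch means the original contents stay
recoverable if a dedupe step went wrong.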

Some performance numbers (with the --skip option):

Dedupe two 1GB files with the same content - 1.2 seconds
Dedupe two 5GB files with the same content - 8.2 seconds
Dedupe two 10GB files with the same content - 13.8 seconds
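
For a rough sense of scale, the timings above can be turned into effective
throughput. These are figures derived from the three numbers above, not
extra measurements, and they assume the sizes are taken at face value in GB.

```python
# (file size in GB, dedupe time in seconds), as reported above
timings = [(1, 1.2), (5, 8.2), (10, 13.8)]


def throughput(size_gb, seconds):
    """Effective dedupe rate in GB per second."""
    return size_gb / seconds


for size_gb, seconds in timings:
    rate = throughput(size_gb, seconds)
    print(f"{size_gb} GB in {seconds} s -> {rate:.2f} GB/s")
```

That works out to roughly 0.83, 0.61 and 0.72 GB/s respectively.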

dduper requires the `btrfs inspect-internal dump-csum` command. You can use
this branch [3] or apply the patch yourself [4].

[1] https://gitlab.collabora.com/laks/btrfs-progs/blob/dump_csum/Documentation/dduper_usage.md
[2] http://giis.co.in/btrfs_dedupe.gif
[3] git clone https://gitlab.collabora.com/laks/btrfs-progs.git -b dump_csum
[4] https://patchwork.kernel.org/patch/10540229/ 

Please remember this is version 0.1, so test it out first if you plan to use dduper on real data.
Let me know if you have suggestions, feedback or bug reports :)

Cheers.
Lakshmipathi.G

Thread overview: 6+ messages
2018-08-24  4:31 Lakshmipathi.G [this message]
2018-09-05 16:00 ` dduper - Offline btrfs deduplication tool Timofey Titovets
2018-09-07  3:57   ` Lakshmipathi.G
2018-09-07 14:31     ` Adam Borowski
2018-10-02 16:05       ` Lakshmipathi.G
2018-09-07 23:32     ` Zygo Blaxell
