Linux Btrfs filesystem development
From: Josef Bacik <jbacik@fusionio.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC] Online dedup for Btrfs
Date: Mon, 1 Apr 2013 11:38:59 -0400
Message-ID: <20130401153859.GJ1876@localhost.localdomain>
In-Reply-To: <20130401125034.GG1876@localhost.localdomain>

On Mon, Apr 01, 2013 at 08:50:34AM -0400, Josef Bacik wrote:
> Hello,
> 
> I was bored this weekend, so I hacked up online dedup for Btrfs.  It's working
> quite well, so I think it can be more widely tested.  There are two ways to use
> it:
> 
> 1) Compatible mode - this is a bit slower but keeps the fs usable by older
> kernels.  We use the csum tree to find duplicate blocks.  Since crc32c
> collisions are relatively easy to hit, this also involves reading the block from
> disk and doing a memcmp() against the block we want to write to verify that it
> has the same data.  This is way slow, but hey, no incompat flag!
> 
> 2) Incompatible mode - this is the way you probably want to use it if you
> don't care about being able to go back to older kernels.  You select your
> hashing function (at the moment I only support sha1, but there is room in the
> format for different functions).  This creates a btree indexed by the hash
> and the bytenr.  Then we look up the hash and just link the extent in if it
> matches.  You can use -o paranoid-dedup if you are paranoid about hash
> collisions, and this will force it to do the memcmp() dance to make sure that
> the extent we are deduping really matches the existing extent.
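For anyone who wants to see the lookup-then-verify idea from the two modes above in code, here is a toy Python sketch.  It is not from the dedup branch: DedupStore, write() and the in-memory index are made-up names, a dict mapping digest to a list of bytenrs stands in for the btree keyed by hash and bytenr, and zlib.crc32 (plain IEEE CRC-32) stands in for crc32c.

```python
import hashlib
import zlib


class DedupStore:
    """Toy in-memory block store showing hash-indexed dedup.

    Hypothetical illustration only, not Btrfs code.
    """

    def __init__(self, strong_hash=True, paranoid=False):
        self.strong_hash = strong_hash  # sha1 ("incompat") vs crc32 ("compat")
        self.paranoid = paranoid        # force a byte comparison on a hash match
        self.blocks = []                # the "disk": list index is the bytenr
        self.index = {}                 # digest -> bytenrs sharing that digest

    def _digest(self, data):
        if self.strong_hash:
            return hashlib.sha1(data).digest()
        # zlib.crc32 (IEEE CRC-32) stands in for crc32c in this sketch.
        return zlib.crc32(data)

    def write(self, data):
        """Return the bytenr of an identical stored block, or store data."""
        digest = self._digest(data)
        # A 32-bit checksum collides easily, so compat mode always compares
        # bytes; the strong hash is trusted unless paranoid mode is on.
        verify = self.paranoid or not self.strong_hash
        for bytenr in self.index.get(digest, []):
            if not verify or self.blocks[bytenr] == data:
                return bytenr  # dedup: link to the existing block
        self.blocks.append(data)
        bytenr = len(self.blocks) - 1
        self.index.setdefault(digest, []).append(bytenr)
        return bytenr


store = DedupStore(strong_hash=False)  # compat-style: checksum + memcmp
first = store.write(b"\x00" * 4096)
second = store.write(b"\x00" * 4096)
other = store.write(b"\xff" * 4096)
print(first, second, other, len(store.blocks))  # prints: 0 0 1 2
```

With strong_hash=True and paranoid=True you get the -o paranoid-dedup behaviour described above: the sha1 lookup only finds candidates, and the byte comparison decides whether the extent is actually linked.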
> 
> Performance-wise, the compat mode obviously sucks.  It's about 50% slower on
> disk and about 20% slower on my Fusion card.  We get pretty good space savings,
> about 10% in my horrible test (just copying a git tree onto the fs), but IMHO
> it's not worth the performance hit.
> 
> The incompat mode is a bit better: only a 15% drop on disk and about 10% on my
> Fusion card, though with -o paranoid-dedup it gets closer to the compat-mode
> numbers.  The space savings are also better since it uses the original extent
> sizes; we get about 15%.  Please feel free to pull and try it, you can get it
> here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
> 
> Thanks!
> 

It's been pointed out to me that this is probably too serious, so just FYI it's
April 1st where I am.  Thanks,

Josef

Thread overview: 6+ messages
2013-04-01 12:50 [RFC] Online dedup for Btrfs Josef Bacik
2013-04-01 14:44 ` Harald Glatt
2013-04-18 15:07   ` Martin
2013-04-01 15:38 ` Josef Bacik [this message]
2013-04-01 15:50   ` Harald Glatt
2013-04-01 16:16   ` Konstantinos Skarlatos