From: Konstantinos Skarlatos <k.skarlatos@gmail.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [RFC] Online dedup for Btrfs
Date: Mon, 01 Apr 2013 19:16:48 +0300
Message-ID: <5159B2F0.9060704@gmail.com>
In-Reply-To: <20130401153859.GJ1876@localhost.localdomain>
On 1/4/2013 6:38 PM, Josef Bacik wrote:
> On Mon, Apr 01, 2013 at 08:50:34AM -0400, Josef Bacik wrote:
>> Hello,
>>
>> I was bored this weekend, so I hacked up online dedup for Btrfs. It's working
>> quite well, so I think it can be more widely tested. There are two ways to use
>> it:
>>
>> 1) Compatible mode - this is a bit slower but will handle being used by older
>> kernels. We use the csum tree to find duplicate blocks. Since it is relatively
>> easy to have crc32c collisions this also involves reading the block from disk
>> and doing a memcmp with the block we want to write to verify it has the same
>> data. This is way slow but hey, no incompat flag!
>>
>> 2) Incompatible mode - so this is the way you probably want to use it if you
>> don't care about being able to go back to older kernels. You select your
>> hashing function (at the moment I only support sha1 but there is room in the
>> format to have different functions). This creates a btree indexed by the hash
>> and the bytenr. Then we look up the hash and just link the extent in if it
>> matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
>> collisions and this will force it to do the memcmp() dance to make sure that the
>> extent we are deduping really matches the extent.
>>
>> So performance-wise the compat mode obviously sucks. It's about 50% slower on
>> disk and about 20% slower on my Fusion card. We get pretty good space savings,
>> about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
>> worth the performance hit.
>>
>> The incompat mode is a bit better: only a 15% drop on disk and about 10% on my
>> Fusion card, closer to the crc numbers if we have -o paranoid-dedup. The space
>> savings are better since it uses the original extent sizes; we get about 15%
>> space savings. Please feel free to pull and try it; you can get it here:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
>>
>> Thanks!
>>
> It's been pointed out to me that this is probably too serious, so just FYI it's
> April 1st where I am. Thanks,
Well, I believed it too and was writing an email with questions, etc. I
almost sent it, but then I saw git downloading hundreds and hundreds
of MB of data :)
Well done anyway!
>
> Josef
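Josef's follow-up makes clear that the dedup branch was an April Fools' joke, so no real code exists behind the two modes described above. The two lookup strategies are still easy to illustrate. What follows is a minimal, hypothetical Python sketch and is not Btrfs code:

- A dict stands in for the csum tree (compat mode) or for the hash-indexed btree (incompat mode).
- A list stands in for blocks on disk.
- `zlib.crc32` substitutes for crc32c, which the Python stdlib does not provide.
- The names `DedupIndex`, `write` and `BLOCK_SIZE` are invented for illustration.
- The `verify` flag plays the role of the joke `-o paranoid-dedup` option.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for the example


class DedupIndex:
    """Toy in-memory dedup index (hypothetical; not Btrfs code).

    strong_hash=False mimics "compat mode": a weak 32-bit checksum finds
    candidates, so every match must be confirmed by comparing bytes.
    strong_hash=True mimics "incompat mode": a sha1 digest is trusted as-is
    unless verify=True (the "paranoid" variant) asks for the byte compare.
    """

    def __init__(self, strong_hash: bool, verify: bool = False):
        self.strong_hash = strong_hash
        # A weak checksum always needs verification; a strong hash only on request.
        self.verify = verify or not strong_hash
        self.index = {}  # digest -> list of block numbers with that digest
        self.disk = []   # stand-in for data blocks; position = block number

    def _digest(self, data: bytes) -> bytes:
        if self.strong_hash:
            return hashlib.sha1(data).digest()
        # zlib.crc32 stands in for crc32c, which the stdlib does not provide.
        return zlib.crc32(data).to_bytes(4, "little")

    def write(self, data: bytes) -> int:
        """Store data, or return the block number of an identical existing block."""
        key = self._digest(data)
        for bytenr in self.index.get(key, []):
            # The "memcmp() dance": a byte compare rules out hash collisions.
            if not self.verify or self.disk[bytenr] == data:
                return bytenr  # duplicate found: reuse ("link in") the old block
        bytenr = len(self.disk)
        self.disk.append(data)
        self.index.setdefault(key, []).append(bytenr)
        return bytenr


if __name__ == "__main__":
    for strong in (False, True):
        idx = DedupIndex(strong_hash=strong)
        blocks = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"a" * BLOCK_SIZE]
        # prints "[0, 1, 0] 2" in both modes: the third write reuses block 0
        print([idx.write(blk) for blk in blocks], len(idx.disk))
```

The design mirrors the message. With a weak 32-bit checksum, collisions are easy to hit, so the byte-for-byte compare is forced on. That compare costs a read of the existing block for every candidate duplicate. With sha1, accidental collisions are far less likely, so the compare becomes optional.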