* [RFC] Online dedup for Btrfs
@ 2013-04-01 12:50 Josef Bacik
2013-04-01 14:44 ` Harald Glatt
2013-04-01 15:38 ` Josef Bacik
0 siblings, 2 replies; 6+ messages in thread
From: Josef Bacik @ 2013-04-01 12:50 UTC (permalink / raw)
To: linux-btrfs
Hello,
I was bored this weekend so I hacked up online dedup for Btrfs. It's working
quite well so I think it can be more widely tested. There are two ways to use
it
1) Compatible mode - this is a bit slower but will handle being used by older
kernels. We use the csum tree to find duplicate blocks. Since it is relatively
easy to have crc32c collisions this also involves reading the block from disk
and doing a memcmp with the block we want to write to verify it has the same
data. This is way slow but hey, no incompat flag!
2) Incompatible mode - so this is the way you probably want to use it if you
don't care about being able to go back to older kernels. You select your
hashing function (at the momement I only support sha1 but there is room in the
format to have different functions). This creates a btree indexed by the hash
and the bytenr. Then we lookup the hash and just link the extent in if it
matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
collisions and this will force it to do the memcmp() dance to make sure that the
extent we are deduping really matches the extent.
So performance wise obviously the compat mode sucks. It's about 50% slower on
disk and about 20% slower on my Fusion card. We get pretty good space savings,
about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
worth the performance hit.
The incompat mode is a bit better, only 15% drop on disk and about 10% on my
fusion card. Closer to the crc numbers if we have -o paranoid-dedup. The space
savings is better since it uses the original extent sizes, we get about 15%
space savings. Please feel free to pull and try it, you can get it here
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
Thanks!
Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] Online dedup for Btrfs
2013-04-01 12:50 [RFC] Online dedup for Btrfs Josef Bacik
@ 2013-04-01 14:44 ` Harald Glatt
2013-04-18 15:07 ` Martin
2013-04-01 15:38 ` Josef Bacik
1 sibling, 1 reply; 6+ messages in thread
From: Harald Glatt @ 2013-04-01 14:44 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On Mon, Apr 1, 2013 at 2:50 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> Hello,
>
> I was bored this weekend so I hacked up online dedup for Btrfs. It's working
> quite well so I think it can be more widely tested. There are two ways to use
> it
>
> 1) Compatible mode - this is a bit slower but will handle being used by older
> kernels. We use the csum tree to find duplicate blocks. Since it is relatively
> easy to have crc32c collisions this also involves reading the block from disk
> and doing a memcmp with the block we want to write to verify it has the same
> data. This is way slow but hey, no incompat flag!
>
> 2) Incompatible mode - so this is the way you probably want to use it if you
> don't care about being able to go back to older kernels. You select your
> hashing function (at the momement I only support sha1 but there is room in the
> format to have different functions). This creates a btree indexed by the hash
> and the bytenr. Then we lookup the hash and just link the extent in if it
> matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
> collisions and this will force it to do the memcmp() dance to make sure that the
> extent we are deduping really matches the extent.
>
> So performance wise obviously the compat mode sucks. It's about 50% slower on
> disk and about 20% slower on my Fusion card. We get pretty good space savings,
> about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
> worth the performance hit.
>
> The incompat mode is a bit better, only 15% drop on disk and about 10% on my
> fusion card. Closer to the crc numbers if we have -o paranoid-dedup. The space
> savings is better since it uses the original extent sizes, we get about 15%
> space savings. Please feel free to pull and try it, you can get it here
>
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
>
> Thanks!
>
> Josef
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hey Josef,
that's really cool! Can this be used together with lzo compression for
example? How high (roughly) is the impact of something like
force-compress=lzo compared to the 15% hit from this dedup?
Thanks!
Harald
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] Online dedup for Btrfs
2013-04-01 12:50 [RFC] Online dedup for Btrfs Josef Bacik
2013-04-01 14:44 ` Harald Glatt
@ 2013-04-01 15:38 ` Josef Bacik
2013-04-01 15:50 ` Harald Glatt
2013-04-01 16:16 ` Konstantinos Skarlatos
1 sibling, 2 replies; 6+ messages in thread
From: Josef Bacik @ 2013-04-01 15:38 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On Mon, Apr 01, 2013 at 08:50:34AM -0400, Josef Bacik wrote:
> Hello,
>
> I was bored this weekend so I hacked up online dedup for Btrfs. It's working
> quite well so I think it can be more widely tested. There are two ways to use
> it
>
> 1) Compatible mode - this is a bit slower but will handle being used by older
> kernels. We use the csum tree to find duplicate blocks. Since it is relatively
> easy to have crc32c collisions this also involves reading the block from disk
> and doing a memcmp with the block we want to write to verify it has the same
> data. This is way slow but hey, no incompat flag!
>
> 2) Incompatible mode - so this is the way you probably want to use it if you
> don't care about being able to go back to older kernels. You select your
> hashing function (at the momement I only support sha1 but there is room in the
> format to have different functions). This creates a btree indexed by the hash
> and the bytenr. Then we lookup the hash and just link the extent in if it
> matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
> collisions and this will force it to do the memcmp() dance to make sure that the
> extent we are deduping really matches the extent.
>
> So performance wise obviously the compat mode sucks. It's about 50% slower on
> disk and about 20% slower on my Fusion card. We get pretty good space savings,
> about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
> worth the performance hit.
>
> The incompat mode is a bit better, only 15% drop on disk and about 10% on my
> fusion card. Closer to the crc numbers if we have -o paranoid-dedup. The space
> savings is better since it uses the original extent sizes, we get about 15%
> space savings. Please feel free to pull and try it, you can get it here
>
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
>
> Thanks!
>
It's been pointed out to me that this is probably too serious, so just FYI it's
April 1st where I am. Thanks,
Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] Online dedup for Btrfs
2013-04-01 15:38 ` Josef Bacik
@ 2013-04-01 15:50 ` Harald Glatt
2013-04-01 16:16 ` Konstantinos Skarlatos
1 sibling, 0 replies; 6+ messages in thread
From: Harald Glatt @ 2013-04-01 15:50 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
Oh man :D It was so elaborate that I really believed it.... :P
On Mon, Apr 1, 2013 at 5:38 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> On Mon, Apr 01, 2013 at 08:50:34AM -0400, Josef Bacik wrote:
>> Hello,
>>
>> I was bored this weekend so I hacked up online dedup for Btrfs. It's working
>> quite well so I think it can be more widely tested. There are two ways to use
>> it
>>
>> 1) Compatible mode - this is a bit slower but will handle being used by older
>> kernels. We use the csum tree to find duplicate blocks. Since it is relatively
>> easy to have crc32c collisions this also involves reading the block from disk
>> and doing a memcmp with the block we want to write to verify it has the same
>> data. This is way slow but hey, no incompat flag!
>>
>> 2) Incompatible mode - so this is the way you probably want to use it if you
>> don't care about being able to go back to older kernels. You select your
>> hashing function (at the momement I only support sha1 but there is room in the
>> format to have different functions). This creates a btree indexed by the hash
>> and the bytenr. Then we lookup the hash and just link the extent in if it
>> matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
>> collisions and this will force it to do the memcmp() dance to make sure that the
>> extent we are deduping really matches the extent.
>>
>> So performance wise obviously the compat mode sucks. It's about 50% slower on
>> disk and about 20% slower on my Fusion card. We get pretty good space savings,
>> about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
>> worth the performance hit.
>>
>> The incompat mode is a bit better, only 15% drop on disk and about 10% on my
>> fusion card. Closer to the crc numbers if we have -o paranoid-dedup. The space
>> savings is better since it uses the original extent sizes, we get about 15%
>> space savings. Please feel free to pull and try it, you can get it here
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
>>
>> Thanks!
>>
>
> It's been pointed out to me that this is probably too serious, so just FYI it's
> April 1st where I am. Thanks,
>
> Josef
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] Online dedup for Btrfs
2013-04-01 15:38 ` Josef Bacik
2013-04-01 15:50 ` Harald Glatt
@ 2013-04-01 16:16 ` Konstantinos Skarlatos
1 sibling, 0 replies; 6+ messages in thread
From: Konstantinos Skarlatos @ 2013-04-01 16:16 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On 1/4/2013 6:38 μμ, Josef Bacik wrote:
> On Mon, Apr 01, 2013 at 08:50:34AM -0400, Josef Bacik wrote:
>> Hello,
>>
>> I was bored this weekend so I hacked up online dedup for Btrfs. It's working
>> quite well so I think it can be more widely tested. There are two ways to use
>> it
>>
>> 1) Compatible mode - this is a bit slower but will handle being used by older
>> kernels. We use the csum tree to find duplicate blocks. Since it is relatively
>> easy to have crc32c collisions this also involves reading the block from disk
>> and doing a memcmp with the block we want to write to verify it has the same
>> data. This is way slow but hey, no incompat flag!
>>
>> 2) Incompatible mode - so this is the way you probably want to use it if you
>> don't care about being able to go back to older kernels. You select your
>> hashing function (at the momement I only support sha1 but there is room in the
>> format to have different functions). This creates a btree indexed by the hash
>> and the bytenr. Then we lookup the hash and just link the extent in if it
>> matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
>> collisions and this will force it to do the memcmp() dance to make sure that the
>> extent we are deduping really matches the extent.
>>
>> So performance wise obviously the compat mode sucks. It's about 50% slower on
>> disk and about 20% slower on my Fusion card. We get pretty good space savings,
>> about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
>> worth the performance hit.
>>
>> The incompat mode is a bit better, only 15% drop on disk and about 10% on my
>> fusion card. Closer to the crc numbers if we have -o paranoid-dedup. The space
>> savings is better since it uses the original extent sizes, we get about 15%
>> space savings. Please feel free to pull and try it, you can get it here
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
>>
>> Thanks!
>>
> It's been pointed out to me that this is probably too serious, so just FYI it's
> April 1st where I am. Thanks,
Well I believed it too, and was writing an email with questions etc. I
almost sent it, but then I saw git was downloading hundreds and hundreds
of MB of data :)
Well done anyway!
>
> Josef
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] Online dedup for Btrfs
2013-04-01 14:44 ` Harald Glatt
@ 2013-04-18 15:07 ` Martin
0 siblings, 0 replies; 6+ messages in thread
From: Martin @ 2013-04-18 15:07 UTC (permalink / raw)
To: linux-btrfs
Apart from the dates, this sounds highly plausible :-)
If the hashing is done before the compression and the compression is
done for isolated blocks, then this could even work!
Any takers? ;-)
For a performance enhancement, keep a hash tree in memory for the "n"
most recently used/seen blocks?...
A good writeup! Thanks for a good giggle. :-)
Regards,
Martin
On 01/04/13 15:44, Harald Glatt wrote:
> On Mon, Apr 1, 2013 at 2:50 PM, Josef Bacik <jbacik@fusionio.com> wrote:
>> Hello,
>>
>> I was bored this weekend so I hacked up online dedup for Btrfs. It's working
>> quite well so I think it can be more widely tested. There are two ways to use
>> it
>>
>> 1) Compatible mode - this is a bit slower but will handle being used by older
>> kernels. We use the csum tree to find duplicate blocks. Since it is relatively
>> easy to have crc32c collisions this also involves reading the block from disk
>> and doing a memcmp with the block we want to write to verify it has the same
>> data. This is way slow but hey, no incompat flag!
>>
>> 2) Incompatible mode - so this is the way you probably want to use it if you
>> don't care about being able to go back to older kernels. You select your
>> hashing function (at the momement I only support sha1 but there is room in the
>> format to have different functions). This creates a btree indexed by the hash
>> and the bytenr. Then we lookup the hash and just link the extent in if it
>> matches the hash. You can use -o paranoid-dedup if you are paranoid about hash
>> collisions and this will force it to do the memcmp() dance to make sure that the
>> extent we are deduping really matches the extent.
>>
>> So performance wise obviously the compat mode sucks. It's about 50% slower on
>> disk and about 20% slower on my Fusion card. We get pretty good space savings,
>> about 10% in my horrible test (just copy a git tree onto the fs), but IMHO not
>> worth the performance hit.
>>
>> The incompat mode is a bit better, only 15% drop on disk and about 10% on my
>> fusion card. Closer to the crc numbers if we have -o paranoid-dedup. The space
>> savings is better since it uses the original extent sizes, we get about 15%
>> space savings. Please feel free to pull and try it, you can get it here
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git dedup
>>
>> Thanks!
>>
>> Josef
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> Hey Josef,
>
> that's really cool! Can this be used together with lzo compression for
> example? How high (roughly) is the impact of something like
> force-compress=lzo compared to the 15% hit from this dedup?
>
> Thanks!
> Harald
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-04-18 15:07 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-01 12:50 [RFC] Online dedup for Btrfs Josef Bacik
2013-04-01 14:44 ` Harald Glatt
2013-04-18 15:07 ` Martin
2013-04-01 15:38 ` Josef Bacik
2013-04-01 15:50 ` Harald Glatt
2013-04-01 16:16 ` Konstantinos Skarlatos
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.