From: Miao Xie <miaox@cn.fujitsu.com>
To: dsterba@suse.cz, Liu Bo <bo.li.liu@oracle.com>,
linux-btrfs@vger.kernel.org, martin.krizek@gmail.com
Subject: Re: [PATCH 1/2] Btrfs: online data deduplication
Date: Tue, 09 Apr 2013 09:52:42 +0800
Message-ID: <5163746A.50902@cn.fujitsu.com>
In-Reply-To: <20130408134726.GD18193@twin.jikos.cz>
On Mon, 8 Apr 2013 15:47:27 +0200, David Sterba wrote:
> On Sun, Apr 07, 2013 at 09:12:48PM +0800, Liu Bo wrote:
>> (2) WHAT is deduplication?
>> Two key ways for practical deduplication implementations,
>> * When the data is deduplicated
>> (inband vs background)
>> * The granularity of the deduplication.
>> (block level vs file level)
>>
>> For btrfs, we choose
>> * inband(synchronous)
>> * block level
>
> Block level may be too fine grained leading to excessive fragmentation
> and increased metadata usage given that there's a much higher chance to
> find duplicate (4k) blocks here and there.
>
> There's always a tradeoff; the practical values that are considered for
> granularity range from 8k to 64k, see e.g. this paper for graphs and analyses
>
> http://static.usenix.org/event/fast11/tech/full_papers/Meyer.pdf .
>
> This also depends on file data type and access patterns, fixing the dedup
> basic chunk size to one block does not IMHO fit most usecases.
Maybe we can make btrfs (including dedup) support bigalloc, just like ext4.
Thanks
Miao
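To put rough numbers on the granularity tradeoff quoted above, here is a
minimal sketch. It assumes the worst case for metadata, where no chunk is
a duplicate and every chunk therefore needs its own index entry; the
function name and the 1 TiB figure are illustrative only, not taken from
the patch.

```python
def index_entries(data_bytes: int, chunk_bytes: int) -> int:
    """Index entries needed if every chunk of the data gets its own entry."""
    # Ceiling division: a trailing partial chunk still needs one entry.
    return -(-data_bytes // chunk_bytes)


TIB = 1 << 40

for chunk_kib in (4, 8, 16, 32, 64):
    entries = index_entries(TIB, chunk_kib * 1024)
    print(f"{chunk_kib:>2} KiB chunks -> {entries:,} entries per TiB")
```

Going from 4 KiB to 64 KiB chunks cuts the entry count by a factor of 16,
at the price of finding fewer duplicates; the sketch does not model that
second effect.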
>
>> (3) HOW does deduplication work?
> ...
>> Here we have
>> a) a new dedicated tree(DEDUP tree) and
>> b) a new key(BTRFS_DEDUP_ITEM_KEY), which consists of
>> (stop 64bits of hash, type, disk offset),
>> * stop 64bits of hash
>> It comes from sha256, which is very helpful on avoiding collision.
>> And we take the stop 64bits as the index.
>
> Is it safe to use just 64 bits? I'd like to see better reasoning why
> this is ok. The limitation of btrfs_key to store only 1-2 64bit items is
> clear and must be handled, but it's IMO a critical design point.
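One way to frame the 64-bit question is the usual birthday-bound estimate.
The sketch below assumes the 64-bit keys behave like uniformly random
values over distinct blocks (so it says nothing about deliberately
crafted collisions) and that blocks are 4k; the function name is made up
for illustration.

```python
import math


def collision_probability(num_blocks: int, key_bits: int = 64) -> float:
    """Approximate chance that at least two distinct blocks share a key.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / 2 / 2**key_bits).
    """
    pairs = num_blocks * (num_blocks - 1) / 2
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-pairs / 2.0 ** key_bits)


BLOCK = 4096

for label, size in (("1 TiB", 1 << 40), ("1 PiB", 1 << 50)):
    n = size // BLOCK
    p = collision_probability(n)
    print(f"{label}: {n} unique blocks, P(collision) ~ {p:.3g}")
```

Under these assumptions the estimate is roughly 0.2% for 1 TiB of unique
4k blocks and effectively 1 for 1 PiB, which is why a 64-bit index alone
needs some additional check.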
>
>> * disk offset
>> It helps to find where the data is stored.
>
> Does the disk offset also help to resolve block hash collisions?
>
>> So the whole deduplication process works as,
>> 1) write something,
>> 2) calculate the hash of this "something",
>> 3) try to find the match of hash value by searching DEDUP keys in
>> a dedicated tree, DEDUP tree.
>> 4) if found, skip real IO and link to the existing copy
>> if not, do real IO and insert a DEDUP key to the DEDUP tree.
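The four steps quoted above can be sketched as a toy in-memory model. This
is not the patch's code: a Python list stands in for the disk, a dict
stands in for the DEDUP tree, and the key is reduced to 64 bits of the
sha256 digest mapped to an offset (the type field is left out). Which 64
bits are used is an assumption here; the sketch takes the first 8 bytes.
All names are hypothetical.

```python
import hashlib
import struct


class DedupStore:
    """Toy model: a list plays the disk, a dict plays the DEDUP tree."""

    def __init__(self):
        self.disk = []         # disk[offset] -> block contents
        self.dedup_tree = {}   # 64-bit key -> offset of the stored copy
        self.real_writes = 0   # how many blocks actually hit the "disk"

    @staticmethod
    def key64(block: bytes) -> int:
        digest = hashlib.sha256(block).digest()
        # Assumption: use the first 8 bytes of the digest as the index.
        return struct.unpack(">Q", digest[:8])[0]

    def write(self, block: bytes) -> int:
        key = self.key64(block)              # 2) hash the data
        offset = self.dedup_tree.get(key)    # 3) search the DEDUP tree
        if offset is not None:
            return offset                    # 4) found: skip IO, link to copy
        self.disk.append(block)              # 4) not found: do the real IO
        self.real_writes += 1
        offset = len(self.disk) - 1
        self.dedup_tree[key] = offset        #    and insert a DEDUP key
        return offset


store = DedupStore()
first = store.write(b"A" * 4096)
other = store.write(b"B" * 4096)
again = store.write(b"A" * 4096)   # same contents as the first write
print(first, other, again, store.real_writes)
```

Like the description it follows, this model trusts a key match completely
and never compares block contents.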
>
> ... how are the hash collisions handled? Using part of a secure hash
> cannot be considered equally strong (given that there are no other
> safety checks like comparing the whole blocks).
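A minimal sketch of the kind of safety check mentioned above, comparing
whole blocks before trusting a key match. It assumes the index maps a
short key to a list of candidate offsets and that a stored block can be
read back through a callback; none of these names come from the patch.

```python
import hashlib


def short_key(block: bytes) -> bytes:
    # Assumption: 8 bytes of the sha256 digest stand in for the 64-bit index.
    return hashlib.sha256(block).digest()[:8]


def find_duplicate(block, index, read_block, key_fn=short_key):
    """Return the offset of a stored block identical to `block`, else None.

    index:      dict mapping a short key to a list of candidate offsets
    read_block: callable that returns the stored bytes at an offset
    """
    for offset in index.get(key_fn(block), []):
        # A matching key is only a hint; the full contents must match too.
        if read_block(offset) == block:
            return offset
    return None


disk = {0: b"x" * 4096}
index = {short_key(disk[0]): [0]}
print(find_duplicate(b"x" * 4096, index, disk.__getitem__))
print(find_duplicate(b"y" * 4096, index, disk.__getitem__))
```

The cost of this design is an extra read of the existing copy on every
key match, in exchange for never linking to a block with different
contents.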
>
> Last but not least, there was another dedup proposal (author CCed)
>
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/21722
>
>
> david