Linux Btrfs filesystem development
From: Miao Xie <miaox@cn.fujitsu.com>
To: dsterba@suse.cz, Liu Bo <bo.li.liu@oracle.com>,
	linux-btrfs@vger.kernel.org, martin.krizek@gmail.com
Subject: Re: [PATCH 1/2] Btrfs: online data deduplication
Date: Tue, 09 Apr 2013 09:52:42 +0800
Message-ID: <5163746A.50902@cn.fujitsu.com>
In-Reply-To: <20130408134726.GD18193@twin.jikos.cz>

On Mon, 8 Apr 2013 15:47:27 +0200, David Sterba wrote:
> On Sun, Apr 07, 2013 at 09:12:48PM +0800, Liu Bo wrote:
>> (2) WHAT is deduplication?
>>     Practical deduplication implementations differ in two key ways:
>>     *  When the data is deduplicated
>>        (inband vs background)
>>     *  The granularity of the deduplication.
>>        (block level vs file level)
>>
>>     For btrfs, we choose
>>     *  inband(synchronous)
>>     *  block level
> 
> Block level may be too fine-grained, leading to excessive fragmentation
> and increased metadata usage given that there's a much higher chance to
> find duplicate (4k) blocks here and there.
> 
> There's always a tradeoff; the practical values that are considered for
> granularity range from 8k to 64k, see e.g. this paper for graphs and analyses
> 
> http://static.usenix.org/event/fast11/tech/full_papers/Meyer.pdf .
> 
> This also depends on file data type and access patterns; fixing the basic
> dedup chunk size to one block does not IMHO fit most use cases.
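
To put rough numbers on the metadata side of that tradeoff, here is a small
sketch in Python. It only counts how many dedup index entries a given amount
of data needs at each chunk size, one entry per chunk. The 1 TiB figure and
the function name are illustrative assumptions, not anything from the patch.

```python
# Illustrative only: number of dedup index entries needed for a given
# amount of data at different chunk sizes (one entry per chunk).
# The 1 TiB data size is an assumption chosen for easy arithmetic.

def dedup_entries(data_bytes, chunk_bytes):
    """Return the number of chunks, rounding up for a partial last chunk."""
    return -(-data_bytes // chunk_bytes)

TIB = 1 << 40

if __name__ == "__main__":
    for chunk_kib in (4, 8, 16, 32, 64):
        n = dedup_entries(TIB, chunk_kib * 1024)
        print(f"{chunk_kib:>2}k chunks: {n:>11,} entries")
```

Going from 4k to 64k chunks cuts the entry count by a factor of 16, at the
cost of finding fewer duplicates.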

Maybe we can make btrfs (including dedup) support bigalloc, just like ext4.

Thanks
Miao

> 
>> (3) HOW does deduplication work?
> ...
>>     Here we have
>>     a)  a new dedicated tree(DEDUP tree) and
>>     b)  a new key(BTRFS_DEDUP_ITEM_KEY), which consists of
>>         (stop 64bits of hash, type, disk offset),
>>         *  stop 64bits of hash
>>            It comes from sha256, which helps a lot in avoiding collisions.
>>            And we take the stop 64bits as the index.
> 
> Is it safe to use just 64 bits? I'd like to see better reasoning why
> this is ok. The limitation of btrfs_key to store only 1-2 64bit items is
> clear and must be handled, but it's IMO a critical design point.
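
One way to frame the 64-bit question is the birthday approximation. The
sketch below (Python) estimates the chance that at least two distinct blocks
share the same truncated hash. It assumes the hash output is uniformly
distributed and that nobody is crafting collisions on purpose; the function
name and the block counts are illustrative assumptions.

```python
import math

def collision_probability(n_blocks, bits=64):
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes uniformly distributed hash values and no adversary.
    """
    exponent = n_blocks * (n_blocks - 1) / 2.0 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    # 2**28 blocks is 1 TiB of unique 4k blocks (illustrative figure).
    for n in (1 << 20, 1 << 28, 1 << 32):
        print(f"n=2^{n.bit_length() - 1}: "
              f"64-bit p={collision_probability(n, 64):.3e}, "
              f"256-bit p={collision_probability(n, 256):.3e}")
```

With the full 256 bits the accidental collision chance is negligible at any
realistic block count; with 64 bits it grows roughly with the square of the
number of unique blocks, which is why the index width matters here.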
> 
>>         *  disk offset
>>            It helps to find where the data is stored.
> 
> Does the disk offset also help to resolve block hash collisions?
> 
>>     So the whole deduplication process works as follows:
>>     1) write something,
>>     2) calculate the hash of this "something",
>>     3) try to find the match of hash value by searching DEDUP keys in
>>        a dedicated tree, DEDUP tree.
>>     4) if found, skip real IO and link to the existing copy
>>        if not, do real IO and insert a DEDUP key to the DEDUP tree.
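
The four steps above can be sketched as a toy in-memory model in Python. This
is not the patch's code: the class, the dict standing in for the DEDUP tree,
and the list standing in for the disk are all hypothetical. Which 64 bits of
the sha256 digest serve as the index is an assumption here (the sketch takes
the last 8 bytes). The optional `verify` flag is also an assumption, added
only to show where a full-block comparison would sit.

```python
import hashlib

class DedupIndex:
    """Toy stand-in for the DEDUP tree (hypothetical, not btrfs code)."""

    def __init__(self, verify=False):
        self.verify = verify  # assumption: optional full-block comparison
        self.index = {}       # 64-bit truncated hash -> disk offset
        self.disk = []        # fake disk; offset is the list position

    def write(self, block):
        """Write one block; return (disk offset, was_deduplicated)."""
        # 2) hash the data; assume the last 8 bytes form the 64-bit index
        digest = hashlib.sha256(block).digest()
        key = int.from_bytes(digest[-8:], "big")
        # 3) search the index for a matching key
        offset = self.index.get(key)
        if offset is not None:
            if not self.verify or self.disk[offset] == block:
                # 4a) match found: skip real IO, link to the existing copy
                return offset, True
        # 4b) no usable match: do the "real IO" and insert the key.
        # On a verified mismatch this toy simply overwrites the entry.
        self.disk.append(block)
        offset = len(self.disk) - 1
        self.index[key] = offset
        return offset, False

if __name__ == "__main__":
    idx = DedupIndex()
    print(idx.write(b"x" * 4096))
    print(idx.write(b"x" * 4096))
    print(idx.write(b"y" * 4096))
```

Without `verify`, a match on the truncated hash alone is trusted, so two
different blocks with the same 64-bit index would be linked to the same copy.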
> 
> ... how are the hash collisions handled? Using part of a secure hash
> cannot be considered equally strong (given that there are no other
> safety checks like comparing the whole blocks).
> 
> Last but not least, there was another dedup proposal (author CCed)
> 
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/21722
> 
> 
> david

