From: Miao Xie <miaox@cn.fujitsu.com>
To: dsterba@suse.cz, Liu Bo <bo.li.liu@oracle.com>,
	linux-btrfs@vger.kernel.org, martin.krizek@gmail.com
Subject: Re: [PATCH 1/2] Btrfs: online data deduplication
Date: Tue, 09 Apr 2013 09:52:42 +0800
Message-ID: <5163746A.50902@cn.fujitsu.com>
In-Reply-To: <20130408134726.GD18193@twin.jikos.cz>

On Mon, 8 Apr 2013 15:47:27 +0200, David Sterba wrote:
> On Sun, Apr 07, 2013 at 09:12:48PM +0800, Liu Bo wrote:
>> (2) WHAT is deduplication?
>>     Two key ways for practical deduplication implementations,
>>     *  When the data is deduplicated
>>        (inband vs background)
>>     *  The granularity of the deduplication.
>>        (block level vs file level)
>>
>>     For btrfs, we choose
>>     *  inband(synchronous)
>>     *  block level
> 
> Block level may be too fine grained leading to excessive fragmentation
> and increased metadata usage given that there's a much higher chance to
> find duplicate (4k) blocks here and there.
> 
> There's always a tradeoff; the practical values considered for
> granularity range from 8k to 64k, see e.g. this paper for graphs and analyses
> 
> http://static.usenix.org/event/fast11/tech/full_papers/Meyer.pdf .
> 
> This also depends on file data type and access patterns, fixing the dedup
> basic chunk size to one block does not IMHO fit most usecases.

Maybe we can make btrfs (including dedup) support bigalloc, just like ext4.

Thanks
Miao

> 
>> (3) HOW does deduplication work?
> ...
>>     Here we have
>>     a)  a new dedicated tree(DEDUP tree) and
>>     b)  a new key(BTRFS_DEDUP_ITEM_KEY), which consists of
>>         (stop 64bits of hash, type, disk offset),
>>         *  stop 64bits of hash
>>            It comes from sha256, which is very helpful in avoiding collisions.
>>            And we take the stop 64bits as the index.
> 
> Is it safe to use just 64 bits? I'd like to see better reasoning why
> this is ok. The limitation of btrfs_key to store only 1-2 64bit items is
> clear and must be handled, but it's IMO a critical design point.
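
To put a rough number on the 64-bit question, here is a back-of-the-envelope
sketch using the birthday-bound approximation. It assumes the 64-bit index
behaves like a uniformly random value and that every block is distinct; it
says nothing about deliberately crafted collisions. The function name and the
example capacities are illustrative only and do not come from the patch.

```python
import math

BLOCK_SIZE = 4096  # 4 KiB dedup granularity, as discussed in this thread


def collision_probability(num_blocks: int, index_bits: int = 64) -> float:
    """Birthday-bound estimate of P(at least two distinct blocks share an index).

    Uses p ~= 1 - exp(-n(n-1) / 2^(bits+1)), assuming uniform index values.
    """
    pairs = num_blocks * (num_blocks - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-pairs / 2.0 ** index_bits)


if __name__ == "__main__":
    for tib in (1, 16, 256):
        n = tib * 2**40 // BLOCK_SIZE
        p = collision_probability(n)
        print(f"{tib:>4} TiB of unique 4 KiB blocks: p ~= {p:.3g}")
```

Under these assumptions an index collision somewhere in the tree becomes
likely once the filesystem holds tens of TiB of unique 4 KiB blocks, so the
index alone cannot serve as proof that two blocks are identical.
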
> 
>>         *  disk offset
>>            It helps to find where the data is stored.
> 
> Does the disk offset also help in resolving block hash collisions?
> 
>>     So the whole deduplication process works as,
>>     1) write something,
>>     2) calculate the hash of this "something",
>>     3) try to find the match of hash value by searching DEDUP keys in
>>        a dedicated tree, DEDUP tree.
>>     4) if found, skip real IO and link to the existing copy
>>        if not, do real IO and insert a DEDUP key to the DEDUP tree.
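
The four steps above can be modelled in a few lines. This is a toy in-memory
sketch, not the patch's code: the DedupStore class, the list standing in for
the disk, and the dict standing in for the DEDUP tree are all hypothetical.
It takes the trailing 8 bytes of the SHA-256 digest as the index, which is
only one reading of the key layout described above, and the byte order is an
arbitrary choice. The full-block comparison on a hit is an added assumption
too; the description above does not say the patch does this.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096


class DedupStore:
    """Toy model: 'disk' is a list of blocks, a disk offset is a list index."""

    def __init__(self) -> None:
        self.disk: List[bytes] = []            # stands in for on-disk extents
        self.index: Dict[int, List[int]] = {}  # 64-bit index -> disk offsets

    def write_block(self, data: bytes) -> Tuple[int, bool]:
        """Return (disk offset, deduplicated?) for one block of data."""
        # 2) calculate the hash and keep 64 bits of it as the index
        digest = hashlib.sha256(data).digest()
        key = int.from_bytes(digest[-8:], "big")

        # 3) search the index for candidates with the same 64-bit value
        for offset in self.index.get(key, []):
            # Assumed safety check: compare whole blocks so that an index
            # collision cannot link unrelated data together.
            if self.disk[offset] == data:
                # 4a) found: skip the real IO and link to the existing copy
                return offset, True

        # 4b) not found: do the "real IO" and insert a new index entry
        self.disk.append(data)
        offset = len(self.disk) - 1
        self.index.setdefault(key, []).append(offset)
        return offset, False


if __name__ == "__main__":
    store = DedupStore()
    print(store.write_block(b"a" * BLOCK_SIZE))
    print(store.write_block(b"a" * BLOCK_SIZE))
    print(store.write_block(b"b" * BLOCK_SIZE))
```

Keeping a list of offsets per index value lets two different blocks that
happen to share the same 64 bits coexist, at the cost of reading the
candidate block back for comparison on every hit.
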
> 
> ... how are the hash collisions handled? Using part of a secure hash
> cannot be considered equally strong (given that there are no other
> safety checks like comparing the whole blocks).
> 
> Last but not least, there was another dedup proposal (author CCed)
> 
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/21722
> 
> 
> david


