Linux Btrfs filesystem development
From: Marek Otahal <markotahal@gmail.com>
To: Liu Bo <bo.li.liu@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/2] Btrfs: online data deduplication
Date: Wed, 10 Apr 2013 14:05:32 +0200
Message-ID: <2033709.8dWmigpxlX@beruska>
In-Reply-To: <1365340369-23537-2-git-send-email-bo.li.liu@oracle.com>

Hello,
this is awesome news! Thank you for working on dedup.

I have some questions about how the dedup approach interacts with other layers/features.

1/ How will snapshots be handled?

Would data be deduplicated across snapshots (potentially a big space saving), or would snapshots be treated as isolated from each other? Ideally this could be set by the user. My concern is robustness: with dedup across snapshots, only one copy of the data would actually exist, so a corruption of that copy would damage the file as well as all of its snapshots. Or is this not a problem, and we say that "safety" is handled by RAID?

2/ Order of dedup/compression? 

Which would be done first: compress a file and then compare blocks for duplicates, or the other way around?

Dedup first would save some compression work:
file's block 0000000000 -> hash -> isDup? (if no) -> compress (10x0) -> write
but the problem is that the size of the written data is unknown (it is no longer the one block we started with).

The other way, compress first, would waste CPU cycles compressing duplicate blocks, but would reduce dedup-related metadata usage, as 1 million zeros would be compressed to a single block and only that one block is compared/written. The usefulness here depends on the compression ratio of the file.

I'm not sure which approach would be better here.
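
To make the trade-off concrete, here is a minimal Python sketch (not btrfs code) that counts the work done under each order. Only the 4096-byte unit comes from the quoted description. Using zlib as the compressor, compressing the whole buffer in one go in the compress-first case, and all function and field names are my own assumptions for illustration.

```python
import hashlib
import zlib

BLOCK = 4096  # dedup unit from the quoted description (PAGESIZE)


def split_blocks(data, size=BLOCK):
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_then_compress(data):
    """Hash every raw block; compress and store only blocks not seen before."""
    seen = set()
    hashed = compress_calls = stored = 0
    for block in split_blocks(data):
        hashed += 1
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            continue  # duplicate: no compression, no write
        seen.add(digest)
        compress_calls += 1
        # The stored size is only known after compressing this block.
        stored += len(zlib.compress(block))
    return {"hashed_blocks": hashed, "compress_calls": compress_calls,
            "index_entries": len(seen), "stored_bytes": stored}


def compress_then_dedup(data):
    """Compress the whole buffer first, then dedup the compressed blocks."""
    compressed = zlib.compress(data)  # pays for duplicates too
    seen = set()
    hashed = stored = 0
    for block in split_blocks(compressed):
        hashed += 1
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            continue
        seen.add(digest)
        stored += len(block)
    return {"hashed_blocks": hashed, "compressed_input_bytes": len(data),
            "index_entries": len(seen), "stored_bytes": stored}


if __name__ == "__main__":
    zeros = bytes(1_000_000)  # the "1 million of zeros" case
    print("dedup first:   ", dedup_then_compress(zeros))
    print("compress first:", compress_then_dedup(zeros))
```

For the million-zeros buffer, dedup first hashes every 4096-byte block but compresses only the unique ones, while compress first runs the compressor over the full buffer and then hashes the few blocks that remain. Which one wins on real data would depend on how compressible and how repetitive the data is.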




Thank you for your time and explanation. 
Best wishes, Mark
   
On Sunday 07 April 2013 21:12:48 Liu Bo wrote:
> (NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data.)
> 
> This introduces the online data deduplication feature for btrfs.
> 
> (1) WHY do we need deduplication?
>     To improve our storage efficiency.
> 
> (2) WHAT is deduplication?
>     Practical deduplication implementations differ in two key ways,
>     *  When the data is deduplicated
>        (inband vs background)
>     *  The granularity of the deduplication.
>        (block level vs file level)
> 
>     For btrfs, we choose
>     *  inband(synchronous)
>     *  block level
> 
>     We choose them for the same reasons as zfs does.
>     a)  To get an immediate benefit.
>     b)  To remove redundant parts within a file.
> 
>     So we have an inband, block level data deduplication here.
> 
> (3) HOW does deduplication work?
>     This makes full use of file extent back reference, the same way as
>     IOCTL_CLONE, which lets us easily store multiple copies of a set of
>     data as a single copy along with an index of references to the copy.
> 
>     Here we have
>     a)  a new dedicated tree(DEDUP tree) and
>     b)  a new key(BTRFS_DEDUP_ITEM_KEY), which consists of
>         (stop 64bits of hash, type, disk offset),
>         *  stop 64bits of hash
>            It comes from sha256, which is very helpful in avoiding collisions.
>            And we take the stop 64bits as the index.
>         *  disk offset
>            It helps to find where the data is stored.
> 
>     So the whole deduplication process works as follows,
>     1) write something,
>     2) calculate the hash of this "something",
>     3) try to find the match of hash value by searching DEDUP keys in
>        a dedicated tree, DEDUP tree.
>     4) if found, skip real IO and link to the existing copy
>        if not, do real IO and insert a DEDUP key to the DEDUP tree.
> 
>     For now, we limit the deduplication unit to PAGESIZE, 4096, and we're
>     going to increase this unit dynamically in the future.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
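
To check my understanding of steps 1) to 4) above, here is how I picture the write path, as a toy Python model rather than anything taken from the patch. A dict stands in for the DEDUP tree and a list for the disk. Taking the last 8 bytes of the sha256 digest as the 64-bit index, keeping the full digest next to each entry to confirm a match, and the class and attribute names are all my assumptions.

```python
import hashlib

UNIT = 4096  # dedup unit named in the description (PAGESIZE)


class DedupStore:
    """Toy model: a dict stands in for the DEDUP tree, a list for the disk."""

    def __init__(self):
        self.disk = []     # disk[offset] holds one written block
        self.index = {}    # 64-bit hash slice -> [(full digest, disk offset)]
        self.refs = {}     # disk offset -> number of references to that copy
        self.io_count = 0  # how many blocks were really written

    def write(self, block):
        # 2) calculate the hash of the data being written
        digest = hashlib.sha256(block).digest()
        # Assumption: the 64-bit index is the last 8 bytes of the digest.
        key = int.from_bytes(digest[-8:], "big")
        # 3) search the index for a matching hash
        for full_digest, offset in self.index.get(key, []):
            if full_digest == digest:
                # 4a) found: skip real IO and link to the existing copy
                self.refs[offset] += 1
                return offset
        # 4b) not found: do real IO and insert a new index entry
        offset = len(self.disk)
        self.disk.append(block)
        self.io_count += 1
        self.index.setdefault(key, []).append((digest, offset))
        self.refs[offset] = 1
        return offset


if __name__ == "__main__":
    store = DedupStore()
    first = store.write(b"a" * UNIT)
    store.write(b"b" * UNIT)
    again = store.write(b"a" * UNIT)  # same content as the first write
    print(first == again, store.io_count, store.refs)
```

In this model the third write returns the offset of the first one and only bumps its reference count, so a single stored block backs both writes. That single shared copy is exactly what my snapshot question above is about.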

-- 

Marek Otahal :o)

