linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <bo.li.liu@oracle.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement
Date: Wed, 29 Jul 2015 10:09:16 +0800	[thread overview]
Message-ID: <55B835CC.5010103@cn.fujitsu.com> (raw)
In-Reply-To: <20150728095244.GC1338@localhost.localdomain>



Liu Bo wrote on 2015/07/28 17:52 +0800:
> On Tue, Jul 28, 2015 at 04:30:36PM +0800, Qu Wenruo wrote:
>> Although Liu Bo has already submitted a V10 version of his deduplication
>> implement, here is another implement for it.
>>
>> [[CORE FEATURES]]
>> The main design concept is the following:
>> 1) Controllable memory usage
>> 2) No guarantee to dedup every duplication.
>> 3) No on-disk format change or new format
>> 4) Page size level deduplication
>>
>> [[IMPLEMENT]]
>> Implement details includes the following:
>> 1) LRU hash maps to limit the memory usage
>>     The hash -> extent mapping is control by LRU (or unlimited), to
>>     get a controllable memory usage (can be tuned by mount option)
>>     alone with controllable read/write overhead used for hash searching.
>>
>> 2) Reuse existing ordered_extent infrastructure
>>     For duplicated page, it will still submit a ordered_extent(only one
>>     page long), to make the full use of all existing infrastructure.
>>     But only not submit a bio.
>>     This can reduce the number of code lines.
>>
>> 3) Mount option to control dedup behavior
>>     Deduplication and its memory usage can be tuned by mount option.
>>     No need to indicated ioctl interface.
>>     And further more, it can easily support BTRFS_INODE flag like
>>     compression, to allow further per file dedup fine tunning.
>>
>> [[TODO]]
>> 1. Add support for compressed extent
>>     Shouldn't be quite hard.
>> 2. Try to merge dedup extent to reduce metadata size
>>     Currently, dedup extent is always in 4K size, although its reference
>>     source can be quite large.
>> 3. Add support for per file dedup flags
>>     Much easier, just like compression flags.
>>
>> [[KNOWN BUG, NEED HELP!]]
>> On the other hand, since it's still a RFC patch, it must has one or more
>> problem:
>
> You may have a look at my patchset, one of them is aimed to address the
> similar problem.
>
> Thanks,
>
> -liubo
In fact, I took a look at your patchset.
But the following two points are not so perfect so I choose to implement 
my own one:
1) Extent size.
    Extent size won't be larger than dedup_bs.
    Causing a lot of fragment even the write is not duplicated.

    So in my implement, for non-duplicated extent, its size will be up
    to 512K, not dedup_size.

    And the 512K limit can be further increase much easier.
    I choose 512K because for 512K extent, its hash can just be stored
    into one page (512K contains 128 pages, each page takes 32bytes for
    hash).

2) Submit bio directly
    Not a fan as it's crossing level.
    Normally bio is submitted in extent_io.c, and now we are doing high
    level metadata modification with low level bio submit in one
    function.

    At least it's not a good idea for me.
    So in my implement, I just add some small hooks into
    cow_file_range(), and even for duplicated page, I reuse the
    ordered_extent infrastructure to handle the inode size update and
    page/extent lock things.

To solve the problem I have another idea, to just submit duplciated 
metadata modification and don't go through ordered_extent.

But it's not perfect either.
we need extra functions to handle inode size update and 
btrfs_ordered_update_i_size() is useless as we don't have ordered_extent.

And if the duplicated page is an outstanding one, to insert the file 
extent, we also need to fill the holes between [inode_size, start).
And most of existing functions to punch hole won't help, as the range 
outstanding range is still locked by us.

Thanks,
Qu
>
>> 1) Race between __btrfs_free_extent() and dedup ordered_extent.
>>     The hook in __btrfs_free_extent() will free the corresponding hashes
>>     of a extent, even there is a dedup ordered_extent referring it.
>>
>>     The problem will happen like the following case:
>> ======================================================================
>>     cow_file_range()
>>       Submit dedup ordered_extent for extent A
>>
>>     commit_transaction()
>>       Extent A needs freeing. As the its ref is decreased to 0.
>>       And dedup ordered_extent can increase only when it hit endio time.
>>
>>     finish_ordered_io()
>>       Add reference to Extent A for dedup ordered_extent.
>>       But it is already freed in previous transaction.
>>       Causing abort_transaction().
>> ======================================================================
>>     I'd like to keep the current ordered_extent method, as it adds the
>>     least number of code lines.
>>     But I can't find a good idea to either delay transaction until dedup
>>     ordered_extent is done or things like that.
>>
>>     Trans->ordered seems to be a good idea, but it seems to cause list
>>     corruption without extra protection in tree log infrastructure.
>>
>> That's the only problem spotted yet.
>> Any early review or advice/question on the design is welcomed.
>>
>> Thanks.
>>
>> Qu Wenruo (14):
>>    btrfs: file-item: Introduce btrfs_setup_file_extent function.
>>    btrfs: Use btrfs_fill_file_extent to reduce duplicated codes
>>    btrfs: dedup: Add basic init/free functions for inband dedup.
>>    btrfs: dedup: Add internal add/remove/search function for btrfs dedup.
>>    btrfs: dedup: add ordered extent hook for inband dedup
>>    btrfs: dedup: Apply dedup hook for write time dedup.
>>    btrfs: extent_map: Add new dedup flag and corresponding hook.
>>    btrfs: extent-map: Introduce orig_block_start member for extent-map.
>>    btrfs: dedup: Add inband dedup hook for read extent.
>>    btrfs: dedup: Introduce btrfs_dedup_free_extent_range function.
>>    btrfs: dedup: Add hook to free dedup hash at extent free time.
>>    btrfs: dedup: Add mount option support for btrfs inband deduplication.
>>    Btrfs: dedup: Support dedup change at remount time.
>>    btrfs: dedup: Add mount option output for inband dedup.
>>
>>   fs/btrfs/Makefile       |   2 +-
>>   fs/btrfs/ctree.h        |  16 ++
>>   fs/btrfs/dedup.c        | 701 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/dedup.h        | 132 +++++++++
>>   fs/btrfs/disk-io.c      |   7 +
>>   fs/btrfs/extent-tree.c  |  10 +
>>   fs/btrfs/extent_io.c    |   6 +-
>>   fs/btrfs/extent_map.h   |   4 +
>>   fs/btrfs/file-item.c    |  61 +++--
>>   fs/btrfs/inode.c        | 228 ++++++++++++----
>>   fs/btrfs/ordered-data.c |  32 ++-
>>   fs/btrfs/ordered-data.h |   8 +
>>   fs/btrfs/super.c        |  39 ++-
>>   13 files changed, 1163 insertions(+), 83 deletions(-)
>>   create mode 100644 fs/btrfs/dedup.c
>>   create mode 100644 fs/btrfs/dedup.h
>>
>> --
>> 2.4.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2015-07-29  2:09 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-28  8:30 [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 01/14] btrfs: file-item: Introduce btrfs_setup_file_extent function Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 02/14] btrfs: Use btrfs_fill_file_extent to reduce duplicated codes Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 03/14] btrfs: dedup: Add basic init/free functions for inband dedup Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 04/14] btrfs: dedup: Add internal add/remove/search function for btrfs dedup Qu Wenruo
2015-07-28  8:56 ` [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement Qu Wenruo
2015-07-28  9:52 ` Liu Bo
2015-07-29  2:09   ` Qu Wenruo [this message]
2015-07-28 14:50 ` David Sterba
2015-07-29  1:07   ` Chris Mason
2015-07-29  1:47   ` Qu Wenruo
2015-07-29  2:40     ` Liu Bo
2015-08-03  7:18   ` Qu Wenruo
2015-08-27  0:52     ` Qu Wenruo
2015-08-27  9:14     ` David Sterba
2015-08-31  1:13       ` Qu Wenruo
2015-09-22 15:07         ` David Sterba
2015-09-23  7:16           ` Qu Wenruo
  -- strict thread matches above, loose matches on Subject: below --
2015-07-28  9:14 Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55B835CC.5010103@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).