linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <dsterba@suse.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement
Date: Thu, 27 Aug 2015 08:52:25 +0800
Message-ID: <55DE5F49.9@cn.fujitsu.com>
In-Reply-To: <55BF15DF.9080805@cn.fujitsu.com>



Qu Wenruo wrote on 2015/08/03 15:18 +0800:
>
>
> David Sterba wrote on 2015/07/28 16:50 +0200:
>> On Tue, Jul 28, 2015 at 04:30:36PM +0800, Qu Wenruo wrote:
>>> Although Liu Bo has already submitted a V10 of his deduplication
>>> implementation, here is another implementation of it.
>>
>> What's the reason to start another implementation?
>>
>>> [[CORE FEATURES]]
>>> The main design concept is the following:
>>> 1) Controllable memory usage
>>> 2) No guarantee to dedup every duplicate.
>>> 3) No on-disk format change or new format
>>> 4) Page size level deduplication
>>
>> 1) and 2) are good goals; they allow usability trade-offs.
>>
>> 3) so the dedup hash is stored only for the mount lifetime. Though it
>> avoids on-disk format changes, it also reduces the effectiveness. It
>> is possible to "seed" the in-memory tree by reading all files that
>> contain potentially duplicate blocks but one would have to do that after
>> each mount.
>>
>> 4) a page-sized dedup chunk is IMHO way too small. Although it can achieve
>> high dedup rate, the metadata can potentially explode and cause more
>> fragmentation.
>>
>>> Implementation details include the following:
>>> 1) LRU hash maps to limit the memory usage
>>>     The hash -> extent mapping is controlled by an LRU (or is unlimited),
>>>     to get controllable memory usage (tunable by mount option) along
>>>     with controllable read/write overhead for hash searching.
>>
>> In Liu Bo's series, I rejected the mount options as an interface and
>> will do that here as well. His patches added a dedup ioctl to (at least)
>> enable/disable the dedup.
> BTW, could you please explain why it's not a good idea to
> use a mount option to trigger/change dedup options?
>
> Thanks,
> Qu

Ping?
Any other comments?

Thanks,
Qu
>>
>>> 2) Reuse existing ordered_extent infrastructure
>>>     For a duplicated page, it still submits an ordered_extent (only one
>>>     page long) to make full use of the existing infrastructure,
>>>     but does not submit a bio.
>>>     This reduces the amount of new code needed.
>>
>>> 3) Mount option to control dedup behavior
>>>     Deduplication and its memory usage can be tuned by mount option.
>>>     No need for a dedicated ioctl interface.
>>
>> I'd say the other way around.
>>
>>>     Furthermore, it can easily support a BTRFS_INODE flag, as
>>>     compression does, to allow per-file dedup fine-tuning.
>>>
>>> [[TODO]]
>>> 3. Add support for per-file dedup flags
>>>     Much easier, just like compression flags.
>>
>> How is that supposed to work? You mean add per-file flags/attributes to
>> mark a file so it fills the dedup hash tree and is actively going to be
>> deduped against other files?
>>
>>> Any early review or advice/question on the design is welcomed.
>>
>> The implementation looks simpler than Liu Bo's, but (IMHO) at the
>> cost of reduced functionality.
>>
>> Ideally, we merge one patchset with all desired functionality. Some kind
>> of control interface is needed not only to enable/disable the whole
>> feature but also to affect the trade-offs (memory consumption vs. dedup
>> efficiency vs. speed), and that in a way that's flexible according to
>> immediate needs.
>>
>> The persistent dedup hash storage is not mandatory in theory, so we
>> could implement an "in-memory tree only" mode, i.e. what you're
>> proposing, on top of Liu Bo's patchset.
>>
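Sterba's concern in point 4) can be put in rough numbers: the number of entries in the hash -> extent map grows as N = D / c for D bytes of unique data and a dedup chunk size of c. The 1 TiB data set and the 128 KiB comparison chunk size below are illustrative values chosen for this arithmetic, not figures from the thread.

```latex
N = \frac{D}{c}, \qquad
N_{4\,\mathrm{KiB}} = \frac{2^{40}}{2^{12}} = 2^{28} \approx 2.7 \times 10^{8}, \qquad
N_{128\,\mathrm{KiB}} = \frac{2^{40}}{2^{17}} = 2^{23} \approx 8.4 \times 10^{6}
```

So page-sized chunks need 32 times as many entries as 128 KiB chunks; if each in-memory entry costs e bytes, the map needs N * e bytes, and a fixed LRU budget covers 32 times less data.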

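The LRU-bounded hash -> extent mapping described in item 1) of the RFC can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the patchset: the names `DedupHashLRU`, `lookup_or_insert` and `ExtentRef` are hypothetical, SHA-256 stands in for whatever digest the patches actually use, `max_entries` stands in for the mount-option memory knob, and `max_entries=None` models the "unlimited" mode. On a hit, the RFC's approach would still create a one-page ordered_extent but skip the bio; that part is left out.

```python
import hashlib
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumption: dedup granularity of one 4 KiB page


@dataclass(frozen=True)
class ExtentRef:
    """Hypothetical stand-in for the on-disk location of a written page."""
    bytenr: int
    length: int


class DedupHashLRU:
    """Hypothetical bounded map from page digest to extent.

    max_entries=None models the "unlimited" mode; otherwise the least
    recently used entry is evicted once the limit is exceeded.
    """

    def __init__(self, max_entries: Optional[int]) -> None:
        self.max_entries = max_entries
        self._map: "OrderedDict[bytes, ExtentRef]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._map)

    def lookup_or_insert(self, page: bytes, candidate: ExtentRef) -> Optional[ExtentRef]:
        """Return the existing extent on a hit; on a miss, remember
        `candidate` (where the page is about to be written) and return None.
        """
        # Assumption: digest equality is trusted; no byte-by-byte compare.
        digest = hashlib.sha256(page).digest()
        hit = self._map.get(digest)
        if hit is not None:
            self._map.move_to_end(digest)  # mark as most recently used
            return hit
        self._map[digest] = candidate
        if self.max_entries is not None and len(self._map) > self.max_entries:
            self._map.popitem(last=False)  # evict the least recently used hash
        return None


# Usage: with room for two hashes, a repeated page is found in the map.
cache = DedupHashLRU(max_entries=2)
page_a, page_b, page_c = (bytes([i]) * PAGE_SIZE for i in range(3))
cache.lookup_or_insert(page_a, ExtentRef(0, PAGE_SIZE))      # miss: write page_a
cache.lookup_or_insert(page_b, ExtentRef(4096, PAGE_SIZE))   # miss: write page_b
print(cache.lookup_or_insert(page_a, ExtentRef(8192, PAGE_SIZE)))  # hit: extent at 0
```

Because eviction silently drops old hashes, a later write of an evicted page is just a miss and gets written out again, which is consistent with design goal 2) (no guarantee to dedup every duplicate).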
Thread overview: 19+ messages
2015-07-28  8:30 [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 01/14] btrfs: file-item: Introduce btrfs_setup_file_extent function Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 02/14] btrfs: Use btrfs_fill_file_extent to reduce duplicated codes Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 03/14] btrfs: dedup: Add basic init/free functions for inband dedup Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 04/14] btrfs: dedup: Add internal add/remove/search function for btrfs dedup Qu Wenruo
2015-07-28  8:56 ` [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement Qu Wenruo
2015-07-28  9:52 ` Liu Bo
2015-07-29  2:09   ` Qu Wenruo
2015-07-28 14:50 ` David Sterba
2015-07-29  1:07   ` Chris Mason
2015-07-29  1:47   ` Qu Wenruo
2015-07-29  2:40     ` Liu Bo
2015-08-03  7:18   ` Qu Wenruo
2015-08-27  0:52     ` Qu Wenruo [this message]
2015-08-27  9:14     ` David Sterba
2015-08-31  1:13       ` Qu Wenruo
2015-09-22 15:07         ` David Sterba
2015-09-23  7:16           ` Qu Wenruo
  -- strict thread matches above, loose matches on Subject: below --
2015-07-28  9:14 Qu Wenruo