From: Vivek Goyal <vgoyal@redhat.com>
To: Vasily Tarasov <tarasov@vasily.name>
Cc: Joe Thornber <thornber@redhat.com>,
Mike Snitzer <snitzer@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
device-mapper development <dm-devel@redhat.com>,
Philip Shilane <philip.shilane@emc.com>,
Sonam Mandal <sonam.dp42@gmail.com>,
Erez Zadok <ezk@fsl.cs.sunysb.edu>
Subject: Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
Date: Fri, 30 Jan 2015 10:56:39 -0500
Message-ID: <20150130155639.GA8364@redhat.com>
In-Reply-To: <CAFTzLMNz6JCa8sndMVjyOvaQsJ_DEL1B=gRL0uL0B-Zmw+tdJA@mail.gmail.com>
On Fri, Jan 23, 2015 at 11:27:39AM -0500, Vasily Tarasov wrote:
[..]
> > - Why did you implement in-line deduplication as opposed to out-of-line
> > deduplication? Section 2 (Timeliness) in the paper just mentions
> > out-of-line dedup but does not go into detail about why you chose
> > an in-line one.
> >
> > I am wondering whether it would make more sense to first implement
> > out-of-line dedup and punt a lot of the cost to a worker thread (which kicks
> > in only when storage is idle). That way, even if we don't get a high dedup
> > ratio for a workload, inserting a dedup target in the stack will be less
> > painful from a performance point of view.
>
> Both in-line and off-line deduplication approaches have their own
> pluses and minuses. Among the minuses of the off-line approach is
> that it requires allocation of extra space to buffer non-deduplicated
> writes,
Well, that extra space requirement is temporary, and you have to pay the
cost somewhere. Personally, I would be more than happy to consume more disk
space while writing, avoid the performance hit, and let worker threads
optimize space usage later.
> re-reading the data from disk when deduplication happens (i.e.
> more I/O used).
Worker threads are supposed to kick in when the disk is idle, so the extra
I/O might not be as big a concern.
> It also complicates space usage accounting, and the user
> might run out of space even though the deduplication process would
> discover many duplicate blocks later.
Either way, the user needs to plan for extra space. Dedup is not an exact
science, and one does not know in advance what the dedup ratio of a data
set will be.
>
> Our final goal is to support both approaches but for this code
> submission we wanted to limit the amount of new code. In-line
> deduplication is a core part, around which we can implement off-line
> dedup by adding an extra thread that will reuse the same logic as
> in-line deduplication.
Ok. I am fine with building both if that makes sense.
I also understand that there are pros and cons to both approaches. It is
just that, given the high cost of inline dedup, I find it a little odd
that it is being implemented first rather than the offline one.
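To make the design quoted above concrete (an in-line core, with off-line
dedup as an extra pass that reuses the same logic), here is a minimal
sketch, assuming a toy in-memory block store. Every name in it
(DedupStore, write_inline, write_offline, run_worker) is hypothetical and
none of it comes from the dm-dedup patches; it also assumes each logical
block is written only once, so it does no reference counting.

```python
import hashlib


class DedupStore:
    """Toy block store. All names are hypothetical, not dm-dedup's."""

    def __init__(self):
        self.blocks = {}      # physical block number -> data
        self.by_hash = {}     # content hash -> physical block number
        self.lbn_map = {}     # logical block number -> physical block number
        self.pending = []     # logical blocks written raw, not yet deduplicated
        self.next_pbn = 0
        self.peak_blocks = 0  # high-water mark of physical blocks in use
        self.rereads = 0      # blocks read back by the off-line worker

    def _alloc(self, data):
        pbn = self.next_pbn
        self.next_pbn += 1
        self.blocks[pbn] = data
        self.peak_blocks = max(self.peak_blocks, len(self.blocks))
        return pbn

    def _dedup(self, lbn, data, pbn=None):
        """Shared core used by both paths.

        pbn is the block already holding `data` (off-line case), or None
        when the data has not been stored yet (in-line case).
        """
        digest = hashlib.sha256(data).digest()
        existing = self.by_hash.get(digest)
        if existing is None:
            # First copy of this content: keep (or create) its block.
            if pbn is None:
                pbn = self._alloc(data)
            self.by_hash[digest] = pbn
            existing = pbn
        elif pbn is not None and pbn != existing:
            # Buffered duplicate: free it and point at the existing copy.
            del self.blocks[pbn]
        self.lbn_map[lbn] = existing

    def write_inline(self, lbn, data):
        # Pay the hashing and lookup cost in the write path.
        self._dedup(lbn, data)

    def write_offline(self, lbn, data):
        # Store the block as-is now; deduplicate later.
        self.lbn_map[lbn] = self._alloc(data)
        self.pending.append(lbn)

    def run_worker(self):
        # Stand-in for a worker thread that runs when storage is idle.
        while self.pending:
            lbn = self.pending.pop()
            pbn = self.lbn_map[lbn]
            data = self.blocks[pbn]  # extra read I/O
            self.rereads += 1
            self._dedup(lbn, data, pbn)

    def read(self, lbn):
        return self.blocks[self.lbn_map[lbn]]
```

Writing the four blocks A, B, A, A through write_inline() never uses more
than two physical blocks. Writing them through write_offline() uses four
until run_worker() has run, which then re-reads all four and frees two.
That is the temporary extra space and the extra I/O discussed above.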
Anyway, I will spend some time on patches now.
Thanks
Vivek