From: Chris Mason <chris.mason@oracle.com>
To: Valerie Aurora Henson <vaurora@redhat.com>
Cc: Ray Van Dolson <rayvd@bludgeon.org>, linux-btrfs@vger.kernel.org
Subject: Re: Data-deduplication?
Date: Thu, 16 Oct 2008 15:30:49 -0400
Message-ID: <1224185449.6938.81.camel@think.oraclecorp.com>
In-Reply-To: <20081016192501.GE16946@shell>
On Thu, 2008-10-16 at 15:25 -0400, Valerie Aurora Henson wrote:
> On Mon, Oct 13, 2008 at 07:02:14AM -0400, Chris Mason wrote:
> > On Sat, 2008-10-11 at 19:06 -0700, Ray Van Dolson wrote:
> > > I recall there being a thread here a number of months back regarding
> > > data-deduplication support for btrfs.
> > >
> > > Did anyone end up picking that up and giving a go at it? Block level
> > > data dedup would be *awesome* in a Linux filesystem. It does wonders
> > > for storing virtual machines w/ NetApp and WAFL, and even ZFS doesn't
> > > have this feature yet (although I've read discussions on them looking
> > > to add it).
> > >
> >
> > So far nobody has grabbed this one, but I've had more requests (no
> > shocker there, the kvm people are interested in it too). It probably
> > won't make 1.0 but the disk format will be able to support it.
>
> Both deduplication and compression have an interesting side effect in
> which a write to a previously "allocated" block can return ENOSPC.
> This is even more exciting when you factor in mmap. Any thoughts on
> how to handle this?
Unfortunately we'll have a number of places where ENOSPC will jump in
where people don't expect it, and this includes any COW overwrite of an
existing extent. The old extent isn't freed until snapshot deletion
time, which won't happen until after the current transaction commits.
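
To make the COW overwrite case concrete, here is a minimal toy model in Python. It is a sketch for illustration only, not btrfs code: the class `ToyCowStore` and its methods are hypothetical names, space is counted in whole blocks, and the "old extent is freed at snapshot deletion, after the transaction commits" rule is simplified to "old blocks are released at `commit()`".

```python
import errno
import os


class ToyCowStore:
    """Toy copy-on-write block accounting (hypothetical, not btrfs code)."""

    def __init__(self, total_blocks):
        self.free = total_blocks   # blocks available for new allocations
        self.pending_free = 0      # old copies, released only at commit
        self.files = {}            # file name -> number of live blocks

    def _allocate(self, nblocks):
        if nblocks > self.free:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.free -= nblocks

    def write_new(self, name, nblocks):
        """Append nblocks of fresh data to a file."""
        self._allocate(nblocks)
        self.files[name] = self.files.get(name, 0) + nblocks

    def overwrite(self, name, nblocks):
        """Overwrite nblocks of existing data using COW."""
        if nblocks > self.files.get(name, 0):
            raise ValueError("cannot overwrite more blocks than the file has")
        # The new copy needs fresh blocks while the old copy still exists.
        self._allocate(nblocks)
        self.pending_free += nblocks

    def commit(self):
        """Assumption: old copies become reusable only here."""
        self.free += self.pending_free
        self.pending_free = 0


if __name__ == "__main__":
    store = ToyCowStore(total_blocks=10)
    store.write_new("vm.img", 8)          # 2 blocks left free
    try:
        store.overwrite("vm.img", 4)      # needs 4 new blocks before any are freed
    except OSError as exc:
        print("overwrite failed:", errno.errorcode[exc.errno])
```

In this model the file never grows, yet the overwrite fails because the replacement blocks must be allocated before the old ones can be released.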

Another example is fallocate. The extent will have a little flag that
says "I'm a preallocated extent", which is how we'll know we're allowed to
overwrite it directly instead of doing COW.

But to write to the fallocated extent, we'll have to clear the flag.
So we'll have to COW the block that holds the file extent pointer,
which means we can hit ENOSPC.

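
The fallocate case can be sketched the same way. The toy model below is hypothetical and is not btrfs code: `ToyFs`, `Extent`, and the rule that every metadata update costs exactly one block are assumptions made for illustration. It shows a write into preallocated space failing even though no new data blocks are needed, because clearing the flag means COWing a metadata block.

```python
import errno
import os
from dataclasses import dataclass


@dataclass
class Extent:
    nblocks: int
    prealloc: bool  # the "I'm a preallocated extent" flag


class ToyFs:
    """Toy space accounting for preallocated extents (hypothetical)."""

    METADATA_COW_COST = 1  # assumption: one block per metadata update

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks
        self.extents = {}

    def _reserve(self, nblocks):
        if nblocks > self.free_blocks:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.free_blocks -= nblocks

    def fallocate(self, name, nblocks):
        # Data blocks plus one metadata block for the new extent pointer.
        self._reserve(nblocks + self.METADATA_COW_COST)
        self.extents[name] = Extent(nblocks, prealloc=True)

    def write(self, name):
        extent = self.extents[name]
        if extent.prealloc:
            # Data is overwritten in place, but clearing the flag means
            # COWing the block that holds the file extent pointer.
            self._reserve(self.METADATA_COW_COST)
            extent.prealloc = False
        else:
            # Ordinary extent: COW the data and update the pointer.
            self._reserve(extent.nblocks + self.METADATA_COW_COST)


if __name__ == "__main__":
    fs = ToyFs(free_blocks=9)
    fs.fallocate("db.file", 8)   # consumes all 9 blocks
    try:
        fs.write("db.file")      # no data blocks needed, metadata COW still is
    except OSError as exc:
        print("write failed:", errno.errorcode[exc.errno])
```

The practical consequence for applications is that a successful fallocate does not, under this design, guarantee that later writes into the range cannot fail with ENOSPC.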
-chris
Thread overview: 14+ messages
2008-10-12 2:06 Data-deduplication? Ray Van Dolson
2008-10-13 8:52 ` Data-deduplication? Andi Kleen
2008-10-15 13:39 ` Data-deduplication? Avi Kivity
2008-10-15 14:15 ` Data-deduplication? Andi Kleen
2008-10-15 14:43 ` Data-deduplication? Miguel Sousa Filipe
2008-10-15 15:00 ` Data-deduplication? Andi Kleen
2008-10-15 17:49 ` Data-deduplication? Avi Kivity
2008-10-13 11:02 ` Data-deduplication? Chris Mason
2008-10-16 19:25 ` Data-deduplication? Valerie Aurora Henson
2008-10-16 19:30 ` Chris Mason [this message]
2008-10-17 18:24 ` Data-deduplication? Valerie Aurora Henson
2008-10-20 0:16 ` Data-deduplication? Chris Mason
2008-10-21 20:33 ` Data-deduplication? Valerie Aurora Henson
2008-10-17 20:10 ` Data-deduplication? Christoph Hellwig