From: Chris Mason <chris.mason@oracle.com>
To: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Heinz-Josef Claes <hjclaes@web.de>,
Edward Shishkin <edward.shishkin@gmail.com>,
Tomasz Chmielewski <mangoo@wpkg.org>,
linux-btrfs@vger.kernel.org
Subject: Re: Data Deduplication with the help of an online filesystem check
Date: Tue, 28 Apr 2009 17:26:17 -0400
Message-ID: <1240953977.15136.76.camel@think.oraclecorp.com>
In-Reply-To: <20090428211255.GB13112@cip.informatik.uni-erlangen.de>
On Tue, 2009-04-28 at 23:12 +0200, Thomas Glanzmann wrote:
> Hello,
>
> > > - Implement a system call that reports all checksums and unique
> > > block identifiers for all stored blocks.
>
> > This would require storing the larger checksums in the filesystem. It
> > is much better done in the dedup program.
>
> I think I misunderstood something here. I thought the checksums per
> block would already be stored somewhere in btrfs?
They are, but only crc32c checksums are stored today.
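
Illustrative sketch (not from the original message): since only crc32c is stored on disk, a dedup program would have to compute any stronger per-block checksum itself. The snippet below assumes 4 KiB blocks and SHA-256; the function names are hypothetical, and it works on an in-memory buffer rather than on btrfs extents.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumption: the 4 KiB block size used elsewhere in this thread


def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (offset, SHA-256 hex digest) for each block of `data`."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield offset, hashlib.sha256(block).hexdigest()


def duplicate_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Map digest -> offsets, keeping only digests that occur more than once."""
    seen = defaultdict(list)
    for offset, digest in block_digests(data, block_size):
        seen[digest].append(offset)
    return {d: offs for d, offs in seen.items() if len(offs) > 1}
```

A matching digest only nominates candidates; as discussed further down, the actual merge still has to verify block contents inside the kernel.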
>
> > > - Implement another system call that reports all checksums and
> > > unique identifiers for all stored blocks since the last
> > > report. This can be easily implemented:
>
> > This is racy because there's no way to prevent new changes.
>
> I got the point.
>
> > > Use a block bitmap with one bit for every block on the
> > > filesystem. If a block is modified, set its bit to one; when
> > > the bitmap is retrieved, simply zero it out:
>
> > > Assuming a 4 kbyte block size that would mean for a 1 Tbyte
> > > filesystem:
>
> > > 1Tbyte / 4096 / 8 = 32 Mbyte of memory (this should of course
> > > be saved to disk from time to time and be restored on startup).
>
> > Sorry, a 1TB drive is teeny, I don't think a bitmap is practical
> > across the whole FS. Btrfs has metadata that can quickly and easily
> > tell you which files and which blocks in which files have changed
> > since a given transaction id. This is how you want to find new
> > things.
>
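
Illustrative sketch (not from the original thread): the 32 Mbyte figure quoted above follows from one bit per 4 KiB block, reading "1 Tbyte" as 2^40 bytes. The function name and the round-up behaviour are assumptions made for this example.

```python
def bitmap_bytes(fs_bytes: int, block_size: int = 4096) -> int:
    """Bytes needed for a one-bit-per-block bitmap, rounding up at each step."""
    blocks = -(-fs_bytes // block_size)  # ceiling division: number of blocks
    return -(-blocks // 8)               # ceiling division: 8 bits per byte


one_tib = 2 ** 40
size_mib = bitmap_bytes(one_tib) / 2 ** 20  # bitmap size in MiB for a 1 TiB filesystem
```

The size grows linearly with the filesystem, which is the scaling concern raised in the reply.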
> You're right, the bitmap would not scale. So what is missing is a
> system call to report the changes to userland? (Is this feature used
> to generate off-site backups as well?)
Yes, that's the idea: an ioctl to walk the tree and report on changes.
But this doesn't have to be done for version 1 of the dedup code; you
can just scan the files based on mtime/ctime.
>
> > But, the ioctl to actually do the dedup needs to be able to verify a
> > given block has the contents you expect it to. The only place you can
> > lock down the pages in the file and prevent new changes is inside the
> > kernel.
>
> I totally agree with that. How much time would it take to implement
> such a system call?
It is probably a three-week to one-month effort.
-chris