From: Chris Mason <chris.mason@oracle.com>
To: Daniel J Blueman <daniel.blueman@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>,
Linux BTRFS <linux-btrfs@vger.kernel.org>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: file/extent checksums for dedup/sync...
Date: Wed, 27 Jan 2010 15:15:50 -0500
Message-ID: <20100127201550.GW2770@think>
In-Reply-To: <6278d2221001270523r13ab8973v927c8b60da181c9c@mail.gmail.com>

On Wed, Jan 27, 2010 at 01:23:28PM +0000, Daniel J Blueman wrote:
> On Wed, Jan 27, 2010 at 12:30 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > Daniel J Blueman <daniel.blueman@gmail.com> writes:
> >
> >> For purposes of data deduplication and data synchronisation, it would
> >> be a powerful tool to expose file data checksums.
> >>
> >> Since e.g. BTRFS uses the crc32c algorithm [1], it's possible to compute
> >> the file's overall CRC by accumulating the CRCs of all its
> >> extents.
> >>
> >> For now, exposing this via an IOCTL may be sufficient, though any
> >> ideas for introducing it in a more standard way? (it's a pity that
> >> when stat64 was introduced, reserved fields weren't added)
> >
> > The problem of doing it in any "standard way" is that it would
> > hard code the way the file system does checksums in the applications.
> > So the file system could never change it without breaking
> > user space.

At the end of the day the checksums are also hard coded on disk. We
can't add a new way without continuing to support the old one.

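To illustrate the accumulation Daniel describes above, here is a minimal
Python sketch of combining two CRC32C values without re-reading the data.
It relies on the general CRC identity crc(A||B) = Z(crc(A)) xor crc(B),
where Z advances the register over len(B) zero bytes, so the length of
each piece is needed as well as its CRC. The function names and the
per-extent layout are assumptions for illustration only; nothing here
reflects how BTRFS actually stores or would expose its checksums.

```python
POLY = 0x82F63B78  # CRC32C (Castagnoli) polynomial, bit-reflected form


def _shift_bit(state):
    """Advance the CRC register by one zero bit."""
    return (state >> 1) ^ (POLY if state & 1 else 0)


def crc32c(data):
    """Plain bitwise CRC32C with the usual ~0 seed and final inversion."""
    state = 0xFFFFFFFF
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = _shift_bit(state)
    return state ^ 0xFFFFFFFF


def crc32c_combine(crc_a, crc_b, len_b):
    """Return crc32c(A + B) given crc32c(A), crc32c(B) and len(B).

    Simple O(len_b) version: push len_b zero bytes through the register
    starting from crc_a, then xor in crc_b.
    """
    state = crc_a
    for _ in range(len_b * 8):
        state = _shift_bit(state)
    return state ^ crc_b


# Hypothetical "extents": only their CRCs and lengths are used to combine.
extents = [b"first extent ", b"second extent ", b"third"]
total = crc32c(extents[0])
for chunk in extents[1:]:
    total = crc32c_combine(total, crc32c(chunk), len(chunk))

# The combined value matches a CRC computed over the concatenated data.
assert total == crc32c(b"".join(extents))
```

A production version would replace the zero-feeding loop with the
logarithmic matrix method, as zlib does for crc32_combine().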
>
> I guess the filesystem would need to express this in the resulting
> data-structure, eg:
> - type 1 corresponds to using the crc32c algorithm with starting seed
> N and accumulating ascending over data extents, padding with modulus
> remainder or sparse holes with 0
> - type 2 etc

Yes, if they were exported to userland we'd need to export version info.

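As a rough sketch of what exporting version info could look like from
the application side, the Python below tags each checksum with a type and
seed and refuses to compare values whose descriptors differ. Every name
here (CsumType, ExportedChecksum, same_content_hint) is hypothetical; no
such interface exists in this thread or, as far as this sketch assumes,
in the kernel.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class CsumType(IntEnum):
    # Hypothetical type numbers, mirroring "type 1", "type 2 etc" above.
    CRC32C = 1


@dataclass(frozen=True)
class ExportedChecksum:
    csum_type: CsumType  # which algorithm and accumulation rule was used
    seed: int            # starting seed for the algorithm
    value: int           # the checksum itself


def same_content_hint(a: ExportedChecksum, b: ExportedChecksum) -> Optional[bool]:
    """Compare two exported checksums.

    Returns None when the descriptors differ, because values produced by
    different types or seeds say nothing about each other.
    """
    if a.csum_type != b.csum_type or a.seed != b.seed:
        return None
    return a.value == b.value


x = ExportedChecksum(CsumType.CRC32C, seed=0xFFFFFFFF, value=0x1234)
y = ExportedChecksum(CsumType.CRC32C, seed=0xFFFFFFFF, value=0x1234)
z = ExportedChecksum(CsumType.CRC32C, seed=0, value=0x1234)
```

Matching values are only a hint that the data may be equal; a dedup or
sync tool would still need to confirm by comparing the data itself.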
>
> The next question is: does filesystem (e.g. BTRFS) compression come
> before or after checksumming?

The checksums are based on what is on disk, so they are done on the
compressed data.

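The consequence for dedup or sync tools can be seen in a small Python
sketch. It uses zlib.compress() as a stand-in for filesystem compression
and zlib.crc32() (plain CRC-32, not crc32c) as a stand-in checksum; both
substitutions are assumptions made only to keep the example short.

```python
import zlib

# What an application sees when it reads the file.
logical = b"abcd" * 1024

# What would sit on disk if the extent were compressed (zlib as stand-in).
on_disk = zlib.compress(logical)

# A checksum "based on what is on disk" covers the compressed bytes,
# while a userspace tool hashing file contents covers the logical bytes.
csum_on_disk = zlib.crc32(on_disk)
csum_logical = zlib.crc32(logical)

print(f"logical: {len(logical)} bytes, crc {csum_logical:#010x}")
print(f"on disk: {len(on_disk)} bytes, crc {csum_on_disk:#010x}")
```

Because the two checksums cover different byte streams, an exported
on-disk checksum could not be compared against one computed over file
contents, or against a copy of the same data stored uncompressed.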
-chris