Re: btrfs checksum - Chris Mason

From: Chris Mason <clm@fb.com>
To: Jan Kasiak <j.kasiak@gmail.com>, <bo.li.liu@oracle.com>,
	<linux-btrfs@vger.kernel.org>
Subject: Re: btrfs checksum
Date: Thu, 1 May 2014 11:17:49 -0400
Message-ID: <5362659D.3030003@fb.com>
In-Reply-To: <CAD15GZfSy0QWzpNQ2X2_3jUMBMV-WN4obAsnHH=Ea9xt-BEA7w@mail.gmail.com>

On 05/01/2014 12:16 AM, Jan Kasiak wrote:
> Is there a design/technical reason behind btrfs using checksums
> separately per block, versus checksumming into a merkle tree?

We're using crc32c, which isn't suitable for detecting malicious data in 
general.  The goal was just to find blocks that were not correctly 
returned by the storage.  But, more below:
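
To illustrate why a CRC catches accidental corruption but not deliberate
tampering, here is a minimal sketch. It assumes Python's zlib.crc32
(plain CRC-32, a different polynomial from the crc32c btrfs uses) as a
stand-in, because the standard library has no crc32c; the helper
xor_bytes is made up for this example. CRCs are affine over GF(2), so
the checksum of an XOR of equal-length blocks can be predicted from the
individual checksums without any secret:

```python
import os
import zlib

def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together (hypothetical helper)."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

BLOCK_SIZE = 4096
a, b, c = (os.urandom(BLOCK_SIZE) for _ in range(3))

# For equal-length inputs, crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c).
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))
crc_is_affine = predicted == actual
```

A cryptographic hash has no such structure, and, more basically, anyone
who can rewrite a block can also recompute an unkeyed checksum stored
next to it.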

>
> Where I'm coming from: there's a Linux kernel device mapper module
> called dm-verity, which lets you verify the contents of a block
> device using a merkle tree (at mount time you provide the root hash).
> For my project, I'm modifying the module to enable write support, but
> in order to maintain consistency in the event of a power failure, I
> have to do the equivalent of data journaling to a circular log (so I
> end up writing data twice).
>
> Where I'm going with this: I've been looking into what improvements
> could be made if this were implemented at the filesystem level, and
> btrfs looks like a good candidate. But it already has a checksumming
> scheme, which is incompatible with a merkle tree, and would be
> redundant if a merkle tree were implemented as well. (zfs has a
> merkle tree, but from what I can tell, it doesn't expose the root hash
> to the user)
>
> What the example use case is: you want to detect malicious data
> tampering across the entire file system, so at mount time you provide
> the merkle tree root hash. You get back an updated version of the root
> hash when you unmount, which you store in a secure place until the
> next time you mount the filesystem. You can detect if anyone modifies
> your data in between mounts, and also if they modify the underlying
> storage while the filesystem is mounted.

We're actively looking into schemes to detect malicious changes to the 
FS data.  A merkle tree could actually work fairly well.  The part I was 
missing the last time I thought about this was the data blocks.

I was thinking that we'd have to include the checksum of the data blocks 
in the combined crc of the leaves in the filesystem tree.

But if we keep the crc tree (switching to a stronger checksum than 
crc32c), we can just build the merkle tree over the crc tree and over 
the regular metadata blocks separately.
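
A rough sketch of that idea, not actual btrfs code: SHA-256 stands in
for the "stronger" checksum, in-memory lists stand in for the checksum
items and metadata blocks, and leaf_hash, node_hash and merkle_root are
hypothetical names. One root is computed over the per-block data
checksums and a second one over the metadata blocks:

```python
import hashlib
from typing import List

def leaf_hash(block: bytes) -> bytes:
    # The 0x00/0x01 prefixes keep leaf and interior hashes distinct.
    return hashlib.sha256(b"\x00" + block).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(leaf_digests: List[bytes]) -> bytes:
    """Fold leaf digests pairwise until one root digest remains."""
    if not leaf_digests:
        raise ValueError("need at least one leaf")
    level = list(leaf_digests)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is promoted unchanged
        level = nxt
    return level[0]

BLOCK_SIZE = 4096
data_blocks = [bytes([i]) * BLOCK_SIZE for i in range(5)]
metadata_blocks = [b"meta-%d" % i for i in range(3)]

# Stand-in for the checksum tree: one strong checksum per data block.
csum_items = [leaf_hash(block) for block in data_blocks]
data_root = merkle_root(csum_items)
metadata_root = merkle_root([leaf_hash(b) for b in metadata_blocks])

# Changing any data block changes its checksum item and thus the root.
tampered = list(data_blocks)
tampered[2] = b"\xff" * BLOCK_SIZE
tampered_root = merkle_root([leaf_hash(block) for block in tampered])
tamper_detected = tampered_root != data_root
```

The data blocks themselves never enter the tree directly; only their
checksum items do, which is what makes the separation workable.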

I think the biggest problem is that we wouldn't be allowed to write a 
tree node until all of its children had been crc'd.  We've avoided these 
kinds of write ordering rules in the past because they tend to make 
things very difficult under memory pressure.
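
The ordering constraint can be shown with a toy in-memory tree (the Node
class and seal function are invented for this sketch and do not mirror
btrfs structures). A node's digest covers its children's digests, so no
node can be finalized, and therefore written, before everything below it
has been checksummed:

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    payload: bytes = b""
    children: List["Node"] = field(default_factory=list)
    digest: Optional[bytes] = None

def seal(node: Node, write_order: List[str]) -> bytes:
    """Checksum a subtree bottom-up, recording when each node is writable."""
    h = hashlib.sha256(node.payload)
    for child in node.children:
        # The parent's digest needs each child's digest first.
        h.update(seal(child, write_order))
    node.digest = h.digest()
    write_order.append(node.name)  # only now may this node hit the disk
    return node.digest

leaf_a = Node("leaf_a", b"a")
leaf_b = Node("leaf_b", b"b")
leaf_c = Node("leaf_c", b"c")
inner = Node("inner", b"i", [leaf_a, leaf_b])
root = Node("root", b"r", [inner, leaf_c])

order: List[str] = []
root_digest = seal(root, order)
# order is a post-order walk: children always precede their parent.
```

Under memory pressure this means a dirty parent cannot be flushed to
free memory until all of its dirty children have been processed first.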

-chris

Thread overview: 2+ messages
2014-05-01  4:16 btrfs checksum Jan Kasiak
2014-05-01 15:17 ` Chris Mason [this message]