linux-fsdevel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@dilger.ca>
To: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Mimi Zohar <zohar@linux.vnet.ibm.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] fs-verity: file system-level integrity protection
Date: Thu, 1 Feb 2018 16:43:37 -0700
Message-ID: <F38F74DF-0C5E-440C-A850-607FADAA9129@dilger.ca>
In-Reply-To: <20180201230415.7cyxhwq234vd3in4@destitution>


On Feb 1, 2018, at 4:04 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Wed, Jan 31, 2018 at 07:03:16PM -0500, Theodore Ts'o wrote:
>> On Wed, Jan 31, 2018 at 12:41:13PM -0800, James Bottomley wrote:
>>>> Like fscrypto, where most of the code is in fs/crypto, most of the
>>>> fs-verity code will be in fs/verity.  There will be minimal hooks in a
>>>> particular file system, so if another file system wants to play, it
>>>> can do so relatively easily.
>>> 
>>> OK, sounds good ... I notice, now I look, that fscrypt uses xattrs
>>> (albeit hidden under the covers of get/set_context).  Will dm-verity use
>>> the same trick, or do people really need space in the inode?
>> 
>> I assume you mean fs-verity above, and no, we aren't going to use
>> xattrs because the Merkle tree won't fit in the xattr.  So the plan
>> was to put the fs-verity header, the PKCS7 signature, and the Merkle
>> tree after i_size (rounded to a blocksize boundary).  Remember, in the
>> fs-verity case we only worry about the read-only case.
> 
> I think putting valid data beyond EOF is going to be problematic for
> many filesystems. Getting things like truncate right is hard enough
> without having to special case a bunch of new functionality that
> specifically allows IO access beyond EOF. Indeed, how does "truncate
> isize but leave special data behind" work and what's the userspace
> API to drive it? And how does it interact with all the page cache
> code that checks for page->index beyond EOF to detect a truncated
> page that should not be accessed?
> 
> There are also further complications for filesystems like XFS, e.g. how
> do we tell the difference between valid data beyond EOF and
> speculative allocation (done by delalloc) beyond EOF that contains
> no data and can be removed if it is not written to in a short while?
> 
> This just seems like a horrible can of worms to me and is not
> something we should be building generic infrastructure around.
> 
> Just how big do these merkle trees get, anyway?

The Merkle tree will have one checksum per "leaf block" of the filesystem
(though I'd recommend using a fixed-size checksum leaf block such as 4KB so
that userspace doesn't need to care about the actual filesystem blocksize
on disk).  Above that, there is a tree of checksums from the leaf blocks
up to the root.  With a weak checksum like CRC32 (4 bytes/leaf), the tree
size would be roughly 0.1% of the file size.  With a strong checksum like
SHA256 (32 bytes/leaf), the overhead is roughly 0.8%.
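The overhead figures above can be reproduced with a rough sketch. It assumes 4KB hash blocks, digests packed back-to-back with the last block of each level zero-padded, and the root hash stored separately; the function name is hypothetical and this is not the actual fs-verity on-disk layout.

```python
def merkle_tree_bytes(file_size: int, block_size: int = 4096,
                      digest_size: int = 32) -> int:
    """Return the bytes of hash blocks needed for a Merkle tree over a file.

    Hypothetical layout: digests are packed back-to-back into block_size
    hash blocks, the last block of each level is zero-padded, and the root
    hash is stored separately (so a single data block needs no tree blocks).
    """
    total = 0
    n = -(-file_size // block_size)  # data blocks at the bottom, rounded up
    while n > 1:
        # One digest per block below, packed into whole hash blocks.
        n = -(-(n * digest_size) // block_size)
        total += n * block_size
    return total


if __name__ == "__main__":
    GIB = 1 << 30
    for name, digest_size in (("CRC32", 4), ("SHA256", 32)):
        pct = 100 * merkle_tree_bytes(GIB, digest_size=digest_size) / GIB
        print(f"{name}: {pct:.3f}% of a 1 GiB file")
```

For a 1 GiB file this prints 0.098% for CRC32 and 0.788% for SHA256: the leaf level dominates, and each interior level is only 1/1024 (CRC32) or 1/128 (SHA256) the size of the level below it.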

Strictly speaking, the whole Merkle tree does not need to be stored on
disk.  If the leaf checksums are stored (to allow random IO access with
data verification) along with the root node (to allow verification of the
rest of the leaf blocks), then the intermediate tree could be recomputed
with relatively low overhead (0.1% vs. checksumming the whole file at open).
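The "store only the leaves and the root" idea can be illustrated with a minimal sketch: given the stored leaf digests and a trusted root, the interior levels are rebuilt in memory and the result compared against the root. It assumes SHA256, 4KB hash blocks, zero padding of the final block at each level, and a root defined as the digest of the single top block. The function names are hypothetical; this is not the fs-verity format.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed fixed checksum block size


def hash_blocks(buf: bytes) -> list[bytes]:
    """SHA256 each BLOCK_SIZE chunk of buf, zero-padding the final chunk."""
    return [
        hashlib.sha256(buf[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")).digest()
        for off in range(0, len(buf), BLOCK_SIZE)
    ]


def root_from_leaves(leaf_digests: list[bytes]) -> bytes:
    """Recompute the interior levels from stored leaf digests; return the root."""
    if not leaf_digests:
        raise ValueError("need at least one leaf digest")
    level = leaf_digests
    while len(level) > 1:
        # Pack this level's digests into hash blocks and hash each block.
        level = hash_blocks(b"".join(level))
    return level[0]


def leaves_are_authentic(leaf_digests: list[bytes], trusted_root: bytes) -> bool:
    """True if the stored leaf digests chain up to the trusted root hash."""
    return hmac.compare_digest(root_from_leaves(leaf_digests), trusted_root)


if __name__ == "__main__":
    data = bytes(range(256)) * 4000   # ~1 MB of sample file data
    leaves = hash_blocks(data)        # what would be stored on disk
    root = root_from_leaves(leaves)   # what would be signed or trusted
    print(leaves_are_authentic(leaves, root))
```

Rebuilding only rehashes the leaf level and above, so the work scales with the size of the leaf checksums rather than with the file data.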

>> As I stated above, we need to put the Merkle tree after i_size anyway,
>> so the current plan doesn't use xattrs at all.  Xattr storage space is
>> also precious (especially if you are trying to keep all of the xattrs
> 
> No it's not. xattr space is specifically designed for uses like
> this, and if you have to take an extra IO to read it then that's the
> cost of storing large chunks of non-userdata data on a file. You've
> got to take extra IOs to read the merkle tree if it's stored beyond
> EOF anyway, so it doesn't matter if we take extra IOs to read it
> from an xattr....

Since the tree size depends on file size, it would hit the 64KB xattr size
limit at 64MB (CRC32) or 8MB (SHA256), unless we also allow larger xattrs
to userspace.  An ext4 feature landed in 4.13 to allow on-disk xattrs
larger than the previous 4KB (single block) limit (essentially any size
xattr can be stored), so that wouldn't be a problem if the userspace
xattr API limit were removed.
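The 64MB and 8MB figures follow from dividing the 64KB per-value limit (XATTR_SIZE_MAX, 65536 bytes, in the Linux UAPI headers) by the checksum size per 4KB block. A minimal sketch, counting only the leaf checksums (interior levels would lower the limit slightly); the function name is hypothetical.

```python
XATTR_SIZE_MAX = 65536  # Linux limit on one xattr value via the userspace API
BLOCK_SIZE = 4096       # assumed checksum block size


def max_file_size_in_one_xattr(digest_size: int) -> int:
    """Largest file whose leaf checksums alone fit in one xattr value.

    Interior tree levels are ignored, so the true limit is slightly lower.
    """
    return (XATTR_SIZE_MAX // digest_size) * BLOCK_SIZE


if __name__ == "__main__":
    for name, digest_size in (("CRC32", 4), ("SHA256", 32)):
        mib = max_file_size_in_one_xattr(digest_size) // (1 << 20)
        print(f"{name}: {mib} MiB")
```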

Cheers, Andreas
