From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Cc: Anand Jain <anand.jain@oracle.com>
Subject: Re: reproducible builds with btrfs seed feature
Date: Mon, 15 Oct 2018 08:29:05 -0400
Message-ID: <a7dd8412-15e3-f151-986c-8f34decbb532@gmail.com>
In-Reply-To: <CAJCQCtTPwQnzwkpk=4ZsZXfWTC7HymYETxp-9xUU_tsvOTW0ZQ@mail.gmail.com>

On 2018-10-13 18:28, Chris Murphy wrote:
> Is it practical and desirable to make Btrfs based OS installation
> images reproducible? Or is Btrfs simply too complex and
> non-deterministic? [1]
>
> The three main problems with Btrfs reproducibility right now are:
> a. many objects have uuids other than the volume uuid; and mkfs only
> lets us set the volume uuid
> b. atime, ctime, mtime, otime; and no way to make them all the same
> c. non-deterministic allocation of file extents, compression, inode
> assignment, logical and physical address allocation
>
> I'm imagining reproducible image creation would be a mkfs feature that
> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
> features to maybe make reproducible Btrfs volumes possible:
>
> - No raid
> - Either all objects needing uuids can have those uuids specified by
> switch, or possibly a defined set of uuids expressly for this use
> case, or possibly all of them can just be zeros (eek? not sure)
> - A flag to set all times the same
> - Possibly require that target block device is zero filled before
> creation of the Btrfs
> - Possibly disallow subvolumes and snapshots
> - Require the resulting image is seed/ro and maybe also a new
> compat_ro flag to enforce that such Btrfs file systems cannot be
> modified after the fact.
> - Enforce a consistent means of allocation and compression
>
> The end result is creating two Btrfs volumes would yield image files
> with matching hashes.
So in other words, you care about matching the block layout _exactly_.
This is a great idea for paranoid people, but it's usually overkill.
Realistically, almost nothing in userspace cares about the block layout;
worrying about it just makes verifying the reproduced image a bit easier.
There's no reason you can't verify all the relevant data without
computing a checksum or HMAC of the image as a whole.
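As a rough illustration of verifying the relevant data without hashing the image as a whole, here is a minimal Python sketch. `tree_digest` is a hypothetical helper, not part of btrfs-progs or any other tool. It assumes the reproduced filesystem has been mounted somewhere, and it hashes each entry's relative path, file type, permission bits, ownership and content in sorted order, deliberately ignoring timestamps, inode numbers and on-disk layout:

```python
import hashlib
import os
import stat


def _field(h, data: bytes) -> None:
    # Length-prefix every field so two different trees can't serialize
    # to the same byte stream.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def _file_sha256(path: str) -> bytes:
    # Stream the file in 1 MiB chunks instead of reading it all at once.
    fh = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            fh.update(chunk)
    return fh.digest()


def tree_digest(root: str) -> str:
    """Hypothetical helper: SHA-256 over a directory tree's contents and
    key metadata, ignoring timestamps, inode numbers and block layout."""
    rels = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rels.append(os.path.relpath(os.path.join(dirpath, name), root))

    h = hashlib.sha256()
    for rel in sorted(rels):  # sorted order makes the digest deterministic
        full = os.path.join(root, rel)
        st = os.lstat(full)  # lstat: don't follow symlinks
        if stat.S_ISREG(st.st_mode):
            payload = _file_sha256(full)
        elif stat.S_ISLNK(st.st_mode):
            payload = os.fsencode(os.readlink(full))
        elif stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
            payload = st.st_rdev.to_bytes(8, "big")  # device number
        else:
            payload = b""  # directories, FIFOs, sockets
        _field(h, os.fsencode(rel))
        _field(h, stat.S_IFMT(st.st_mode).to_bytes(4, "big"))  # file type
        _field(h, stat.S_IMODE(st.st_mode).to_bytes(4, "big"))  # permissions
        _field(h, st.st_uid.to_bytes(8, "big"))
        _field(h, st.st_gid.to_bytes(8, "big"))
        _field(h, payload)
    return h.hexdigest()
```

Run against two independently built images, each mounted read-only, this should give the same digest whenever the contents match, regardless of extent allocation, compression or inode numbering. The sketch leaves out xattrs, ACLs and timestamps; whether those belong in the comparison depends on what you consider "relevant data".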
>
> If I had to guess, the biggest challenge would be allocation. But it's
> also possible that such an image may have problems with "sprouts". A
> non-removable sprout seems fairly straightforward and safe; but if a
> "reproducible build" type of seed is removed, it seems like removal
> needs to be smart enough to refresh *all* uuids found in the sprout: a
> hard break from the seed.
>
> Competing file systems: ext4 with the make_ext4 fork, and squashfs. At the
> moment I'm thinking it might be easier to teach squashfs integrity
> checking than to make Btrfs reproducible. But then I also think
> restricting Btrfs features, and applying some requirements to
> constrain Btrfs to make it reproducible, really enhances the Btrfs
> seed-sprout feature.
>
> Any thoughts? Useful? Difficult to implement?
>
> Squashfs might be a better fit for this use case *if* it can be taught
> about integrity checking. It does per-file checksums for the purpose
> of deduplication but those checksums aren't retained for later
> integrity checking.
I've seen projects using SquashFS that store the integrity data separately
and leverage other infrastructure to check it. Methods I've seen so far
include:
* GPG-signed SquashFS images, usually with detached signatures
* SquashFS with PAR2 integrity checking data
* SquashFS on top of dm-verity
* SquashFS on top of dm-integrity
The first two need to be checked externally before mounting, but doing
so is not hard. The fourth is tricky to set up correctly, but integrates
better with encrypted images. The third, though, does exactly what's
needed: you use the embedded-data variant of dm-verity, bind the
resulting image to a loop device, activate dm-verity on the loop device,
and mount the mapped device like any other SquashFS image.
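To show why dm-verity covers the image as a whole, here is a simplified Python sketch of the kind of hash tree it uses. `merkle_root` is a hypothetical helper; it assumes SHA-256, 4 KiB blocks and a salt prepended before hashing (as in dm-verity's version 1 hash format), and it does not reproduce veritysetup's on-disk hash area or superblock. It only computes a root hash in the same general way:

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB data and hash blocks
DIGEST_SIZE = 32   # SHA-256 digest length in bytes


def _hash(salt: bytes, data: bytes) -> bytes:
    # Salt is prepended to the data before hashing.
    return hashlib.sha256(salt + data).digest()


def merkle_root(image: bytes, salt: bytes = b"", block: int = BLOCK_SIZE) -> bytes:
    """Hypothetical helper: root of a dm-verity-style hash tree over `image`.

    Simplified sketch only; not byte-compatible with veritysetup output."""
    if not image or len(image) % block:
        raise ValueError("image must be non-empty and a multiple of the block size")
    per_block = block // DIGEST_SIZE  # digests that fit in one hash block

    # Level 0: one digest per data block.
    level = [_hash(salt, image[i:i + block]) for i in range(0, len(image), block)]

    # Pack digests into zero-padded hash blocks and hash those, level by
    # level, until a single hash block remains; its digest is the root hash.
    while True:
        next_level = []
        for i in range(0, len(level), per_block):
            packed = b"".join(level[i:i + per_block])
            next_level.append(_hash(salt, packed.ljust(block, b"\0")))
        level = next_level
        if len(level) == 1:
            return level[0]
```

Because every data block feeds into the single root hash, trusting that one value (for example via a signature) is enough to trust the whole image. On a real system you would let `veritysetup format` build and store the tree; the kernel then checks each block against it as it is read, and by default a corrupted block produces an I/O error instead of silently returning bad data.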
I've also seen some talk of using SquashFS with IMA and IMA appraisal,
but I've not seen anybody actually _do_ that, and it wouldn't be on
quite the level you seem to want (it verifies the files in the image,
but not the image as a whole).