linux-btrfs.vger.kernel.org archive mirror
From: Anand Jain <anand.jain@oracle.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: reproducible builds with btrfs seed feature
Date: Wed, 17 Oct 2018 12:08:12 +0800
Message-ID: <d63839e6-6d76-2457-cbe5-7e5a60181d23@oracle.com>
In-Reply-To: <CAJCQCtSfuLTsxXfEkmz59PrOJ3LrXmGNG+_NKS_LPk9dHTgkug@mail.gmail.com>



On 10/17/2018 03:49 AM, Chris Murphy wrote:
> On Tue, Oct 16, 2018 at 2:13 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>> On 10/14/2018 06:28 AM, Chris Murphy wrote:
>>>
>>> Is it practical and desirable to make Btrfs based OS installation
>>> images reproducible? Or is Btrfs simply too complex and
>>> non-deterministic? [1]
>>>
>>> The main three problems with Btrfs right now for reproducibility are:
>>> a. many objects have uuids other than the volume uuid; and mkfs only
>>> lets us set the volume uuid
>>> b. atime, ctime, mtime, otime; and no way to make them all the same
>>> c. non-deterministic allocation of file extents, compression, inode
>>> assignment, logical and physical address allocation
>>>
>>> I'm imagining reproducible image creation would be a mkfs feature that
>>> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
>>> features to maybe make reproducible Btrfs volumes possible:
>>>
>>> - No raid
>>> - Either all objects needing uuids can have those uuids specified by
>>> switch, or possibly a defined set of uuids expressly for this use
>>> case, or possibly all of them can just be zeros (eek? not sure)
>>> - A flag to set all times the same
>>> - Possibly require that target block device is zero filled before
>>> creation of the Btrfs
>>> - Possibly disallow subvolumes and snapshots
>>> - Require the resulting image is seed/ro and maybe also a new
>>> compat_ro flag to enforce that such Btrfs file systems cannot be
>>> modified after the fact.
>>> - Enforce a consistent means of allocation and compression
>>>
>>> The end result is creating two Btrfs volumes would yield image files
>>> with matching hashes.
>>
>>
>>> If I had to guess, the biggest challenge would be allocation. But it's
>>> also possible that such an image may have problems with "sprouts". A
>>> non-removable sprout seems fairly straightforward and safe; but if a
>>> "reproducible build" type of seed is removed, it seems like removal
>>> needs to be smart enough to refresh *all* uuids found in the sprout: a
>>> hard break from the seed.
>>
>>
>> Right. The seed fsid will be gone in a detached sprout.
> 
> I think already we get a new devid, volume uuid, and device uuid.

  Yes on the sprout.

> Open
> question is whether any other uuid's need to be refreshed, such as
> chunk uuid since that appears in every node and leaf.

  There are quite a number of uuids.

>>> Any thoughts? Useful? Difficult to implement?
>>
>> Recently Nikolay sent a patch to change the fsid on a mounted btrfs. However,
>> for reproducible builds it also needs neutralized uuids, times, and
>> bytenr(s); furthermore, though the on-disk layout won't change without
>> notice, the block bytenr might.
> 
> Seems like the mkfs population method of such a seed,

> could be made
> very deterministic as to what the start logical address and physical
> address are.

  It can be. But it may change with future fixes, as those aren't EXPORTED().

> The vast majority of non-deterministic behavior comes
> from the nature of kernel code having to handle so many complex inputs
> and outputs, and negotiate them.

> 
>> One question why not reproducible builds get the file data extents from the
>> image and stitch the hashes together to verify the hash. And there could be
>> a vfs ioctl to import and export filesystem images for a better
>> support-ability of the use-case similar to the reproducible builds.
> 
> Perhaps. I don't know the reproducible build requirements very well,
> if all they really care about is the hash of the data extents, and
> really how important fs metadata is.
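As a rough illustration of the "stitch the hashes together" idea quoted above, here is a minimal Python sketch. It works on a mounted tree at the file level rather than on btrfs extents, so it is only an approximation of that idea. The function names, the choice of SHA-256, and the rule of sorting by relative path are assumptions made for the example. Timestamps, uuids, inode numbers and on-disk placement are deliberately not part of the hash.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the raw SHA-256 digest of one file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.digest()


def stitched_tree_hash(root):
    """Combine per-file digests into a single digest for the whole tree.

    Only relative paths and file contents are hashed, so two trees with
    the same data but different times or uuids give the same result.
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Regular files only; symlinks and special files are skipped.
            if os.path.isfile(full) and not os.path.islink(full):
                rel_paths.append(os.path.relpath(full, root))

    top = hashlib.sha256()
    # Sort so the result does not depend on directory iteration order.
    for rel in sorted(rel_paths):
        top.update(rel.encode("utf-8", "surrogateescape"))
        top.update(b"\0")
        top.update(file_digest(os.path.join(root, rel)))
    return top.hexdigest()
```

Running stitched_tree_hash() on the mount points of two independently built images and comparing the results would then stand in for comparing the image hashes.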


> That is important when it comes
> to fuzzing file systems that have no metadata checksumming like
> squashfs; of course you'd have to checksum the whole file system
> image.


> Another feature the mkfs variety of seed image would need,
> deduplication.  As far as I know, deduplication is kernel code only.
> You'd want to be able to deduplicate, 


> as well as compress, to have the
> smallest distributed seed possible.

btrfs-image(8) already does compress.

I don't think mkfs is the right place to sanitize the uuid/fsid/time;
that should be done when we generate the btrfs-image.

  So a possible solution for reproducible builds:
    Run the usual mkfs.btrfs on the device.
    Mount it and write the data.
    Unmount; create a btrfs-image with the uuid/fsid/time sanitized; mark
    it as a seed (RO).
    Check/verify the hash of the image.

  If the hash matches, use this btrfs-image in one of these ways:
    Reset the seed (RO) flag; mount and use it;
    OR
    Mount the seed device; add a RW sprout; detach the seed;
    OR
    Don't set the RO flag at all (above) and just mount and use it.
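
The "check/verify the hash of the image" step could be as simple as comparing digests of two independently built image files. A minimal Python sketch, assuming the images are ordinary files on disk; the function names and the choice of SHA-256 are illustrative and not something btrfs-progs provides:

```python
import hashlib


def image_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def images_match(path_a, path_b):
    """True if both image files hash to the same digest."""
    return image_digest(path_a) == image_digest(path_b)
```

A build would only count as reproducible if images_match() holds for images produced by two separate runs of the steps above.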

Thanks, Anand

> And mksquashfs does deduplication
> by default.



