public inbox for linux-xfs@vger.kernel.org
* Reproducible XFS Filesystems Builds for VMs
@ 2025-04-11 14:38 Luca DiMaio
  2025-04-14  5:39 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Luca DiMaio @ 2025-04-11 14:38 UTC (permalink / raw)
  To: linux-xfs; +Cc: Scott Moser, Dimitri Ledkov

Dear XFS Maintainers and Community,

I am a Software Engineer at Chainguard working on reproducible builds for VMs.

While we have successfully implemented reproducible disk images with
EFI+EXT4 partitions, I've been unable to replicate this for XFS
filesystems.

Current Approach:

For the EFI+EXT4 images, reproducibility comes from the following
options:

- For FAT32 partitions: `mkfs.vfat --invariant -i $EFI_UUID` with
`$SOURCE_DATE_EPOCH` set, then populating via mtools
- For EXT4 partitions:
`mkfs.ext4 -E hash_seed=$EXT4_HASH_SEED -U $ROOTFS_UUID` with
`$SOURCE_DATE_EPOCH` set, plus `-d /path/to/rootfs.tar.gz` to
populate it

XFS Challenges:

For XFS, I've attempted to create reproducible filesystems using
extensive parameters:

```
mkfs.xfs \
-b size=4096 \
-d agcount=4 \
-d noalign \
-i attr=2 \
-i projid32bit=1 \
-i size=512 \
-l size=67108864 \
-l su=4096 \
-l version=2 \
-m crc=1 \
-m finobt=1 \
-m uuid=$ROOTFS_UUID \
-n size=16384 \
-n version=2 $root_partition
```

I've specified as many options as possible to avoid nondeterministic
decisions at runtime.

Unfortunately, this does not produce reproducible results across
different disk images.

I've made progress with empty filesystems by combining `libfaketime`,
to enforce `$SOURCE_DATE_EPOCH`, with a custom library that overrides
libc's `getrandom()`:

```
~$ export LD_PRELOAD="./deterministic_rng.so /usr/lib/faketime/libfaketime.so.1"
~$ mkfs.xfs \
-b size=4096 \
-d agcount=4 \
-d noalign \
-i attr=2 \
-i projid32bit=1 \
-i size=512 \
-l size=67108864 \
-l su=4096 \
-l version=2 \
-m crc=1 \
-m finobt=1 \
-m uuid=$ROOTFS_UUID \
-n size=16384 \
-n version=2 disk1.img
~$ mkfs.xfs \
-b size=4096 \
-d agcount=4 \
-d noalign \
-i attr=2 \
-i projid32bit=1 \
-i size=512 \
-l size=67108864 \
-l su=4096 \
-l version=2 \
-m crc=1 \
-m finobt=1 \
-m uuid=$ROOTFS_UUID \
-n size=16384 \
-n version=2 disk2.img
~$ md5sum disk*
c68c202163dcb862762fc01970f6c8b4  disk1.img
c68c202163dcb862762fc01970f6c8b4  disk2.img
```
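
A check like the `md5sum` comparison above could be scripted so that a mismatch also reports where the images first diverge. This is only a minimal sketch: `compare_images` is a hypothetical helper name, not part of xfsprogs, and it relies on nothing beyond POSIX `cmp` and `awk`.

```shell
#!/bin/sh
# compare_images IMG1 IMG2
# Hypothetical helper: prints "identical" and returns 0 when the two files
# match byte for byte; otherwise prints the offset (1-based) of the first
# differing byte and returns 1.
compare_images() {
    if cmp -s "$1" "$2"; then
        echo "identical"
        return 0
    fi
    # "cmp -l" lists every differing byte as "<offset> <octal> <octal>";
    # only the first offset is needed. If one file is merely a prefix of
    # the other, nothing is listed and "EOF" is reported instead.
    off=$(cmp -l "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1; exit }')
    echo "first difference at byte ${off:-EOF}"
    return 1
}
```

Called as `compare_images disk1.img disk2.img`, the non-zero return value makes it usable as a gate in a build script.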

This approach works for empty filesystems. When I populate the
filesystem by mounting it and untarring an archive, however, the
resulting metadata differs between runs, even after using
`xfs_repair -L` to reset most metadata.

The primary difference appears to be in the allocation group metadata,
which is optimized at runtime.
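
To narrow down which allocation groups diverge, the differing bytes of two images can be bucketed by allocation group. The sketch below is an illustration under stated assumptions: `diff_by_ag` is a hypothetical helper, the filesystem is assumed to start at offset 0 of the image file, and each allocation group is assumed to occupy `BLOCKSIZE * AGBLOCKS` contiguous bytes (the `agsize` value, in blocks, printed by `mkfs.xfs`).

```shell
#!/bin/sh
# diff_by_ag IMG1 IMG2 BLOCKSIZE AGBLOCKS
# Hypothetical helper: counts differing bytes per allocation group.
# Assumes AG n starts at byte n * BLOCKSIZE * AGBLOCKS of the image.
diff_by_ag() {
    # "cmp -l" prints one line per differing byte: "<offset> <octal> <octal>",
    # with offsets counted from 1.
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v agbytes="$(( $3 * $4 ))" '
            { count[int(($1 - 1) / agbytes)]++ }
            END {
                for (ag in count)
                    printf "AG %d: %d differing bytes\n", ag, count[ag]
            }
        ' | sort -k2,2n
}
```

For example, `diff_by_ag disk1.img disk2.img 4096 "$agsize"` would print one line per allocation group that contains differences. `cmp -l` emits a line for every differing byte, so this can be slow on images that differ widely.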

Question:

mkfs.ext4 addresses this with its `-d` flag, which populates the
filesystem from an archive or directory without mounting it.
Is there similar functionality available for XFS, or is there interest
in developing a method for generating reproducible XFS root
filesystems?

I'm asking because we'd be interested in using XFS as the filesystem
for the final product.

Thank you for your time and expertise. Any guidance would be greatly
appreciated.

Regards,
L.

end of thread, other threads:[~2025-04-16 14:50 UTC | newest]

Thread overview: 6+ messages
2025-04-11 14:38 Reproducible XFS Filesystems Builds for VMs Luca DiMaio
2025-04-14  5:39 ` Christoph Hellwig
2025-04-14 16:53   ` Luca DiMaio
2025-04-16  5:34     ` Christoph Hellwig
2025-04-16  5:55       ` Darrick J. Wong
2025-04-16 14:50         ` Luca DiMaio
