* [PATCH v1] libxfs: support reproducible filesystems using deterministic time/seed
@ 2025-11-07 16:12 Luca Di Maio
2025-11-07 16:37 ` Darrick J. Wong
0 siblings, 1 reply; 3+ messages in thread
From: Luca Di Maio @ 2025-11-07 16:12 UTC (permalink / raw)
To: linux-xfs; +Cc: Luca Di Maio, dimitri.ledkov, smoser, djwong
Add support for reproducible filesystem creation through two environment
variables that enable deterministic behavior when building XFS filesystems.
SOURCE_DATE_EPOCH support:
When SOURCE_DATE_EPOCH is set, use its value for all filesystem timestamps
instead of the current time. This follows the reproducible builds
specification (https://reproducible-builds.org/specs/source-date-epoch/)
and ensures consistent inode timestamps across builds.
DETERMINISTIC_SEED support:
When DETERMINISTIC_SEED is set, use it to generate deterministic values
from get_random_u32() instead of reading from /dev/urandom. This ensures
that UUIDs and other randomly selected values are consistent across builds.
The implementation introduces two helper functions to minimize changes
to existing code:
- current_fixed_time(): Helper that parses and caches SOURCE_DATE_EPOCH.
Returns fixed timestamp when set, with fallback on parse errors.
- get_msws_prng_32(): Helper implementing Middle Square Weyl Sequence PRNG.
Uses DETERMINISTIC_SEED to generate deterministic pseudo-random sequence.
Accepts decimal/hex/octal values via base-0 parsing.
- Both helpers use one-time initialization to avoid repeated getenv() calls.
- Both quickly exit and noop if environment is not set or has invalid
variables, falling back to original behaviour.
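As a reading aid, the generator step described above can be sketched on its own. This is only an illustration under stated assumptions: the struct and the names msws_state, msws_init() and msws_next() are hypothetical and do not appear in the patch, which keeps the same three values in static variables instead.

```c
#include <stdint.h>

/* Hypothetical standalone state; the patch uses static variables instead. */
struct msws_state {
	uint64_t	x;	/* current value (msws_n in the patch) */
	uint64_t	w;	/* Weyl accumulator (msws_s in the patch) */
	uint64_t	s;	/* odd increment from the seed (msws_c) */
};

/* Start a sequence; the increment is forced odd, as the patch does. */
static void
msws_init(
	struct msws_state	*st,
	uint64_t		seed)
{
	st->x = 0;
	st->w = 0;
	st->s = seed | 1;
}

/* One Middle Square Weyl Sequence step, returning 32 bits. */
static uint32_t
msws_next(
	struct msws_state	*st)
{
	st->x *= st->x;				/* square (mod 2^64) */
	st->w += st->s;				/* advance the Weyl sequence */
	st->x += st->w;
	st->x = (st->x >> 32) | (st->x << 32);	/* swap the 32-bit halves */
	return (uint32_t)st->x;
}
```

Two generators started from the same seed yield the same sequence. One property worth noting: with a seed that fits in 32 bits, such as 0xDEADBEEF, the first value returned by this sketch is 0, because the first step leaves the seed in the upper half after the swap.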
Example usage:
SOURCE_DATE_EPOCH=1234567890 \
DETERMINISTIC_SEED=0xDEADBEEF \
mkfs.xfs \
-m uuid=$EXAMPLE_UUID \
-p file=./rootfs \
disk1.img
This enables distributions and build systems to create bit-for-bit
identical XFS filesystems when needed for verification and debugging.
Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com>
---
libxfs/util.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 132 insertions(+)
diff --git a/libxfs/util.c b/libxfs/util.c
index 3597850d..676da81b 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -137,12 +137,69 @@ xfs_log_calc_unit_res(
return unit_bytes;
}
+/*
+ * current_fixed_time() tries to detect if SOURCE_DATE_EPOCH is in our
+ * environment, and set input timespec's timestamp to that value.
+ *
+ * Returns true on success, false otherwise.
+ */
+bool
+current_fixed_time(
+ struct timespec64 *tv)
+{
+ /*
+ * To avoid many getenv() we'll use an initialization static flag, so
+ * we only read once.
+ */
+ static bool read_env = false;
+ static time64_t epoch = -1;
+ char *source_date_epoch;
+
+ if (!read_env) {
+ read_env = true;
+ source_date_epoch = getenv("SOURCE_DATE_EPOCH");
+ if (source_date_epoch && source_date_epoch[0] != '\0') {
+ errno = 0;
+ epoch = strtoul(source_date_epoch, NULL, 10);
+ if (errno != 0) {
+ epoch = -1;
+ return false;
+ }
+ }
+ }
+
+ /*
+ * This will happen only if we successfully read a valid
+ * SOURCE_DATE_EPOCH and properly initiated the epoch value.
+ */
+ if (read_env && epoch >= 0) {
+ tv->tv_sec = (time_t)epoch;
+ tv->tv_nsec = 0;
+ return true;
+ }
+
+ /*
+ * We initialized but had no valid SOURCE_DATE_EPOCH so we fall back
+ * to regular behaviour.
+ */
+ return false;
+}
+
struct timespec64
current_time(struct inode *inode)
{
struct timespec64 tv;
struct timeval stv;
+ /*
+ * Check if we're creating a reproducible filesystem.
+ * In this case we try to parse our SOURCE_DATE_EPOCH from environment.
+ * If it fails, fall back to returning gettimeofday()
+ * like we used to do.
+ */
+ if (current_fixed_time(&tv))
+ return tv;
+
gettimeofday(&stv, (struct timezone *)0);
tv.tv_sec = stv.tv_sec;
tv.tv_nsec = stv.tv_usec * 1000;
@@ -515,6 +572,72 @@ void xfs_dirattr_mark_sick(struct xfs_inode *ip, int whichfork) { }
void xfs_da_mark_sick(struct xfs_da_args *args) { }
void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask) { }
+/*
+ * get_msws_prng_32() tries to detect if DETERMINISTIC_SEED is in our
+ * environment, and set our result to a pseudo-random number instead of
+ * extracting from getrandom().
+ *
+ * Returns true on success, false otherwise.
+ *
+ * This function uses Middle Square Weyl Sequence to create pseudo-random
+ * numbers based on our DETERMINISTIC_SEED.
+ * Ref: https://arxiv.org/pdf/1704.00358
+ */
+bool
+get_msws_prng_32(
+ uint32_t *result)
+{
+ /*
+ * To avoid many getenv() we'll use an initialization static flag, so
+ * we only read once.
+ */
+ static bool read_env = false;
+ /* MSWS state variables */
+ static uint64_t msws_c = 0; /* increment (user seed) */
+ static uint64_t msws_n = 0; /* current value */
+ static uint64_t msws_s = 0; /* accumulator */
+ char *seed;
+ unsigned long deterministic_seed;
+
+ if (!read_env) {
+ read_env = true;
+ seed = getenv("DETERMINISTIC_SEED");
+ if (seed && seed[0] != '\0') {
+ errno = 0;
+ deterministic_seed = strtoul(seed, NULL, 0);
+ if (errno != 0)
+ return false;
+
+ /*
+ * In this variation of MSWS we will use
+ * DETERMINISTIC_SEED as our odd number in the formula,
+ * so we will need to ensure it is odd.
+ */
+ msws_c = deterministic_seed | 1;
+ }
+ }
+
+ /*
+ * This will happen only if we successfully read a valid
+ * DETERMINISTIC_SEED and properly initiated the sequence.
+ */
+ if (read_env && msws_c != 0) {
+ msws_n *= msws_n;
+ msws_s += msws_c;
+ msws_n += msws_s;
+ msws_n = (msws_n >> 32) | (msws_n << 32);
+ *result = (uint32_t)msws_n;
+
+ return true;
+ }
+
+ /*
+ * We initialized but had no valid DETERMINISTIC_SEED so we fall back
+ * to regular behaviour.
+ */
+ return false;
+}
+
#ifdef HAVE_GETRANDOM_NONBLOCK
uint32_t
get_random_u32(void)
@@ -522,6 +645,15 @@ get_random_u32(void)
uint32_t ret;
ssize_t sz;
+ /*
+ * Check if we're creating a reproducible filesystem.
+ * In this case we try to parse our DETERMINISTIC_SEED from environment
+ * and use a pseudorandom number generator.
+ * If it fails, fall back to returning getrandom()
+ * like we used to do.
+ */
+ if (get_msws_prng_32(&ret))
+ return ret;
/*
* Try to extract a u32 of randomness from /dev/urandom. If that
* fails, fall back to returning zero like we used to do.
--
2.51.2
* Re: [PATCH v1] libxfs: support reproducible filesystems using deterministic time/seed
2025-11-07 16:12 [PATCH v1] libxfs: support reproducible filesystems using deterministic time/seed Luca Di Maio
@ 2025-11-07 16:37 ` Darrick J. Wong
2025-11-07 17:38 ` Luca Di Maio
0 siblings, 1 reply; 3+ messages in thread
From: Darrick J. Wong @ 2025-11-07 16:37 UTC (permalink / raw)
To: Luca Di Maio; +Cc: linux-xfs, dimitri.ledkov, smoser
On Fri, Nov 07, 2025 at 05:12:41PM +0100, Luca Di Maio wrote:
> Add support for reproducible filesystem creation through two environment
> variables that enable deterministic behavior when building XFS filesystems.
>
> SOURCE_DATE_EPOCH support:
> When SOURCE_DATE_EPOCH is set, use its value for all filesystem timestamps
> instead of the current time. This follows the reproducible builds
> specification (https://reproducible-builds.org/specs/source-date-epoch/)
> and ensures consistent inode timestamps across builds.
>
> DETERMINISTIC_SEED support:
> When DETERMINISTIC_SEED is set, use it to generate deterministic values
> from get_random_u32() instead of reading from /dev/urandom. This ensures
> that UUIDs and other randomly selected values are consistent across builds.
>
> The implementation introduces two helper functions to minimize changes
> to existing code:
>
> - current_fixed_time(): Helper that parses and caches SOURCE_DATE_EPOCH.
> Returns fixed timestamp when set, with fallback on parse errors.
> - get_msws_prng_32(): Helper implementing Middle Square Weyl Sequence PRNG.
> Uses DETERMINISTIC_SEED to generate deterministic pseudo-random sequence.
> Accepts decimal/hex/octal values via base-0 parsing.
> - Both helpers use one-time initialization to avoid repeated getenv() calls.
> - Both quickly exit and noop if environment is not set or has invalid
> variables, falling back to original behaviour.
>
> Example usage:
> SOURCE_DATE_EPOCH=1234567890 \
> DETERMINISTIC_SEED=0xDEADBEEF \
> mkfs.xfs \
> -m uuid=$EXAMPLE_UUID \
> -p file=./rootfs \
> disk1.img
>
> This enables distributions and build systems to create bit-for-bit
> identical XFS filesystems when needed for verification and debugging.
>
> Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com>
> ---
> libxfs/util.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 132 insertions(+)
>
> diff --git a/libxfs/util.c b/libxfs/util.c
> index 3597850d..676da81b 100644
> --- a/libxfs/util.c
> +++ b/libxfs/util.c
> @@ -137,12 +137,69 @@ xfs_log_calc_unit_res(
> return unit_bytes;
> }
>
> +/*
> + * current_fixed_time() tries to detect if SOURCE_DATE_EPOCH is in our
> + * environment, and set input timespec's timestamp to that value.
> + *
> + * Returns true on success, false otherwise.
> + */
> +bool
> +current_fixed_time(
> + struct timespec64 *tv)
> +{
> + /*
> + * To avoid many getenv() we'll use an initialization static flag, so
> + * we only read once.
> + */
> + static bool read_env = false;
> + static time64_t epoch = -1;
> + char *source_date_epoch;
> +
> + if (!read_env) {
> + read_env = true;
> + source_date_epoch = getenv("SOURCE_DATE_EPOCH");
> + if (source_date_epoch && source_date_epoch[0] != '\0') {
> + errno = 0;
> + epoch = strtoul(source_date_epoch, NULL, 10);
time64_t is an alias for long long int, I think you want strtoll here.
Also you ought to provide an endptr so that you can check that strtoll
consumed the entire $SOURCE_DATE_EPOCH string. The
reproducible-builds.org spec you reference above says:
"The value MUST be an ASCII representation of an integer with no
fractional component, identical to the output format of date +%s."
which I interpret to mean that
# SOURCE_DATE_EPOCH=35hotdogs mkfs.xfs -f /dev/sda ...
shouldn't be allowed.
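A minimal sketch of the stricter parsing suggested here, assuming a hypothetical helper named parse_epoch_strict() that is not part of the patch. It uses strtoll() with an endptr and rejects overflow, empty input, and trailing characters.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not from the patch: parse a SOURCE_DATE_EPOCH-style
 * string. Returns true only if the whole string is a base-10 integer that
 * strtoll() can represent.
 */
static bool
parse_epoch_strict(
	const char	*str,
	int64_t		*out)
{
	char		*endptr;
	long long	val;

	if (!str || str[0] == '\0')
		return false;

	errno = 0;
	val = strtoll(str, &endptr, 10);
	if (errno != 0)				/* ERANGE on overflow */
		return false;
	if (endptr == str || *endptr != '\0')	/* no digits or trailing junk */
		return false;

	*out = val;
	return true;
}
```

With this check "1234567890" is accepted and "35hotdogs" is rejected. Note that strtoll() itself still tolerates leading whitespace and an explicit sign, so a caller wanting an exact match for the `date +%s` format would need an extra check.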
> + if (errno != 0) {
> + epoch = -1;
> + return false;
> + }
> + }
> + }
> +
> + /*
> + * This will happen only if we successfully read a valid
> + * SOURCE_DATE_EPOCH and properly initiated the epoch value.
> + */
> + if (read_env && epoch >= 0) {
Why disallow negative timestamps? Suppose I want all the new files to
have a timestamp of November 5th, 1955?
> + tv->tv_sec = (time_t)epoch;
time_t can be 32-bit; don't needlessly truncate epoch.
> + tv->tv_nsec = 0;
> + return true;
> + }
> +
> + /*
> + * We initialized but had no valid SOURCE_DATE_EPOCH so we fall back
> + * to regular behaviour.
> + */
> + return false;
> +}
> +
> struct timespec64
> current_time(struct inode *inode)
> {
> struct timespec64 tv;
> struct timeval stv;
>
> + /*
> + * Check if we're creating a reproducible filesystem.
> + * In this case we try to parse our SOURCE_DATE_EPOCH from environment.
> + * If it fails, fall back to returning gettimeofday()
> + * like we used to do.
> + */
> + if (current_fixed_time(&tv))
> + return tv;
> +
> gettimeofday(&stv, (struct timezone *)0);
> tv.tv_sec = stv.tv_sec;
> tv.tv_nsec = stv.tv_usec * 1000;
> @@ -515,6 +572,72 @@ void xfs_dirattr_mark_sick(struct xfs_inode *ip, int whichfork) { }
> void xfs_da_mark_sick(struct xfs_da_args *args) { }
> void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask) { }
>
> +/*
> + * get_msws_prng_32() tries to detect if DETERMINISTIC_SEED is in our
> + * environment, and set our result to a pseudo-random number instead of
> + * extracting from getrandom().
Why not return a fixed "random" value? Wouldn't that be more obviously
deterministic?
if (getenv("DETERMINISTIC_SEED"))
return 0x53414d45; /* "SAME" */
--D
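The fixed-value idea can be sketched as a small helper with the same bool-plus-out-parameter shape as the helpers in the patch. The name get_fixed_random_u32() and the FIXED_RANDOM_U32 macro are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define FIXED_RANDOM_U32	0x53414d45U	/* ASCII "SAME" */

/*
 * Hypothetical helper: if DETERMINISTIC_SEED is present in the
 * environment, hand back a constant instead of real randomness.
 * Returns false when the variable is unset so that the caller can
 * fall back to its usual random source.
 */
static bool
get_fixed_random_u32(
	uint32_t	*result)
{
	if (!getenv("DETERMINISTIC_SEED"))
		return false;

	*result = FIXED_RANDOM_U32;
	return true;
}
```

In this variant only the presence of the variable matters; its value is never parsed, which removes the seed-validation path entirely.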
> + *
> + * Returns true on success, false otherwise.
> + *
> + * This function uses Middle Square Weyl Sequence to create pseudo-random
> + * numbers based on our DETERMINISTIC_SEED.
> + * Ref: https://arxiv.org/pdf/1704.00358
> + */
> +bool
> +get_msws_prng_32(
> + uint32_t *result)
> +{
> + /*
> + * To avoid many getenv() we'll use an initialization static flag, so
> + * we only read once.
> + */
> + static bool read_env = false;
> + /* MSWS state variables */
> + static uint64_t msws_c = 0; /* increment (user seed) */
> + static uint64_t msws_n = 0; /* current value */
> + static uint64_t msws_s = 0; /* accumulator */
> + char *seed;
> + unsigned long deterministic_seed;
> +
> + if (!read_env) {
> + read_env = true;
> + seed = getenv("DETERMINISTIC_SEED");
> + if (seed && seed[0] != '\0') {
> + errno = 0;
> + deterministic_seed = strtoul(seed, NULL, 0);
> + if (errno != 0)
> + return false;
> +
> + /*
> + * In this variation of MSWS we will use
> + * DETERMINISTIC_SEED as our odd number in the formula,
> + * so we will need to ensure it is odd.
> + */
> + msws_c = deterministic_seed | 1;
> + }
> + }
> +
> + /*
> + * This will happen only if we successfully read a valid
> + * DETERMINISTIC_SEED and properly initiated the sequence.
> + */
> + if (read_env && msws_c != 0) {
> + msws_n *= msws_n;
> + msws_s += msws_c;
> + msws_n += msws_s;
> + msws_n = (msws_n >> 32) | (msws_n << 32);
> + *result = (uint32_t)msws_n;
> +
> + return true;
> + }
> +
> + /*
> + * We initialized but had no valid DETERMINISTIC_SEED so we fall back
> + * to regular behaviour.
> + */
> + return false;
> +}
> +
> #ifdef HAVE_GETRANDOM_NONBLOCK
> uint32_t
> get_random_u32(void)
> @@ -522,6 +645,15 @@ get_random_u32(void)
> uint32_t ret;
> ssize_t sz;
>
> + /*
> + * Check if we're creating a reproducible filesystem.
> + * In this case we try to parse our DETERMINISTIC_SEED from environment
> + * and use a pseudorandom number generator.
> + * If it fails, fall back to returning getrandom()
> + * like we used to do.
> + */
> + if (get_msws_prng_32(&ret))
> + return ret;
> /*
> * Try to extract a u32 of randomness from /dev/urandom. If that
> * fails, fall back to returning zero like we used to do.
> --
> 2.51.2
>
* Re: [PATCH v1] libxfs: support reproducible filesystems using deterministic time/seed
2025-11-07 16:37 ` Darrick J. Wong
@ 2025-11-07 17:38 ` Luca Di Maio
0 siblings, 0 replies; 3+ messages in thread
From: Luca Di Maio @ 2025-11-07 17:38 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, dimitri.ledkov, smoser
Thanks Darrick for the review
On Fri, Nov 07, 2025 at 08:37:41AM -0800, Darrick J. Wong wrote:
> > + if (!read_env) {
> > + read_env = true;
> > + source_date_epoch = getenv("SOURCE_DATE_EPOCH");
> > + if (source_date_epoch && source_date_epoch[0] != '\0') {
> > + errno = 0;
> > + epoch = strtoul(source_date_epoch, NULL, 10);
>
> time64_t is an alias for long long int, I think you want strtoll here.
>
> Also you ought to provide an endptr so that you can check that strtoll
> consumed the entire $SOURCE_DATE_EPOCH string. The
> reproducible-builds.org spec you reference above says:
>
> "The value MUST be an ASCII representation of an integer with no
> fractional component, identical to the output format of date +%s."
>
> which I interpret to mean that
>
> # SOURCE_DATE_EPOCH=35hotdogs mkfs.xfs -f /dev/sda ...
>
> shouldn't be allowed.
>
Alright, makes things easier I guess
> > + if (errno != 0) {
> > + epoch = -1;
> > + return false;
> > + }
> > + }
> > + }
> > +
> > + /*
> > + * This will happen only if we successfully read a valid
> > + * SOURCE_DATE_EPOCH and properly initiated the epoch value.
> > + */
> > + if (read_env && epoch >= 0) {
>
> Why disallow negative timestamps? Suppose I want all the new files to
> have a timestamp of November 5th, 1955?
>
Wanted to reuse the epoch also as a way to track initialization of the
value, but I guess using another variable unblocks this
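A rough sketch of what tracking validity in a separate variable could look like, so that negative timestamps stay usable. The struct fixed_time_state and the function fixed_time_lookup() are hypothetical names, and the state is passed in explicitly here rather than kept in static variables as in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical state; the patch keeps equivalent values as statics. */
struct fixed_time_state {
	bool	read_env;	/* environment already examined */
	bool	have_epoch;	/* epoch holds a successfully parsed value */
	int64_t	epoch;		/* seconds since 1970; may be negative */
};

/*
 * Read SOURCE_DATE_EPOCH at most once and cache the outcome. Validity is
 * tracked by have_epoch, so no timestamp value is reserved as a sentinel.
 */
static bool
fixed_time_lookup(
	struct fixed_time_state	*st,
	int64_t			*sec)
{
	const char		*s;
	char			*end;
	long long		v;

	if (!st->read_env) {
		st->read_env = true;
		s = getenv("SOURCE_DATE_EPOCH");
		if (s && s[0] != '\0') {
			errno = 0;
			v = strtoll(s, &end, 10);
			if (errno == 0 && *end == '\0') {
				st->epoch = v;
				st->have_epoch = true;
			}
		}
	}

	if (!st->have_epoch)
		return false;

	*sec = st->epoch;
	return true;
}
```

A zero-initialized struct fixed_time_state is the expected starting point. After the first call, later changes to the environment are ignored, matching the read-once behaviour of the original helper.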
> > + tv->tv_sec = (time_t)epoch;
>
> time_t can be 32-bit; don't needlessly truncate epoch.
>
Ack
> Why not return a fixed "random" value? Wouldn't that be more obviously
> deterministic?
>
> if (getenv("DETERMINISTIC_SEED"))
> return 0x53414d45; /* "SAME" */
>
> --D
>
I wanted to make sure that anything relying on getting different numbers
from the random source didn't break, but I guess this makes things even
simpler.
L.