* [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool
@ 2025-01-07 15:26 Patrick Steinhardt
2025-01-07 15:26 ` [PATCH 1/2] wrapper: allow generating insecure random bytes Patrick Steinhardt
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2025-01-07 15:26 UTC (permalink / raw)
To: git; +Cc: Randall S. Becker
Hi,
this small patch series fixes the issue reported by Randall [1], where
an exhausted entropy pool can cause us to die when writing a new table
to the reftable stack. I _think_ that this is only an issue with the
OpenSSL backend of `csprng_bytes()`:
- `arc4random_buf()` never returns an error.
- `getrandom()` pulls from "/dev/urandom" by default.
- `getentropy()` seems to block when there is not enough randomness
available.
- `GtlGenRandom()` I cannot really tell.
- The fallback reads from "/dev/urandom", which also returns bytes in
case the entropy pool is drained.
So OpenSSL's `RAND_bytes()` seems to be the only one that returns an
error when the entropy pool is empty. I did wonder whether we even need
to introduce the new flag in the first place, or whether we cannot just
use `RAND_pseudo_bytes()` unconditionally. But I'm a bit uneasy about it
given that OpenSSL has this doc:
RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
be unique if they are of sufficient length, but are not necessarily
unpredictable. They can be used for non-cryptographic purposes and
for certain purposes in cryptographic protocols, but usually not for
key generation etc.
It might be too easy to accidentally rely on `csprng_bytes()` where it
actually requires strong cryptographic data, so I was erring on the side
of caution.
Thanks!
---
Patrick Steinhardt (2):
wrapper: allow generating insecure random bytes
reftable/stack: accept insecure random bytes
builtin/gc.c | 2 +-
reftable/stack.c | 4 ++--
t/helper/test-csprng.c | 2 +-
t/unit-tests/t-reftable-readwrite.c | 6 +++---
wrapper.c | 24 ++++++++++++++----------
wrapper.h | 16 ++++++++++++----
6 files changed, 33 insertions(+), 21 deletions(-)
---
base-commit: b74ff38af58464688b211140b90ec90598d340c6
change-id: 20250107-b4-pks-reftable-csprng-9ed4e8dd83c4
^ permalink raw reply [flat|nested] 19+ messages in thread* [PATCH 1/2] wrapper: allow generating insecure random bytes 2025-01-07 15:26 [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool Patrick Steinhardt @ 2025-01-07 15:26 ` Patrick Steinhardt 2025-01-07 15:27 ` [PATCH 2/2] reftable/stack: accept " Patrick Steinhardt 2025-01-07 23:21 ` [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool brian m. carlson 2 siblings, 0 replies; 19+ messages in thread From: Patrick Steinhardt @ 2025-01-07 15:26 UTC (permalink / raw) To: git; +Cc: Randall S. Becker The `csprng_bytes()` function generates randomness and writes it into a caller-provided buffer. It abstracts over a couple of implementations, where the exact one that is used depends on the platform. These implementations have different guarantees: while some guarantee to never fail (arc4random(3)), others may fail. There are two significant failures to distinguish from one another: - Systemic failure, where e.g. opening "/dev/urandom" fails or when OpenSSL doesn't have a provider configured. - Entropy failure, where the entropy pool is exhausted, and thus the function cannot guarantee strong cryptographic randomness. While we cannot do anything about the former, the latter failure can be acceptable in some situations where we don't care whether or not the randomness can be predicted. Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt into weak cryptographic randomness. The exact behaviour of the flag depends on the underlying implementation: - `arc4random_buf()` never returns an error, so it doesn't change. - `getrandom()` pulls from "/dev/urandom" by default, which never blocks on modern systems even when the entropy pool is empty. - `getentropy()` seems to block when there is not enough randomness available, and there is no way of changing that behaviour. - `GtlGenRandom()` doesn't mention anything about its specific failure mode. - The fallback reads from "/dev/urandom", which also returns bytes in case the entropy pool is drained in modern Linux systems. That only leaves OpenSSL with `RAND_bytes()`, which returns an error in case the returned data wouldn't be cryptographically safe. This function is replaced with a call to `RAND_pseudo_bytes()`, which can indicate whether or not the returned data is cryptographically secure via its return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag is set, then we ignore the insecurity and return the data regardless. It is somewhat questionable whether we really need the flag in the first place, or whether we wouldn't just ignore the potentially-insecure data. But the risk of doing that is that we might have or grow callsites that aren't aware of the potential insecureness of the data in places where it really matters. So using a flag to opt-in to that behaviour feels like the more secure choice. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/gc.c | 2 +- reftable/stack.c | 2 +- t/helper/test-csprng.c | 2 +- t/unit-tests/t-reftable-readwrite.c | 6 +++--- wrapper.c | 24 ++++++++++++++---------- wrapper.h | 16 ++++++++++++---- 6 files changed, 32 insertions(+), 20 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index a9b1c36de27da2ffb16aaa45e979cbaeec18bfa1..3e754f25bba2aa16fbae9b2b6229961a53409395 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1909,7 +1909,7 @@ static int get_random_minute(void) if (getenv("GIT_TEST_MAINT_SCHEDULER")) return 13; - return git_rand() % 60; + return git_rand(0) % 60; } static int is_launchctl_available(void) diff --git a/reftable/stack.c b/reftable/stack.c index 531660a49f0948c33041831ee0d740feacb22b2f..6d0aa774e7e29d5366ed55df19725944f8eef792 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, static int format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) { char buf[100]; - uint32_t rnd = (uint32_t)git_rand(); + uint32_t rnd = (uint32_t)git_rand(0); snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", min, max, rnd); reftable_buf_reset(dest); diff --git a/t/helper/test-csprng.c b/t/helper/test-csprng.c index a4a0aca61773b0b30de618955e5a5b61bba2d0cc..c86dcc4870fed8ab81aa7c6c17d77c0b32101cf8 100644 --- a/t/helper/test-csprng.c +++ b/t/helper/test-csprng.c @@ -15,7 +15,7 @@ int cmd__csprng(int argc, const char **argv) while (count) { unsigned long chunk = count < sizeof(buf) ? count : sizeof(buf); - if (csprng_bytes(buf, chunk) < 0) { + if (csprng_bytes(buf, chunk, 0) < 0) { perror("failed to read"); return 5; } diff --git a/t/unit-tests/t-reftable-readwrite.c b/t/unit-tests/t-reftable-readwrite.c index 6b75a419b9d3bbfb720a3c39f741f940dfcd56aa..f22b9775639e3ec9167336100753d5736e5a2957 100644 --- a/t/unit-tests/t-reftable-readwrite.c +++ b/t/unit-tests/t-reftable-readwrite.c @@ -108,8 +108,8 @@ static void t_log_buffer_size(void) hash, to ensure that the compressed part is larger than the original. */ for (i = 0; i < REFTABLE_HASH_SIZE_SHA1; i++) { - log.value.update.old_hash[i] = (uint8_t)(git_rand() % 256); - log.value.update.new_hash[i] = (uint8_t)(git_rand() % 256); + log.value.update.old_hash[i] = (uint8_t)(git_rand(0) % 256); + log.value.update.new_hash[i] = (uint8_t)(git_rand(0) % 256); } reftable_writer_set_limits(w, update_index, update_index); err = reftable_writer_add_log(w, &log); @@ -325,7 +325,7 @@ static void t_log_zlib_corruption(void) }; for (i = 0; i < sizeof(message) - 1; i++) - message[i] = (uint8_t)(git_rand() % 64 + ' '); + message[i] = (uint8_t)(git_rand(0) % 64 + ' '); reftable_writer_set_limits(w, 1, 1); diff --git a/wrapper.c b/wrapper.c index fa79fd6ec9e144509f34e007bb2ffa458d69e695..8b985931490d62acb2030c147b8728b34917df8a 100644 --- a/wrapper.c +++ b/wrapper.c @@ -479,7 +479,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode) for (count = 0; count < TMP_MAX; ++count) { int i; uint64_t v; - if (csprng_bytes(&v, sizeof(v)) < 0) + if (csprng_bytes(&v, sizeof(v), 0) < 0) return error_errno("unable to get random bytes for temporary file"); /* Fill in the random bits. */ @@ -750,7 +750,7 @@ int open_nofollow(const char *path, int flags) #endif } -int csprng_bytes(void *buf, size_t len) +int csprng_bytes(void *buf, size_t len, MAYBE_UNUSED unsigned flags) { #if defined(HAVE_ARC4RANDOM) || defined(HAVE_ARC4RANDOM_LIBBSD) /* This function never returns an error. */ @@ -785,14 +785,18 @@ int csprng_bytes(void *buf, size_t len) return -1; return 0; #elif defined(HAVE_OPENSSL_CSPRNG) - int res = RAND_bytes(buf, len); - if (res == 1) + switch (RAND_pseudo_bytes(buf, len)) { + case 1: return 0; - if (res == -1) - errno = ENOTSUP; - else + case 0: + if (flags & CSPRNG_BYTES_INSECURE) + return 0; errno = EIO; - return -1; + return -1; + default: + errno = ENOTSUP; + return -1; + } #else ssize_t res; char *p = buf; @@ -816,11 +820,11 @@ int csprng_bytes(void *buf, size_t len) #endif } -uint32_t git_rand(void) +uint32_t git_rand(unsigned flags) { uint32_t result; - if (csprng_bytes(&result, sizeof(result)) < 0) + if (csprng_bytes(&result, sizeof(result), flags) < 0) die(_("unable to get random bytes")); return result; diff --git a/wrapper.h b/wrapper.h index a6b3e1f09ecdef72fdcc4a234df7a176c4daded6..7df824e34a906e9f822ad3bee27b8b1b0000621b 100644 --- a/wrapper.h +++ b/wrapper.h @@ -127,18 +127,26 @@ int open_nofollow(const char *path, int flags); void sleep_millisec(int millisec); +enum { + /* + * Accept insecure bytes, which some CSPRNG implementations may return + * in case the entropy pool has been exhausted. + */ + CSPRNG_BYTES_INSECURE = (1 << 0), +}; + /* * Generate len bytes from the system cryptographically secure PRNG. * Returns 0 on success and -1 on error, setting errno. The inability to - * satisfy the full request is an error. + * satisfy the full request is an error. Accepts CSPRNG flags. */ -int csprng_bytes(void *buf, size_t len); +int csprng_bytes(void *buf, size_t len, unsigned flags); /* * Returns a random uint32_t, uniformly distributed across all possible - * values. + * values. Accepts CSPRNG flags. */ -uint32_t git_rand(void); +uint32_t git_rand(unsigned flags); /* Provide log2 of the given `size_t`. */ static inline unsigned log2u(uintmax_t sz) -- 2.48.0.rc1.245.gb3e6e7acbc.dirty ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 15:26 [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool Patrick Steinhardt 2025-01-07 15:26 ` [PATCH 1/2] wrapper: allow generating insecure random bytes Patrick Steinhardt @ 2025-01-07 15:27 ` Patrick Steinhardt 2025-01-07 15:37 ` rsbecker ` (2 more replies) 2025-01-07 23:21 ` [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool brian m. carlson 2 siblings, 3 replies; 19+ messages in thread From: Patrick Steinhardt @ 2025-01-07 15:27 UTC (permalink / raw) To: git; +Cc: Randall S. Becker The reftable library uses randomness in two call paths: - When reading a stack in case some of the referenced tables disappears. The randomness is used to delay the next read by a couple of milliseconds. - When writing a new table, where the randomness gets appended to the table name (e.g. "0x000000000001-0x000000000002-0b1d8ddf.ref"). In neither of these cases do we need strong randomness. Unfortunately though, we have observed test failures caused by the former case. In t0610 we have a test that spawns a 100 processes at once, all of which try to write a new table to the stack. And given that all of the processes will require randomness, it can happen that these processes make the entropy pool run dry, which will then cause us to die: + test_seq 100 + printf %s commit\trefs/heads/branch-%s\n 68d032e9edd3481ac96382786ececc37ec28709e 1 + printf %s commit\trefs/heads/branch-%s\n 68d032e9edd3481ac96382786ececc37ec28709e 2 ... + git update-ref refs/heads/branch-98 HEAD + git update-ref refs/heads/branch-97 HEAD + git update-ref refs/heads/branch-99 HEAD + git update-ref refs/heads/branch-100 HEAD fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes The report was for NonStop, which uses OpenSSL as the backend for randomness. In the preceding commit we have adapted that backend to also return randomness in case the entropy pool is empty and the caller passes the `CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue. Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- reftable/stack.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/reftable/stack.c b/reftable/stack.c index 6d0aa774e7e29d5366ed55df19725944f8eef792..572a74e00f9ed6040534e060652e72c26641749d 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -493,7 +493,7 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, close(fd); fd = -1; - delay = delay + (delay * rand()) / RAND_MAX + 1; + delay = delay + (delay * git_rand(CSPRNG_BYTES_INSECURE)) / UINT32_MAX + 1; sleep_millisec(delay); } @@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, static int format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) { char buf[100]; - uint32_t rnd = (uint32_t)git_rand(0); + uint32_t rnd = git_rand(CSPRNG_BYTES_INSECURE); snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", min, max, rnd); reftable_buf_reset(dest); -- 2.48.0.rc1.245.gb3e6e7acbc.dirty ^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 15:27 ` [PATCH 2/2] reftable/stack: accept " Patrick Steinhardt @ 2025-01-07 15:37 ` rsbecker 2025-01-07 20:56 ` Junio C Hamano 2025-01-07 23:56 ` rsbecker 2 siblings, 0 replies; 19+ messages in thread From: rsbecker @ 2025-01-07 15:37 UTC (permalink / raw) To: 'Patrick Steinhardt', git; +Cc: 'Randall S. Becker' On January 7, 2025 10:27 AM, Patrick Steinhardt wrote: >The reftable library uses randomness in two call paths: > > - When reading a stack in case some of the referenced tables > disappears. The randomness is used to delay the next read by a > couple of milliseconds. > > - When writing a new table, where the randomness gets appended to the > table name (e.g. "0x000000000001-0x000000000002-0b1d8ddf.ref"). > >In neither of these cases do we need strong randomness. > >Unfortunately though, we have observed test failures caused by the former case. In >t0610 we have a test that spawns a 100 processes at once, all of which try to write >a new table to the stack. And given that all of the processes will require >randomness, it can happen that these processes make the entropy pool run dry, >which will then cause us to >die: > > + test_seq 100 > + printf %s commit\trefs/heads/branch-%s\n > 68d032e9edd3481ac96382786ececc37ec28709e 1 > + printf %s commit\trefs/heads/branch-%s\n > 68d032e9edd3481ac96382786ececc37ec28709e 2 > ... > + git update-ref refs/heads/branch-98 HEAD > + git update-ref refs/heads/branch-97 HEAD > + git update-ref refs/heads/branch-99 HEAD > + git update-ref refs/heads/branch-100 HEAD > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > >The report was for NonStop, which uses OpenSSL as the backend for randomness. >In the preceding commit we have adapted that backend to also return randomness >in case the entropy pool is empty and the caller passes the >`CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue. > >Reported-by: Randall S. Becker <rsbecker@nexbridge.com> >Signed-off-by: Patrick Steinhardt <ps@pks.im> >--- > reftable/stack.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/reftable/stack.c b/reftable/stack.c index >6d0aa774e7e29d5366ed55df19725944f8eef792..572a74e00f9ed6040534e060 >652e72c26641749d 100644 >--- a/reftable/stack.c >+++ b/reftable/stack.c >@@ -493,7 +493,7 @@ static int reftable_stack_reload_maybe_reuse(struct >reftable_stack *st, > close(fd); > fd = -1; > >- delay = delay + (delay * rand()) / RAND_MAX + 1; >+ delay = delay + (delay * git_rand(CSPRNG_BYTES_INSECURE)) / >+UINT32_MAX + 1; > sleep_millisec(delay); > } > >@@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, static int >format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) { > char buf[100]; >- uint32_t rnd = (uint32_t)git_rand(0); >+ uint32_t rnd = git_rand(CSPRNG_BYTES_INSECURE); > snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", > min, max, rnd); > reftable_buf_reset(dest); I think this is a good catch. We are seeing other tests fail similarly through this path. I will retest rc3 on both platforms since we cannot exhaust randomness on x86 using the hardware randomizer inside OpenSSL. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 15:27 ` [PATCH 2/2] reftable/stack: accept " Patrick Steinhardt 2025-01-07 15:37 ` rsbecker @ 2025-01-07 20:56 ` Junio C Hamano 2025-01-07 21:03 ` rsbecker 2025-01-07 21:03 ` Junio C Hamano 2025-01-07 23:56 ` rsbecker 2 siblings, 2 replies; 19+ messages in thread From: Junio C Hamano @ 2025-01-07 20:56 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Randall S. Becker Patrick Steinhardt <ps@pks.im> writes: > The report was for NonStop, which uses OpenSSL as the backend for > randomness. In the preceding commit we have adapted that backend to also > return randomness in case the entropy pool is empty and the caller > passes the `CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue. No kidding. This is calling rand(3). The use of the resulting value is to fuzz the delay before retrying, which wants NO CRYPTOGRAPHIC randomness. This callsite does not require it, but rand(3) is perfect for ensuring predictability/repeatability as well (via srand(3)). And it is not allowed to fail. Yet a platform replaces it with a function that returns an error or aborts? What kind of nonsense is that? Do we really need to cater to such an insanity? Use of git_rand() here goes backwards against the more recent trend in reftable/ directory to wean the code off of the rest of Git by getting rid of unnecessary dependency, doesn't it? I think [PATCH 1/2] makes sense regardless, though. But shouldn't we be pushing back this step, with "fix your rand(3)"? Thanks. > Reported-by: Randall S. Becker <rsbecker@nexbridge.com> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > reftable/stack.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/reftable/stack.c b/reftable/stack.c > index 6d0aa774e7e29d5366ed55df19725944f8eef792..572a74e00f9ed6040534e060652e72c26641749d 100644 > --- a/reftable/stack.c > +++ b/reftable/stack.c > @@ -493,7 +493,7 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, > close(fd); > fd = -1; > > - delay = delay + (delay * rand()) / RAND_MAX + 1; > + delay = delay + (delay * git_rand(CSPRNG_BYTES_INSECURE)) / UINT32_MAX + 1; > sleep_millisec(delay); > } > > @@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, > static int format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) > { > char buf[100]; > - uint32_t rnd = (uint32_t)git_rand(0); > + uint32_t rnd = git_rand(CSPRNG_BYTES_INSECURE); > snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", > min, max, rnd); > reftable_buf_reset(dest); ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 20:56 ` Junio C Hamano @ 2025-01-07 21:03 ` rsbecker 2025-01-07 21:09 ` Junio C Hamano 2025-01-07 21:03 ` Junio C Hamano 1 sibling, 1 reply; 19+ messages in thread From: rsbecker @ 2025-01-07 21:03 UTC (permalink / raw) To: 'Junio C Hamano', 'Patrick Steinhardt' Cc: git, 'Randall S. Becker' On January 7, 2025 3:56 PM, Junio C Hamano wrote: >Patrick Steinhardt <ps@pks.im> writes: > >> The report was for NonStop, which uses OpenSSL as the backend for >> randomness. In the preceding commit we have adapted that backend to >> also return randomness in case the entropy pool is empty and the >> caller passes the `CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue. > >No kidding. > >This is calling rand(3). The use of the resulting value is to fuzz the delay before >retrying, which wants NO CRYPTOGRAPHIC randomness. >This callsite does not require it, but rand(3) is perfect for ensuring >predictability/repeatability as well (via srand(3)). > >And it is not allowed to fail. I don't think rand() is what is failing here. The call into OpenSSL is using randomness from PRNGD. A poorly configured, or overloaded instance of it will cause this situation. The ia64 platform for NonStop is using a standard PRNGD but it is being overloaded with a large volume of uses hitting that server. I think this is more of a canary-in-the-coalmine situation, where we have just brought something to light that otherwise would be hard to recreate. >Yet a platform replaces it with a function that returns an error or aborts? What kind >of nonsense is that? Do we really need to cater to such an insanity? NonStop has not replaced this call. What we did in OpenSSL is to use the x86 hardware randomizer, which is highly reliable and does not show this problem. In ia64, in OpenSSL, the PRNGD is generating randomness and that seems to be not reliable enough for git or OpenSSL when under significant load. >Use of git_rand() here goes backwards against the more recent trend in reftable/ >directory to wean the code off of the rest of Git by getting rid of unnecessary >dependency, doesn't it? > >I think [PATCH 1/2] makes sense regardless, though. But shouldn't we be pushing >back this step, with "fix your rand(3)"? > >Thanks. > >> Reported-by: Randall S. Becker <rsbecker@nexbridge.com> >> Signed-off-by: Patrick Steinhardt <ps@pks.im> >> --- >> reftable/stack.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/reftable/stack.c b/reftable/stack.c index >> >6d0aa774e7e29d5366ed55df19725944f8eef792..572a74e00f9ed6040534e060 >652e >> 72c26641749d 100644 >> --- a/reftable/stack.c >> +++ b/reftable/stack.c >> @@ -493,7 +493,7 @@ static int reftable_stack_reload_maybe_reuse(struct >reftable_stack *st, >> close(fd); >> fd = -1; >> >> - delay = delay + (delay * rand()) / RAND_MAX + 1; >> + delay = delay + (delay * git_rand(CSPRNG_BYTES_INSECURE)) / >> +UINT32_MAX + 1; >> sleep_millisec(delay); >> } >> >> @@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, >> static int format_name(struct reftable_buf *dest, uint64_t min, >> uint64_t max) { >> char buf[100]; >> - uint32_t rnd = (uint32_t)git_rand(0); >> + uint32_t rnd = git_rand(CSPRNG_BYTES_INSECURE); >> snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", >> min, max, rnd); >> reftable_buf_reset(dest); --Randall ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 21:03 ` rsbecker @ 2025-01-07 21:09 ` Junio C Hamano 0 siblings, 0 replies; 19+ messages in thread From: Junio C Hamano @ 2025-01-07 21:09 UTC (permalink / raw) To: rsbecker; +Cc: 'Patrick Steinhardt', git, 'Randall S. Becker' <rsbecker@nexbridge.com> writes: > I don't think rand() is what is failing here. Yeah, sorry, I misead the patch. The problematic one is git_rand() in the latter hunk, and I was commenting on the other, former, hunk. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 20:56 ` Junio C Hamano 2025-01-07 21:03 ` rsbecker @ 2025-01-07 21:03 ` Junio C Hamano 2025-01-08 6:51 ` Patrick Steinhardt 1 sibling, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2025-01-07 21:03 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Randall S. Becker Junio C Hamano <gitster@pobox.com> writes: > Yet a platform replaces it with a function that returns an error or > aborts? What kind of nonsense is that? Do we really need to cater > to such an insanity? > > Use of git_rand() here goes backwards against the more recent trend > in reftable/ directory to wean the code off of the rest of Git by > getting rid of unnecessary dependency, doesn't it? > > I think [PATCH 1/2] makes sense regardless, though. But shouldn't > we be pushing back this step, with "fix your rand(3)"? > > Thanks. Ah, I misread the patch. It has two hunks, and what I said applies to the earlier one, but the later one is already contaminated with git_rand(), and that is what is failing, i.e. it is not a nonsense platform replacing rand() with something that can fail. It may still make sense to drop the first hunk, and consider how to proceed when you further want to reduce the unnecessary dependencies for external users of the reftable library, though. Are there correctness implications if git_rand() in format_name() yields non random results (like, always using "rnd = 0" instead of calling git_rand())? I seriously hope not. And if there is no correctness implications, perhaps we can replace it with rand() or even constant "0"? Thanks. >> Reported-by: Randall S. Becker <rsbecker@nexbridge.com> >> Signed-off-by: Patrick Steinhardt <ps@pks.im> >> --- >> reftable/stack.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/reftable/stack.c b/reftable/stack.c >> index 6d0aa774e7e29d5366ed55df19725944f8eef792..572a74e00f9ed6040534e060652e72c26641749d 100644 >> --- a/reftable/stack.c >> +++ b/reftable/stack.c >> @@ -493,7 +493,7 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, >> close(fd); >> fd = -1; >> >> - delay = delay + (delay * rand()) / RAND_MAX + 1; >> + delay = delay + (delay * git_rand(CSPRNG_BYTES_INSECURE)) / UINT32_MAX + 1; >> sleep_millisec(delay); >> } >> >> @@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, >> static int format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) >> { >> char buf[100]; >> - uint32_t rnd = (uint32_t)git_rand(0); >> + uint32_t rnd = git_rand(CSPRNG_BYTES_INSECURE); >> snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", >> min, max, rnd); >> reftable_buf_reset(dest); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 21:03 ` Junio C Hamano @ 2025-01-08 6:51 ` Patrick Steinhardt 2025-01-08 15:39 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Patrick Steinhardt @ 2025-01-08 6:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Randall S. Becker On Tue, Jan 07, 2025 at 01:03:26PM -0800, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > Yet a platform replaces it with a function that returns an error or > > aborts? What kind of nonsense is that? Do we really need to cater > > to such an insanity? > > > > Use of git_rand() here goes backwards against the more recent trend > > in reftable/ directory to wean the code off of the rest of Git by > > getting rid of unnecessary dependency, doesn't it? It certainly does, yes, but unifying those two callsites to do the same is also something I do in a patch series that gets rid of the last deps on the Git codebase. This then allows me to move `git_rand()` into "reftable/system.h" and make it a shim provided by the respective code base that it's part of. So it's not a step backwards. > > I think [PATCH 1/2] makes sense regardless, though. But shouldn't > > we be pushing back this step, with "fix your rand(3)"? > > > > Thanks. > > Ah, I misread the patch. It has two hunks, and what I said applies > to the earlier one, but the later one is already contaminated with > git_rand(), and that is what is failing, i.e. it is not a nonsense > platform replacing rand() with something that can fail. > > It may still make sense to drop the first hunk, and consider how to > proceed when you further want to reduce the unnecessary dependencies > for external users of the reftable library, though. Are there > correctness implications if git_rand() in format_name() yields non > random results (like, always using "rnd = 0" instead of calling > git_rand())? I seriously hope not. And if there is no correctness > implications, perhaps we can replace it with rand() or even constant > "0"? No, there aren't any implications on correctness in that case. Sure, the randomized delays not being randomized can lead to more contention. But even when the randomized suffix for tables is deterministic we wouldn't have an issue as the files are still distinguished by their update indices. Patrick ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-08 6:51 ` Patrick Steinhardt @ 2025-01-08 15:39 ` Junio C Hamano 2025-01-08 16:21 ` Patrick Steinhardt 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2025-01-08 15:39 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Randall S. Becker Patrick Steinhardt <ps@pks.im> writes: >> It may still make sense to drop the first hunk, and consider how to >> proceed when you further want to reduce the unnecessary dependencies >> for external users of the reftable library, though. Are there >> correctness implications if git_rand() in format_name() yields non >> random results (like, always using "rnd = 0" instead of calling >> git_rand())? I seriously hope not. And if there is no correctness >> implications, perhaps we can replace it with rand() or even constant >> "0"? > > No, there aren't any implications on correctness in that case. Sure, the > randomized delays not being randomized can lead to more contention. But > even when the randomized suffix for tables is deterministic we wouldn't > have an issue as the files are still distinguished by their update > indices. OK, so they both can be turned into a simple rand() that is expected to work more reliably especially on more exotic systems (meaning: the ability the system providers can test their rand() is much better than our ability to test our git_rand() there)? It would help us solve the immediate issue reported, while removing one git specific function from the reftable library? Thanks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-08 15:39 ` Junio C Hamano @ 2025-01-08 16:21 ` Patrick Steinhardt 2025-01-08 17:40 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Patrick Steinhardt @ 2025-01-08 16:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Randall S. Becker On Wed, Jan 08, 2025 at 07:39:37AM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > >> It may still make sense to drop the first hunk, and consider how to > >> proceed when you further want to reduce the unnecessary dependencies > >> for external users of the reftable library, though. Are there > >> correctness implications if git_rand() in format_name() yields non > >> random results (like, always using "rnd = 0" instead of calling > >> git_rand())? I seriously hope not. And if there is no correctness > >> implications, perhaps we can replace it with rand() or even constant > >> "0"? > > > > No, there aren't any implications on correctness in that case. Sure, the > > randomized delays not being randomized can lead to more contention. But > > even when the randomized suffix for tables is deterministic we wouldn't > > have an issue as the files are still distinguished by their update > > indices. > > OK, so they both can be turned into a simple rand() that is expected > to work more reliably especially on more exotic systems (meaning: > the ability the system providers can test their rand() is much > better than our ability to test our git_rand() there)? It would > help us solve the immediate issue reported, while removing one git > specific function from the reftable library? Hm. The problem is when Git dies in the middle of a transaction: 1. We write the temporary table. 2. We compute the not-so-random suffix. 3. We write the temporary "tables.list" file. 4. We move the temporary table into place using the not-so-random suffix. 5. Git dies before updating "tables.list". Now we have the temporary table moved into place, but "tables.list" hasn't been updated yet. When the next Git process comes along and wants to update the table it would result in an error if it computed the same suffix. The reftable library knows to clean up such stale tables when not referenced by the "tables.list" file, but it doesn't do so on every write. So this would likely still cause issues in practice. I already though about this scenario when writing my mail, but didn't really think about it as "correctness". But I guess it is. Also, based on the feedback from Randall it's not only the reftable backend that has issues. It's a more general problem on ia64, where many tests are failing. So even if we fixed this one case, it's likely that other cases would still die when running low on entropy. I dunno. It feels like a platform issue, not like a Git issue, when the RNG cannot provide us a couple of integers. The OpenSSL backend seems unfit for use to me as none of the other backends have the same issue. Patrick ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-08 16:21 ` Patrick Steinhardt @ 2025-01-08 17:40 ` Junio C Hamano 2025-01-08 18:16 ` Patrick Steinhardt 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2025-01-08 17:40 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Randall S. Becker Patrick Steinhardt <ps@pks.im> writes: > Hm. The problem is when Git dies in the middle of a transaction: > > 1. We write the temporary table. > 2. We compute the not-so-random suffix. > 3. We write the temporary "tables.list" file. > 4. We move the temporary table into place using the not-so-random > suffix. > 5. Git dies before updating "tables.list". > > Now we have the temporary table moved into place, but "tables.list" > hasn't been updated yet. When the next Git process comes along and wants > to update the table it would result in an error if it computed the same > suffix. Here, I hear that we _do_ depend on the suffix being relatively unique. Once our random number generator decides to give the same number twice to cause collision, the reftable data gets corrupt? > The reftable library knows to clean up such stale tables when not > referenced by the "tables.list" file, but it doesn't do so on every > write. So this would likely still cause issues in practice. > > I already though about this scenario when writing my mail, but didn't > really think about it as "correctness". But I guess it is. Hmph. I am not sure how I should feel about this. Our reliance on hash functions (which can be made to collide) not colliding is one thing, but is it sensibly safe to rely on a cryptographically unpredictable random generator not to yield the same suffix twice during the lifetime of an previous invocation for correctness? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-08 17:40 ` Junio C Hamano @ 2025-01-08 18:16 ` Patrick Steinhardt 0 siblings, 0 replies; 19+ messages in thread From: Patrick Steinhardt @ 2025-01-08 18:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Randall S. Becker On Wed, Jan 08, 2025 at 09:40:58AM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > Hm. The problem is when Git dies in the middle of a transaction: > > > > 1. We write the temporary table. > > 2. We compute the not-so-random suffix. > > 3. We write the temporary "tables.list" file. > > 4. We move the temporary table into place using the not-so-random > > suffix. > > 5. Git dies before updating "tables.list". > > > > Now we have the temporary table moved into place, but "tables.list" > > hasn't been updated yet. When the next Git process comes along and wants > > to update the table it would result in an error if it computed the same > > suffix. > > Here, I hear that we _do_ depend on the suffix being relatively > unique. Once our random number generator decides to give the same > number twice to cause collision, the reftable data gets corrupt? No, there is no corruption. We may fail to update the stack when there are colliding files, but that's it. > > The reftable library knows to clean up such stale tables when not > > referenced by the "tables.list" file, but it doesn't do so on every > > write. So this would likely still cause issues in practice. > > > > I already though about this scenario when writing my mail, but didn't > > really think about it as "correctness". But I guess it is. > > Hmph. I am not sure how I should feel about this. Our reliance on > hash functions (which can be made to collide) not colliding is one > thing, but is it sensibly safe to rely on a cryptographically > unpredictable random generator not to yield the same suffix twice > during the lifetime of an previous invocation for correctness? This is why I've been hesistant to call it a "correctness" issue, as there is no corruption involved here. It's more of a denial of service as you may not be able to update the stack anymore until you remove the occluding file. But turns out I misremembered from 9abda98149 (reftable/stack: fix use of unseeded randomness, 2023-12-11): things indeed work alright. To demonstrate, let's update `format_name()` like this: diff --git a/reftable/stack.c b/reftable/stack.c index 531660a49f..b7422679df 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, static int format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) { char buf[100]; - uint32_t rnd = (uint32_t)git_rand(); + uint32_t rnd = 123; snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", min, max, rnd); reftable_buf_reset(dest); And then we create an occluding file and try to update: $ ~/Development/git/build/bin-wrappers/git init repo --ref-format=reftable Initialized empty Git repository in /tmp/repo/.git/ $ cd repo/ $ ls .git/reftable/ 0x000000000001-0x000000000001-0000007b.ref tables.list $ touch .git/reftable/0x000000000002-0x000000000002-0000007b.ref $ ~/Development/git/build/git commit --allow-empty -mx [main (root-commit) 08d02ef] x $ ls .git/reftable/ 0x000000000001-0x000000000002-0000007b.ref tables.list $ git show commit 08d02efa88f085f02c31285bf909d8d3a25c70dd (HEAD -> main) Author: Patrick Steinhardt <ps@pks.im> Date: Wed Jan 8 19:03:46 2025 +0100 x So the stack gets updated as expected, no corruption there. But this _can_ be a problem on Windows, where the file cannot be deleted in case it was still open. This is also documented as the reason in "Documentation/techincal/reftable.txt". That should've gotten better though now with the improvements I made to our rename-emulation, as it now uses POSIX semantics. That being said, I still don't think that swapping out `git_rand()` for `rand()` is the right thing to do. It does not solve the issue, but only a symptom thereof. Git can still die whenever the OpenSSL CSPRNG fails as there are other uses of `git_rand()` or `csprng_bytes()` in the codebase. So this change would regress something that works everywhere but on NonStop and make the suffixes predictable. The ia64 machine in question is being EOLd end of 2025. And the fact that this has never been a problem before v2.48.0-rc2, and that the machine was rebooted for maintenance immediately before running the tests, indicates to me that something fishy is going on on that platform. Patrick ^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [PATCH 2/2] reftable/stack: accept insecure random bytes 2025-01-07 15:27 ` [PATCH 2/2] reftable/stack: accept " Patrick Steinhardt 2025-01-07 15:37 ` rsbecker 2025-01-07 20:56 ` Junio C Hamano @ 2025-01-07 23:56 ` rsbecker 2 siblings, 0 replies; 19+ messages in thread From: rsbecker @ 2025-01-07 23:56 UTC (permalink / raw) To: 'Patrick Steinhardt', git >-----Original Message----- >From: Patrick Steinhardt <ps@pks.im> >Sent: January 7, 2025 10:27 AM >To: git@vger.kernel.org >Cc: Randall S. Becker <randall.becker@nexbridge.ca> >Subject: [PATCH 2/2] reftable/stack: accept insecure random bytes > >The reftable library uses randomness in two call paths: > > - When reading a stack in case some of the referenced tables > disappears. The randomness is used to delay the next read by a > couple of milliseconds. > > - When writing a new table, where the randomness gets appended to the > table name (e.g. "0x000000000001-0x000000000002-0b1d8ddf.ref"). > >In neither of these cases do we need strong randomness. > >Unfortunately though, we have observed test failures caused by the former case. In >t0610 we have a test that spawns a 100 processes at once, all of which try to write >a new table to the stack. And given that all of the processes will require >randomness, it can happen that these processes make the entropy pool run dry, >which will then cause us to >die: > > + test_seq 100 > + printf %s commit\trefs/heads/branch-%s\n > 68d032e9edd3481ac96382786ececc37ec28709e 1 > + printf %s commit\trefs/heads/branch-%s\n > 68d032e9edd3481ac96382786ececc37ec28709e 2 > ... > + git update-ref refs/heads/branch-98 HEAD > + git update-ref refs/heads/branch-97 HEAD > + git update-ref refs/heads/branch-99 HEAD > + git update-ref refs/heads/branch-100 HEAD > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > fatal: unable to get random bytes > >The report was for NonStop, which uses OpenSSL as the backend for randomness. >In the preceding commit we have adapted that backend to also return randomness >in case the entropy pool is empty and the caller passes the >`CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue. > >Reported-by: Randall S. Becker <rsbecker@nexbridge.com> >Signed-off-by: Patrick Steinhardt <ps@pks.im> >--- > reftable/stack.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/reftable/stack.c b/reftable/stack.c index >6d0aa774e7e29d5366ed55df19725944f8eef792..572a74e00f9ed6040534e060 >652e72c26641749d 100644 >--- a/reftable/stack.c >+++ b/reftable/stack.c >@@ -493,7 +493,7 @@ static int reftable_stack_reload_maybe_reuse(struct >reftable_stack *st, > close(fd); > fd = -1; > >- delay = delay + (delay * rand()) / RAND_MAX + 1; >+ delay = delay + (delay * git_rand(CSPRNG_BYTES_INSECURE)) / >+UINT32_MAX + 1; > sleep_millisec(delay); > } > >@@ -659,7 +659,7 @@ int reftable_stack_add(struct reftable_stack *st, static int >format_name(struct reftable_buf *dest, uint64_t min, uint64_t max) { > char buf[100]; >- uint32_t rnd = (uint32_t)git_rand(0); >+ uint32_t rnd = git_rand(CSPRNG_BYTES_INSECURE); > snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", > min, max, rnd); > reftable_buf_reset(dest); > >-- >2.48.0.rc1.245.gb3e6e7acbc.dirty The tests that are all impacted on ia64 (using PRNGD) are: t0610-reftable-basics t1092-sparse-checkout-compatibility t1300-config t1404-update-ref-errors t1410-reflog t1414-reflog-walk t1415-worktree-refs t1450-fsck t1460-refs-migrate t1500-rev-parse t2006-checkout-index-basic t2009-checkout-statinfo t2013-checkout-submodule t2017-checkout-orphan t2030-unresolve-info t2400-worktree-add Everything passes on x86. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool 2025-01-07 15:26 [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool Patrick Steinhardt 2025-01-07 15:26 ` [PATCH 1/2] wrapper: allow generating insecure random bytes Patrick Steinhardt 2025-01-07 15:27 ` [PATCH 2/2] reftable/stack: accept " Patrick Steinhardt @ 2025-01-07 23:21 ` brian m. carlson 2025-01-07 23:54 ` rsbecker 2 siblings, 1 reply; 19+ messages in thread From: brian m. carlson @ 2025-01-07 23:21 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Randall S. Becker [-- Attachment #1: Type: text/plain, Size: 3221 bytes --] On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote: > Hi, > > this small patch series fixes the issue reported by Randall [1], where > an exhausted entropy pool can cause us to die when writing a new table > to the reftable stack. I _think_ that this is only an issue with the > OpenSSL backend of `csprng_bytes()`: > > - `arc4random_buf()` never returns an error. > > - `getrandom()` pulls from "/dev/urandom" by default. > > - `getentropy()` seems to block when there is not enough randomness > available. > > - `GtlGenRandom()` I cannot really tell. > > - The fallback reads from "/dev/urandom", which also returns bytes in > case the entropy pool is drained. > > So OpenSSL's `RAND_bytes()` seems to be the only one that returns an > error when the entropy pool is empty. I did wonder whether we even need > to introduce the new flag in the first place, or whether we cannot just > use `RAND_pseudo_bytes()` unconditionally. But I'm a bit uneasy about it > given that OpenSSL has this doc: > > RAND_pseudo_bytes() puts num pseudo-random bytes into buf. > Pseudo-random byte sequences generated by RAND_pseudo_bytes() will > be unique if they are of sufficient length, but are not necessarily > unpredictable. They can be used for non-cryptographic purposes and > for certain purposes in cryptographic protocols, but usually not for > key generation etc. > > It might be too easy to accidentally rely on `csprng_bytes()` where it > actually requires strong cryptographic data, so I was erring on the side > of caution. The reason I didn't use RAND_pseudo_bytes is because it's been deprecated since OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like RAND_pseudo_bytes as of that version. Once it's seeded, it should be able to generate plenty of bytes, because I believe it uses a CTR-DRBG, which only needs to be reseeded after 2^48 bytes (which is far more than we should be using). We can full well use RAND_pseudo_bytes, but all operating systems should provide an appropriate entropy source that can provide 256 bits of entropy on startup. arc4random will just kill the process if it can't seed itself, so your changes won't actually prevent dying on a lack of entropy. I don't want an option that chooses "insecure" bytes. My preference is that we require people use a different backend or an up-to-date OpenSSL version that shouldn't have this problem. We can use RAND_pseudo_bytes if we really need to support older versions, but there are also no major operating systems which require that old of a version (CentOS 7, which is dead, used OpenSSL 1.0.2, and CentOS 8 uses 1.1.1k), so it's probably not within our support policy to do that. Note also that if OpenSSL is being used for TLS, a lack of entropy will result in TLS not working, which means that Git will be randomly broken on that system, which is not really an experience that we want to encourage, so that should be taken into account. Can we get some more information about what version of OpenSSL is being used and what the system entropy source is? -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool 2025-01-07 23:21 ` [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool brian m. carlson @ 2025-01-07 23:54 ` rsbecker 2025-01-08 7:18 ` Patrick Steinhardt 0 siblings, 1 reply; 19+ messages in thread From: rsbecker @ 2025-01-07 23:54 UTC (permalink / raw) To: 'brian m. carlson', 'Patrick Steinhardt' Cc: git, 'Randall S. Becker' On January 7, 2025 6:22 PM, brian m. carlson wrote: >On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote: >> Hi, >> >> this small patch series fixes the issue reported by Randall [1], where >> an exhausted entropy pool can cause us to die when writing a new table >> to the reftable stack. I _think_ that this is only an issue with the >> OpenSSL backend of `csprng_bytes()`: >> >> - `arc4random_buf()` never returns an error. >> >> - `getrandom()` pulls from "/dev/urandom" by default. >> >> - `getentropy()` seems to block when there is not enough randomness >> available. >> >> - `GtlGenRandom()` I cannot really tell. >> >> - The fallback reads from "/dev/urandom", which also returns bytes in >> case the entropy pool is drained. >> >> So OpenSSL's `RAND_bytes()` seems to be the only one that returns an >> error when the entropy pool is empty. I did wonder whether we even >> need to introduce the new flag in the first place, or whether we >> cannot just use `RAND_pseudo_bytes()` unconditionally. But I'm a bit >> uneasy about it given that OpenSSL has this doc: >> >> RAND_pseudo_bytes() puts num pseudo-random bytes into buf. >> Pseudo-random byte sequences generated by RAND_pseudo_bytes() will >> be unique if they are of sufficient length, but are not necessarily >> unpredictable. They can be used for non-cryptographic purposes and >> for certain purposes in cryptographic protocols, but usually not for >> key generation etc. >> >> It might be too easy to accidentally rely on `csprng_bytes()` where it >> actually requires strong cryptographic data, so I was erring on the >> side of caution. > >The reason I didn't use RAND_pseudo_bytes is because it's been deprecated since >OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like RAND_pseudo_bytes as of >that version. Once it's seeded, it should be able to generate plenty of bytes, >because I believe it uses a CTR-DRBG, which only needs to be reseeded after 2^48 >bytes (which is far more than we should be using). > >We can full well use RAND_pseudo_bytes, but all operating systems should provide >an appropriate entropy source that can provide 256 bits of entropy on startup. >arc4random will just kill the process if it can't seed itself, so your changes won't >actually prevent dying on a lack of entropy. > >I don't want an option that chooses "insecure" bytes. My preference is that we >require people use a different backend or an up-to-date OpenSSL version that >shouldn't have this problem. We can use RAND_pseudo_bytes if we really need to >support older versions, but there are also no major operating systems which >require that old of a version (CentOS 7, which is dead, used OpenSSL 1.0.2, and >CentOS 8 uses 1.1.1k), so it's probably not within our support policy to do that. > >Note also that if OpenSSL is being used for TLS, a lack of entropy will result in TLS >not working, which means that Git will be randomly broken on that system, which is >not really an experience that we want to encourage, so that should be taken into >account. > >Can we get some more information about what version of OpenSSL is being used >and what the system entropy source is? In my situation, OpenSSL 3.0.11 on ia64. 3.4.1 and 3.0.13 on x86 (x86 works fine Because OpenSSL uses hardware. On ia64, we end up on PRNGD, which does fail. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool 2025-01-07 23:54 ` rsbecker @ 2025-01-08 7:18 ` Patrick Steinhardt 2025-01-08 13:50 ` rsbecker 2025-01-08 22:44 ` brian m. carlson 0 siblings, 2 replies; 19+ messages in thread From: Patrick Steinhardt @ 2025-01-08 7:18 UTC (permalink / raw) To: rsbecker; +Cc: 'brian m. carlson', git, 'Randall S. Becker' On Tue, Jan 07, 2025 at 06:54:02PM -0500, rsbecker@nexbridge.com wrote: > On January 7, 2025 6:22 PM, brian m. carlson wrote: > >On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote: > >> Hi, > >> > >> this small patch series fixes the issue reported by Randall [1], where > >> an exhausted entropy pool can cause us to die when writing a new table > >> to the reftable stack. I _think_ that this is only an issue with the > >> OpenSSL backend of `csprng_bytes()`: > >> > >> - `arc4random_buf()` never returns an error. > >> > >> - `getrandom()` pulls from "/dev/urandom" by default. > >> > >> - `getentropy()` seems to block when there is not enough randomness > >> available. > >> > >> - `GtlGenRandom()` I cannot really tell. > >> > >> - The fallback reads from "/dev/urandom", which also returns bytes in > >> case the entropy pool is drained. > >> > >> So OpenSSL's `RAND_bytes()` seems to be the only one that returns an > >> error when the entropy pool is empty. I did wonder whether we even > >> need to introduce the new flag in the first place, or whether we > >> cannot just use `RAND_pseudo_bytes()` unconditionally. But I'm a bit > >> uneasy about it given that OpenSSL has this doc: > >> > >> RAND_pseudo_bytes() puts num pseudo-random bytes into buf. > >> Pseudo-random byte sequences generated by RAND_pseudo_bytes() will > >> be unique if they are of sufficient length, but are not necessarily > >> unpredictable. They can be used for non-cryptographic purposes and > >> for certain purposes in cryptographic protocols, but usually not for > >> key generation etc. > >> > >> It might be too easy to accidentally rely on `csprng_bytes()` where it > >> actually requires strong cryptographic data, so I was erring on the > >> side of caution. > > > >The reason I didn't use RAND_pseudo_bytes is because it's been deprecated since > >OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like RAND_pseudo_bytes as of > >that version. Once it's seeded, it should be able to generate plenty of bytes, > >because I believe it uses a CTR-DRBG, which only needs to be reseeded after 2^48 > >bytes (which is far more than we should be using). > > > >We can full well use RAND_pseudo_bytes, but all operating systems should provide > >an appropriate entropy source that can provide 256 bits of entropy on startup. > >arc4random will just kill the process if it can't seed itself, so your changes won't > >actually prevent dying on a lack of entropy. > > > >I don't want an option that chooses "insecure" bytes. My preference is that we > >require people use a different backend or an up-to-date OpenSSL version that > >shouldn't have this problem. We can use RAND_pseudo_bytes if we really need to > >support older versions, but there are also no major operating systems which > >require that old of a version (CentOS 7, which is dead, used OpenSSL 1.0.2, and > >CentOS 8 uses 1.1.1k), so it's probably not within our support policy to do that. > > > >Note also that if OpenSSL is being used for TLS, a lack of entropy will result in TLS > >not working, which means that Git will be randomly broken on that system, which is > >not really an experience that we want to encourage, so that should be taken into > >account. > > > >Can we get some more information about what version of OpenSSL is being used > >and what the system entropy source is? > > In my situation, OpenSSL 3.0.11 on ia64. 3.4.1 and 3.0.13 on x86 (x86 works fine > Because OpenSSL uses hardware. On ia64, we end up on PRNGD, which does fail. You reported in [1] that a couple more tests are indeed failing, not only t0610. That changes things in my opinion as it shows that this is not a localized issue in the reftable library, but likely in multiple callsites where we use randomness. So my current patch series is not sufficient as it only fixes up the reftable codebase. But in the case where it's a general issue I tend to agree with brian, because I don't want to play whack-a-mole with all the callsites of `git_rand()` where we can indeed use insecure bytes. Honestly, this rather makes me want to remove the OpenSSL backend for our CSRNG completely. NonStop is the only platform that uses it right now, and it seems to be easy to misconfigure. All the other backends we have don't have the same issue as explained further up in my message. So does NonStop support any of the alternative backends that Git has, like `arc4random_buf()`, `getrandom()`, `getentropy()` or reading from "/dev/urandom"? Might be I'm coming to conclusions too fast, so if I'm missing obvious usecases then please stop me :) Randall, you mentioned that your platform had a maintenance window right during the release of v2.48.0-rc2 [2]. You never mentioned issues with randomness before that maintenance window, and after it you hit them in many tests without any changes to the CSPRNG between v2.48.0-rc1 and -rc2. Could it be that something broke on your end? Patrick [1]: https://lore.kernel.org/git/00ad01db615f$ce9b6740$6bd235c0$@nexbridge.com/ [2]: https://lore.kernel.org/git/000501db607c$40c009a0$c2401ce0$@nexbridge.com/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool 2025-01-08 7:18 ` Patrick Steinhardt @ 2025-01-08 13:50 ` rsbecker 2025-01-08 22:44 ` brian m. carlson 1 sibling, 0 replies; 19+ messages in thread From: rsbecker @ 2025-01-08 13:50 UTC (permalink / raw) To: 'Patrick Steinhardt' Cc: 'brian m. carlson', git, 'Randall S. Becker' On January 8, 2025 2:19 AM, Patrick Steinhardt wrote: >On Tue, Jan 07, 2025 at 06:54:02PM -0500, rsbecker@nexbridge.com wrote: >> On January 7, 2025 6:22 PM, brian m. carlson wrote: >> >On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote: >> >> Hi, >> >> >> >> this small patch series fixes the issue reported by Randall [1], >> >> where an exhausted entropy pool can cause us to die when writing a >> >> new table to the reftable stack. I _think_ that this is only an >> >> issue with the OpenSSL backend of `csprng_bytes()`: >> >> >> >> - `arc4random_buf()` never returns an error. >> >> >> >> - `getrandom()` pulls from "/dev/urandom" by default. >> >> >> >> - `getentropy()` seems to block when there is not enough randomness >> >> available. >> >> >> >> - `GtlGenRandom()` I cannot really tell. >> >> >> >> - The fallback reads from "/dev/urandom", which also returns bytes in >> >> case the entropy pool is drained. >> >> >> >> So OpenSSL's `RAND_bytes()` seems to be the only one that returns >> >> an error when the entropy pool is empty. I did wonder whether we >> >> even need to introduce the new flag in the first place, or whether >> >> we cannot just use `RAND_pseudo_bytes()` unconditionally. But I'm a >> >> bit uneasy about it given that OpenSSL has this doc: >> >> >> >> RAND_pseudo_bytes() puts num pseudo-random bytes into buf. >> >> Pseudo-random byte sequences generated by RAND_pseudo_bytes() will >> >> be unique if they are of sufficient length, but are not necessarily >> >> unpredictable. They can be used for non-cryptographic purposes and >> >> for certain purposes in cryptographic protocols, but usually not for >> >> key generation etc. >> >> >> >> It might be too easy to accidentally rely on `csprng_bytes()` where >> >> it actually requires strong cryptographic data, so I was erring on >> >> the side of caution. >> > >> >The reason I didn't use RAND_pseudo_bytes is because it's been >> >deprecated since OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like >> >RAND_pseudo_bytes as of that version. Once it's seeded, it should be >> >able to generate plenty of bytes, because I believe it uses a >> >CTR-DRBG, which only needs to be reseeded after 2^48 bytes (which is far more >than we should be using). >> > >> >We can full well use RAND_pseudo_bytes, but all operating systems >> >should provide an appropriate entropy source that can provide 256 bits of >entropy on startup. >> >arc4random will just kill the process if it can't seed itself, so >> >your changes won't actually prevent dying on a lack of entropy. >> > >> >I don't want an option that chooses "insecure" bytes. My preference >> >is that we require people use a different backend or an up-to-date >> >OpenSSL version that shouldn't have this problem. We can use >> >RAND_pseudo_bytes if we really need to support older versions, but >> >there are also no major operating systems which require that old of a >> >version (CentOS 7, which is dead, used OpenSSL 1.0.2, and CentOS 8 uses >1.1.1k), so it's probably not within our support policy to do that. >> > >> >Note also that if OpenSSL is being used for TLS, a lack of entropy >> >will result in TLS not working, which means that Git will be randomly >> >broken on that system, which is not really an experience that we want >> >to encourage, so that should be taken into account. >> > >> >Can we get some more information about what version of OpenSSL is >> >being used and what the system entropy source is? >> >> In my situation, OpenSSL 3.0.11 on ia64. 3.4.1 and 3.0.13 on x86 (x86 >> works fine Because OpenSSL uses hardware. On ia64, we end up on PRNGD, >which does fail. > >You reported in [1] that a couple more tests are indeed failing, not only t0610. That >changes things in my opinion as it shows that this is not a localized issue in the >reftable library, but likely in multiple callsites where we use randomness. So my >current patch series is not sufficient as it only fixes up the reftable codebase. But in >the case where it's a general issue I tend to agree with brian, because I don't want to >play whack-a-mole with all the callsites of `git_rand()` where we can indeed use >insecure bytes. > >Honestly, this rather makes me want to remove the OpenSSL backend for our >CSRNG completely. NonStop is the only platform that uses it right now, and it seems >to be easy to misconfigure. All the other backends we have don't have the same >issue as explained further up in my message. So does NonStop support any of the >alternative backends that Git has, like `arc4random_buf()`, `getrandom()`, >`getentropy()` or reading from "/dev/urandom"? > >Might be I'm coming to conclusions too fast, so if I'm missing obvious usecases then >please stop me :) > >Randall, you mentioned that your platform had a maintenance window right during >the release of v2.48.0-rc2 [2]. You never mentioned issues with randomness before >that maintenance window, and after it you hit them in many tests without any >changes to the CSPRNG between v2.48.0-rc1 and -rc2. Could it be that something >broke on your end? Unfortunately, ia64 is not a great platform for randomness. There are no alternates available. We have a case open on PRNGD, but it is unlikely to be fixed any time soon. The ia64 platform goes off support at the end of 2025, so we will stop building git for that platform when that happens. If there is some stopgap solution we can use, even PRNGD, but warn about reducing randomness load, it might work. For x86, the hardware randomizer used in OpenSSL is fine. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool 2025-01-08 7:18 ` Patrick Steinhardt 2025-01-08 13:50 ` rsbecker @ 2025-01-08 22:44 ` brian m. carlson 1 sibling, 0 replies; 19+ messages in thread From: brian m. carlson @ 2025-01-08 22:44 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: rsbecker, git, 'Randall S. Becker' [-- Attachment #1: Type: text/plain, Size: 2375 bytes --] On 2025-01-08 at 07:18:52, Patrick Steinhardt wrote: > You reported in [1] that a couple more tests are indeed failing, not > only t0610. That changes things in my opinion as it shows that this is > not a localized issue in the reftable library, but likely in multiple > callsites where we use randomness. So my current patch series is not > sufficient as it only fixes up the reftable codebase. But in the case > where it's a general issue I tend to agree with brian, because I don't > want to play whack-a-mole with all the callsites of `git_rand()` where > we can indeed use insecure bytes. > > Honestly, this rather makes me want to remove the OpenSSL backend for > our CSRNG completely. NonStop is the only platform that uses it right > now, and it seems to be easy to misconfigure. All the other backends we > have don't have the same issue as explained further up in my message. So > does NonStop support any of the alternative backends that Git has, like > `arc4random_buf()`, `getrandom()`, `getentropy()` or reading from > "/dev/urandom"? OpenSSL's backend is only as good as the system entropy source, which, apparently in the case of PRNGD, is not very good. The last release of PRNGD was in 2007 apparently, so I don't think we should hold our breath for a fix. Or, of course, it could be simply that prngd works just fine and there aren't enough sources for it. If the machine has an analog microphone input that isn't plugged in, streaming some data from that might be a good source, since that will be noisy. A second of recording signed 16-bit PCM data as 48 kHz might provide at least 64 bits of entropy[0]. I will say that libbsd provides a fallback implementation for its getentropy code, which would allow the use of arc4random as a backend. I know there were some portability problems with getting that to run on NonStop, and of course I provide no guarantees about its suitability or security, but it does appear that there is some alternative if the porting problems can be overcome. We explicitly have support for libbsd in the Makefile already. That doesn't avoid the problem of TLS and SSH not working, but it may get the tests passing. [0] That's 750 samples per bit of entropy, which I think should be reasonably conservative. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-01-08 22:44 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-07 15:26 [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool Patrick Steinhardt 2025-01-07 15:26 ` [PATCH 1/2] wrapper: allow generating insecure random bytes Patrick Steinhardt 2025-01-07 15:27 ` [PATCH 2/2] reftable/stack: accept " Patrick Steinhardt 2025-01-07 15:37 ` rsbecker 2025-01-07 20:56 ` Junio C Hamano 2025-01-07 21:03 ` rsbecker 2025-01-07 21:09 ` Junio C Hamano 2025-01-07 21:03 ` Junio C Hamano 2025-01-08 6:51 ` Patrick Steinhardt 2025-01-08 15:39 ` Junio C Hamano 2025-01-08 16:21 ` Patrick Steinhardt 2025-01-08 17:40 ` Junio C Hamano 2025-01-08 18:16 ` Patrick Steinhardt 2025-01-07 23:56 ` rsbecker 2025-01-07 23:21 ` [PATCH 0/2] reftable/stack: stop dying on exhausted entropy pool brian m. carlson 2025-01-07 23:54 ` rsbecker 2025-01-08 7:18 ` Patrick Steinhardt 2025-01-08 13:50 ` rsbecker 2025-01-08 22:44 ` brian m. carlson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).