* [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants
@ 2024-11-20 19:13 Taylor Blau
  2024-11-20 19:13 ` [PATCH 1/6] csum-file: store the hash algorithm as a struct field Taylor Blau
                   ` (8 more replies)
  0 siblings, 9 replies; 60+ messages in thread
From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson
(This series is based on my 'tb/sha1-unsafe-helper', which I sent to the
list here[1].)
This series implements an idea discussed in [2] which suggests that we
introduce a way to access a wrapped version of a 'struct git_hash_algo'
which represents the unsafe variant of that algorithm, rather than
having individual unsafe_ functions (like unsafe_init_fn() versus
init_fn(), etc.).
This approach is relatively straightforward to implement, and removes a
significant deficiency in the original implementation of
unsafe/non-cryptographic hash functions by making it impossible to
switch between safe- and unsafe variants of hash functions. It also
cleans up the sha1-unsafe test helper's implementation by removing a
large number of "if (unsafe)"-style conditionals.
The series is laid out as follows:
  * The first two patches prepare the hashfile API for the upcoming
    change:
      csum-file: store the hash algorithm as a struct field
      csum-file.c: extract algop from hashfile_checksum_valid()
  * The next patch implements the new 'unsafe_hash_algo()' function at
    the heart of this series' approach:
      hash.h: introduce `unsafe_hash_algo()`
  * The next two patches convert existing callers to use the new
    'unsafe_hash_algo()' function, instead of switching between safe and
    unsafe_ variants of individual functions:
      csum-file.c: use unsafe_hash_algo()
      t/helper/test-hash.c: use unsafe_hash_algo()
  * The final patch drops the unsafe_ function variants following all
    callers being converted to use the new pattern:
      hash.h: drop unsafe_ function variants
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1730833506.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/20241107013915.GA961214@coredump.intra.peff.net/
Taylor Blau (6):
  csum-file: store the hash algorithm as a struct field
  csum-file.c: extract algop from hashfile_checksum_valid()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: use unsafe_hash_algo()
  t/helper/test-hash.c: use unsafe_hash_algo()
  hash.h: drop unsafe_ function variants
 csum-file.c          | 33 ++++++++++++++++++---------------
 csum-file.h          |  1 +
 hash.h               | 20 +++++---------------
 object-file.c        | 41 ++++++++++++++++++++++++++---------------
 t/helper/test-hash.c | 17 +++++------------
 5 files changed, 55 insertions(+), 57 deletions(-)
base-commit: d8c1fc78b57e02a140b5c363caaa14c2dc2bb274
-- 
2.47.0.237.gc601277f4c4
^ permalink raw reply	[flat|nested] 60+ messages in thread* [PATCH 1/6] csum-file: store the hash algorithm as a struct field 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau @ 2024-11-20 19:13 ` Taylor Blau 2024-11-21 9:18 ` Jeff King 2024-11-20 19:13 ` [PATCH 2/6] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau ` (7 subsequent siblings) 8 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson Throughout the hashfile API, we rely on a reference to 'the_hash_algo', and call its _usnafe function variants directly. Prepare for a future change where we may use a different 'git_hash_algo' pointer (instead of just relying on 'the_hash_algo' throughout) by making the 'git_hash_algo' pointer a member of the 'hashfile' structure itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 20 +++++++++++--------- csum-file.h | 1 + 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/csum-file.c b/csum-file.c index c203ebf11b3..62f4965f094 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -71,14 +71,14 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, hashflush(f); if (f->skip_hash) - hashclr(f->buffer, the_repository->hash_algo); + hashclr(f->buffer, f->algop); else - the_hash_algo->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->unsafe_final_fn(f->buffer, &f->ctx); if (result) - hashcpy(result, f->buffer, the_repository->hash_algo); + hashcpy(result, f->buffer, f->algop); if (flags & CSUM_HASH_IN_STREAM) - flush(f, f->buffer, the_hash_algo->rawsz); + flush(f, f->buffer, f->algop->rawsz); if (flags & CSUM_FSYNC) fsync_component_or_die(component, f->fd, f->name); if (flags & CSUM_CLOSE) { @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->unsafe_update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -174,7 +174,9 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->name = name; f->do_crc = 0; f->skip_hash = 0; - the_hash_algo->unsafe_init_fn(&f->ctx); + + f->algop = the_hash_algo; + f->algop->unsafe_init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -208,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - the_hash_algo->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -219,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - the_hash_algo->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } diff --git a/csum-file.h b/csum-file.h index 7c73da0a40a..2b45f4673a2 100644 --- a/csum-file.h +++ b/csum-file.h @@ -20,6 +20,7 @@ struct hashfile { size_t buffer_len; unsigned char *buffer; unsigned char *check_buffer; + const struct git_hash_algo *algop; /** * If non-zero, skip_hash indicates that we should -- 2.47.0.237.gc601277f4c4 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 1/6] csum-file: store the hash algorithm as a struct field 2024-11-20 19:13 ` [PATCH 1/6] csum-file: store the hash algorithm as a struct field Taylor Blau @ 2024-11-21 9:18 ` Jeff King 0 siblings, 0 replies; 60+ messages in thread From: Jeff King @ 2024-11-21 9:18 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson On Wed, Nov 20, 2024 at 02:13:44PM -0500, Taylor Blau wrote: > Throughout the hashfile API, we rely on a reference to 'the_hash_algo', > and call its _usnafe function variants directly. s/usnafe/unsafe/ > Prepare for a future change where we may use a different 'git_hash_algo' > pointer (instead of just relying on 'the_hash_algo' throughout) by > making the 'git_hash_algo' pointer a member of the 'hashfile' structure > itself. This makes sense in the context of your series, which will eventually just hold the unsafe algo struct. But I also think it is a good step in general, because it means that rather than referring to the_hash_algo all over the place, we only do it when initializing the hashfile struct. In the long run, that should take a repository struct or a git_hash_algo pointer directly, and we'd want to pass it just once to hashfd(), etc. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 2/6] csum-file.c: extract algop from hashfile_checksum_valid() 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau 2024-11-20 19:13 ` [PATCH 1/6] csum-file: store the hash algorithm as a struct field Taylor Blau @ 2024-11-20 19:13 ` Taylor Blau 2024-11-20 19:13 ` [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` Taylor Blau ` (6 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson Perform a similar transformation as in the previous commit, but focused instead on hashfile_checksum_valid(). This function does not work with a hashfile structure itself, and instead validates the raw contents of a file written using the hashfile API. We'll want to be prepared for a similar change to this function in the future, so prepare ourselves for that by extracting 'the_hash_algo' into its own field for use within this function. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/csum-file.c b/csum-file.c index 62f4965f094..107bc4c96f9 100644 --- a/csum-file.c +++ b/csum-file.c @@ -242,14 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - size_t data_len = total_len - the_hash_algo->rawsz; + const struct git_hash_algo *algop = the_hash_algo; + size_t data_len = total_len - algop->rawsz; - if (total_len < the_hash_algo->rawsz) + if (total_len < algop->rawsz) return 0; /* say "too short"? */ - the_hash_algo->unsafe_init_fn(&ctx); - the_hash_algo->unsafe_update_fn(&ctx, data, data_len); - the_hash_algo->unsafe_final_fn(got, &ctx); + algop->unsafe_init_fn(&ctx); + algop->unsafe_update_fn(&ctx, data, data_len); + algop->unsafe_final_fn(got, &ctx); - return hasheq(got, data + data_len, the_repository->hash_algo); + return hasheq(got, data + data_len, algop); } -- 2.47.0.237.gc601277f4c4 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau 2024-11-20 19:13 ` [PATCH 1/6] csum-file: store the hash algorithm as a struct field Taylor Blau 2024-11-20 19:13 ` [PATCH 2/6] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau @ 2024-11-20 19:13 ` Taylor Blau 2024-11-21 9:37 ` Jeff King 2024-11-20 19:13 ` [PATCH 4/6] csum-file.c: use unsafe_hash_algo() Taylor Blau ` (5 subsequent siblings) 8 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants, 2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing functions by introducing new functions like "unsafe_init_fn()" and so on. This approach has a major shortcoming that callers must remember to consistently use one variant or the other. Failing to consistently use (or not use) the unsafe variants can lead to crashes at best, or subtle memory corruption issues at worst. In the hashfile API, this isn't difficult to achieve, but verifying that all callers consistently use the unsafe variants is somewhat of a chore given how spread out all of the callers are. In the sha1 and sha1-unsafe test helpers, all of the calls to various hash functions are guarded by an "if (unsafe)" conditional, which is repetitive and cumbersome. Address these issues by introducing a new pattern whereby one 'git_hash_algo' can return a pointer to another 'git_hash_algo' that represents the unsafe version of itself. So instead of having something like: if (unsafe) the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); else the_hash_algo->unsafe_init_fn(...); the_hash_algo->unsafe_update_fn(...); the_hash_algo->unsafe_final_fn(...); we can instead write: struct git_hash_algo *algop = the_hash_algo; if (unsafe) algop = unsafe_hash_algo(algop); the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to hold onto a 'struct git_hash_algo' pointer that is initialized once. Similarly, while there currently is still a way to "mix" safe and unsafe functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 5 +++++ object-file.c | 26 ++++++++++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/hash.h b/hash.h index 756166ce5e8..23cf6876e50 100644 --- a/hash.h +++ b/hash.h @@ -305,6 +305,9 @@ struct git_hash_algo { /* The all-zeros OID. */ const struct object_id *null_oid; + + /* The unsafe variant of this hash function, if one exists. */ + const struct git_hash_algo *unsafe; }; extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; @@ -323,6 +326,8 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) return p - hash_algos; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop); + const struct object_id *null_oid(void); static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop) diff --git a/object-file.c b/object-file.c index b1a3463852c..fddcdbe9ba6 100644 --- a/object-file.c +++ b/object-file.c @@ -204,6 +204,22 @@ static void git_hash_unknown_final_oid(struct object_id *oid UNUSED, BUG("trying to finalize unknown hash"); } +static const struct git_hash_algo sha1_unsafe_algo = { + .name = "sha1", + .format_id = GIT_SHA1_FORMAT_ID, + .rawsz = GIT_SHA1_RAWSZ, + .hexsz = GIT_SHA1_HEXSZ, + .blksz = GIT_SHA1_BLKSZ, + .init_fn = git_hash_sha1_init_unsafe, + .clone_fn = git_hash_sha1_clone_unsafe, + .update_fn = git_hash_sha1_update_unsafe, + .final_fn = git_hash_sha1_final_unsafe, + .final_oid_fn = git_hash_sha1_final_oid_unsafe, + .empty_tree = &empty_tree_oid, + .empty_blob = &empty_blob_oid, + .null_oid = &null_oid_sha1, +}; + const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { { .name = NULL, @@ -241,6 +257,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .unsafe_update_fn = git_hash_sha1_update_unsafe, .unsafe_final_fn = git_hash_sha1_final_unsafe, .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, + .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, .null_oid = &null_oid_sha1, @@ -307,6 +324,15 @@ int hash_algo_by_length(int len) return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop) +{ + /* If we have a faster "unsafe" implementation, use that. */ + if (algop->unsafe) + return algop->unsafe; + /* Otherwise use the default one. */ + return algop; +} + /* * This is meant to hold a *small* number of objects that you would * want repo_read_object_file() to be able to return, but yet you do not want -- 2.47.0.237.gc601277f4c4 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2024-11-20 19:13 ` [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` Taylor Blau @ 2024-11-21 9:37 ` Jeff King 2024-11-22 0:39 ` brian m. carlson 2025-01-10 21:38 ` Taylor Blau 0 siblings, 2 replies; 60+ messages in thread From: Jeff King @ 2024-11-21 9:37 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson On Wed, Nov 20, 2024 at 02:13:50PM -0500, Taylor Blau wrote: > +static const struct git_hash_algo sha1_unsafe_algo = { > + .name = "sha1", > + .format_id = GIT_SHA1_FORMAT_ID, > + .rawsz = GIT_SHA1_RAWSZ, > + .hexsz = GIT_SHA1_HEXSZ, > + .blksz = GIT_SHA1_BLKSZ, > + .init_fn = git_hash_sha1_init_unsafe, > + .clone_fn = git_hash_sha1_clone_unsafe, > + .update_fn = git_hash_sha1_update_unsafe, > + .final_fn = git_hash_sha1_final_unsafe, > + .final_oid_fn = git_hash_sha1_final_oid_unsafe, > + .empty_tree = &empty_tree_oid, > + .empty_blob = &empty_blob_oid, > + .null_oid = &null_oid_sha1, > +}; All of the non-function fields here naturally must match the ones in the parent algo struct, or chaos ensues. That's a little fragile, but it's not like we're adding new algorithm variants a lot. The biggest risk, I guess, would be adding a new field to git_hash_algo which defaults to zero-initialization here. But again, there are only three total and they are defined near each other here, so I don't think it's a big risk overall. I think this struct is a potential landmine for hash_algo_by_ptr(): static inline int hash_algo_by_ptr(const struct git_hash_algo *p) { return p - hash_algos; } It's undefined behavior to pass in sha1_unsafe_algo to this function (but the compiler would not complain since the types are the same). I don't find it incredibly likely that somebody would want to do that on an unsafe variant, but I'm not thrilled about leaving that wart for somebody to find. If we don't care about the speed of this function, then an implementation like: for (i = 0; i < GIT_HASH_NALGOS; i++) { if (p == &hash_algos[i] || p == hash_algos[i]->unsafe) return i; } return GIT_HASH_UNKNOWN; would work. I'm not sure if that would be measurable. I was surprised at the number of places that hash_algo_by_ptr() is called. Many low-level oid functions need it because we store the integer id there rather than a direct pointer (so oidread(), oidclr(), oid_object_info_extended(), and so on). But I'd also expect the loop above to be pretty fast. So I dunno. > const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { > { > .name = NULL, > @@ -241,6 +257,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { > .unsafe_update_fn = git_hash_sha1_update_unsafe, > .unsafe_final_fn = git_hash_sha1_final_unsafe, > .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, > + .unsafe = &sha1_unsafe_algo, > .empty_tree = &empty_tree_oid, > .empty_blob = &empty_blob_oid, > .null_oid = &null_oid_sha1, OK, and we can leave the sha256 pointer zero-initialized, since the function handles that at runtime. Good. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2024-11-21 9:37 ` Jeff King @ 2024-11-22 0:39 ` brian m. carlson 2024-11-22 8:25 ` Jeff King 2025-01-10 21:38 ` Taylor Blau 1 sibling, 1 reply; 60+ messages in thread From: brian m. carlson @ 2024-11-22 0:39 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 3624 bytes --] On 2024-11-21 at 09:37:31, Jeff King wrote: > On Wed, Nov 20, 2024 at 02:13:50PM -0500, Taylor Blau wrote: > > > +static const struct git_hash_algo sha1_unsafe_algo = { > > + .name = "sha1", > > + .format_id = GIT_SHA1_FORMAT_ID, > > + .rawsz = GIT_SHA1_RAWSZ, > > + .hexsz = GIT_SHA1_HEXSZ, > > + .blksz = GIT_SHA1_BLKSZ, > > + .init_fn = git_hash_sha1_init_unsafe, > > + .clone_fn = git_hash_sha1_clone_unsafe, > > + .update_fn = git_hash_sha1_update_unsafe, > > + .final_fn = git_hash_sha1_final_unsafe, > > + .final_oid_fn = git_hash_sha1_final_oid_unsafe, > > + .empty_tree = &empty_tree_oid, > > + .empty_blob = &empty_blob_oid, > > + .null_oid = &null_oid_sha1, > > +}; > > All of the non-function fields here naturally must match the ones in the > parent algo struct, or chaos ensues. That's a little fragile, but it's > not like we're adding new algorithm variants a lot. The biggest risk, I > guess, would be adding a new field to git_hash_algo which defaults to > zero-initialization here. But again, there are only three total and they > are defined near each other here, so I don't think it's a big risk > overall. > > I think this struct is a potential landmine for hash_algo_by_ptr(): > > static inline int hash_algo_by_ptr(const struct git_hash_algo *p) > { > return p - hash_algos; > } > > It's undefined behavior to pass in sha1_unsafe_algo to this function > (but the compiler would not complain since the types are the same). I > don't find it incredibly likely that somebody would want to do that on > an unsafe variant, but I'm not thrilled about leaving that wart for > somebody to find. > > If we don't care about the speed of this function, then an > implementation like: > > for (i = 0; i < GIT_HASH_NALGOS; i++) { > if (p == &hash_algos[i] || p == hash_algos[i]->unsafe) > return i; > } > return GIT_HASH_UNKNOWN; > > would work. I'm not sure if that would be measurable. I was surprised at > the number of places that hash_algo_by_ptr() is called. Many low-level > oid functions need it because we store the integer id there rather than > a direct pointer (so oidread(), oidclr(), oid_object_info_extended(), > and so on). But I'd also expect the loop above to be pretty fast. So I > dunno. Yeah, I'm also a little nervous about this change with hash_algo_by_ptr. I think we had discussed in the other thread about doing something like this: struct git_hash_algo_fns { /* The hash initialization function. */ git_hash_init_fn init_fn; /* The hash context cloning function. */ git_hash_clone_fn clone_fn; /* The hash update function. */ git_hash_update_fn update_fn; /* The hash finalization function. */ git_hash_final_fn final_fn; /* The hash finalization function for object IDs. */ git_hash_final_oid_fn final_oid_fn; }; and then doing this: struct git_hash_algo { [...] struct git_hash_algo_fns fns; struct git_hash_algo_fns unsafe_fns; [...] }; That would mean that we'd have to deal with two pointers in your first patch, but I don't really think that's too terrible. And, since this approach doesn't result in an extra struct git_hash_algo, it's impossible to misuse hash_algo_by_ptr. I'm not a hard no on the current approach, but I think the above would probably be a little safer and harder to misuse. It would require extra patches to convert existing callers, though, but they aren't hugely numerous, and the conversion could probably be mostly automated. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2024-11-22 0:39 ` brian m. carlson @ 2024-11-22 8:25 ` Jeff King 2024-11-22 20:37 ` brian m. carlson 0 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2024-11-22 8:25 UTC (permalink / raw) To: brian m. carlson; +Cc: Taylor Blau, git, Elijah Newren, Junio C Hamano On Fri, Nov 22, 2024 at 12:39:30AM +0000, brian m. carlson wrote: > Yeah, I'm also a little nervous about this change with hash_algo_by_ptr. > I think we had discussed in the other thread about doing something like > this: > > struct git_hash_algo_fns { > [...] > > I'm not a hard no on the current approach, but I think the above would > probably be a little safer and harder to misuse. It would require extra > patches to convert existing callers, though, but they aren't hugely > numerous, and the conversion could probably be mostly automated. Yeah, I was worried about the fallout to callers. But it might be worth seeing how extensive it is. ... OK, here's a quick and dirty conversion that applies on top of Taylor's patches 1 and 2 (and then patches 3 and on would need to adjust to the new world order). There are quite a lot of spots that needed to be adjusted, but the most part it was just s/->\([a-z]\)_fn/->fn.\1/. (I'll leave adjusting the unsafe_ variants as an exercise for the reader). The naming convention was just what I made up, but one nice effect is that it replaces "_fn" with "fn.", so the line lengths remain the same. :) It does mean there's a function called "final", which is a keyword in C++. I don't know if that matters to us or not (I feel like we were trying to avoid those, but it appears as a function elsewhere, too). So I dunno. It is a one-time pain, but I think the result harmonizes the type system with the concept better. --- builtin/fast-import.c | 18 +++--- builtin/index-pack.c | 18 +++--- builtin/patch-id.c | 10 ++-- builtin/receive-pack.c | 22 +++---- builtin/unpack-objects.c | 12 ++-- bulk-checkin.c | 10 ++-- csum-file.c | 18 +++--- diff.c | 28 ++++----- hash.h | 37 ++++-------- http-push.c | 6 +- http.c | 8 +-- object-file.c | 112 +++++++++++++++++++---------------- pack-check.c | 6 +- pack-write.c | 20 +++---- read-cache.c | 20 +++---- rerere.c | 8 +-- t/helper/test-hash-speed.c | 6 +- t/helper/test-hash.c | 6 +- t/unit-tests/t-hash.c | 6 +- trace2/tr2_sid.c | 6 +- 20 files changed, 189 insertions(+), 188 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 76d5c20f14..b7870d1f1e 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -953,10 +953,10 @@ static int store_object( hdrlen = format_object_header((char *)hdr, sizeof(hdr), type, dat->len); - the_hash_algo->init_fn(&c); - the_hash_algo->update_fn(&c, hdr, hdrlen); - the_hash_algo->update_fn(&c, dat->buf, dat->len); - the_hash_algo->final_oid_fn(&oid, &c); + the_hash_algo->fn.init(&c); + the_hash_algo->fn.update(&c, hdr, hdrlen); + the_hash_algo->fn.update(&c, dat->buf, dat->len); + the_hash_algo->fn.final_oid(&oid, &c); if (oidout) oidcpy(oidout, &oid); @@ -1101,14 +1101,14 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) cycle_packfile(); - the_hash_algo->init_fn(&checkpoint.ctx); + the_hash_algo->fn.init(&checkpoint.ctx); hashfile_checkpoint(pack_file, &checkpoint); offset = checkpoint.offset; hdrlen = format_object_header((char *)out_buf, out_sz, OBJ_BLOB, len); - the_hash_algo->init_fn(&c); - the_hash_algo->update_fn(&c, out_buf, hdrlen); + the_hash_algo->fn.init(&c); + the_hash_algo->fn.update(&c, out_buf, hdrlen); crc32_begin(pack_file); @@ -1126,7 +1126,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) if (!n && feof(stdin)) die("EOF in data (%" PRIuMAX " bytes remaining)", len); - the_hash_algo->update_fn(&c, in_buf, n); + the_hash_algo->fn.update(&c, in_buf, n); s.next_in = in_buf; s.avail_in = n; len -= n; @@ -1152,7 +1152,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) } } git_deflate_end(&s); - the_hash_algo->final_oid_fn(&oid, &c); + the_hash_algo->fn.final_oid(&oid, &c); if (oidout) oidcpy(oidout, &oid); diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 08b340552f..c5b5573edf 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -298,7 +298,7 @@ static void flush(void) if (input_offset) { if (output_fd >= 0) write_or_die(output_fd, input_buffer, input_offset); - the_hash_algo->update_fn(&input_ctx, input_buffer, input_offset); + the_hash_algo->fn.update(&input_ctx, input_buffer, input_offset); memmove(input_buffer, input_buffer + input_offset, input_len); input_offset = 0; } @@ -371,7 +371,7 @@ static const char *open_pack_file(const char *pack_name) output_fd = -1; nothread_data.pack_fd = input_fd; } - the_hash_algo->init_fn(&input_ctx); + the_hash_algo->fn.init(&input_ctx); return pack_name; } @@ -476,8 +476,8 @@ static void *unpack_entry_data(off_t offset, unsigned long size, if (!is_delta_type(type)) { hdrlen = format_object_header(hdr, sizeof(hdr), type, size); - the_hash_algo->init_fn(&c); - the_hash_algo->update_fn(&c, hdr, hdrlen); + the_hash_algo->fn.init(&c); + the_hash_algo->fn.update(&c, hdr, hdrlen); } else oid = NULL; if (type == OBJ_BLOB && size > big_file_threshold) @@ -497,7 +497,7 @@ static void *unpack_entry_data(off_t offset, unsigned long size, status = git_inflate(&stream, 0); use(input_len - stream.avail_in); if (oid) - the_hash_algo->update_fn(&c, last_out, stream.next_out - last_out); + the_hash_algo->fn.update(&c, last_out, stream.next_out - last_out); if (buf == fixed_buf) { stream.next_out = buf; stream.avail_out = sizeof(fixed_buf); @@ -507,7 +507,7 @@ static void *unpack_entry_data(off_t offset, unsigned long size, bad_object(offset, _("inflate returned %d"), status); git_inflate_end(&stream); if (oid) - the_hash_algo->final_oid_fn(oid, &c); + the_hash_algo->fn.final_oid(oid, &c); return buf == fixed_buf ? NULL : buf; } @@ -1256,9 +1256,9 @@ static void parse_pack_objects(unsigned char *hash) /* Check pack integrity */ flush(); - the_hash_algo->init_fn(&tmp_ctx); - the_hash_algo->clone_fn(&tmp_ctx, &input_ctx); - the_hash_algo->final_fn(hash, &tmp_ctx); + the_hash_algo->fn.init(&tmp_ctx); + the_hash_algo->fn.clone(&tmp_ctx, &input_ctx); + the_hash_algo->fn.final(hash, &tmp_ctx); if (!hasheq(fill(the_hash_algo->rawsz), hash, the_repository->hash_algo)) die(_("pack is corrupted (SHA1 mismatch)")); use(the_hash_algo->rawsz); diff --git a/builtin/patch-id.c b/builtin/patch-id.c index 93b398e391..3bf1367341 100644 --- a/builtin/patch-id.c +++ b/builtin/patch-id.c @@ -70,7 +70,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result, char pre_oid_str[GIT_MAX_HEXSZ + 1], post_oid_str[GIT_MAX_HEXSZ + 1]; git_hash_ctx ctx; - the_hash_algo->init_fn(&ctx); + the_hash_algo->fn.init(&ctx); oidclr(result, the_repository->hash_algo); while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) { @@ -83,7 +83,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result, !skip_prefix(line, "From ", &p) && starts_with(line, "\\ ") && 12 < strlen(line)) { if (verbatim) - the_hash_algo->update_fn(&ctx, line, strlen(line)); + the_hash_algo->fn.update(&ctx, line, strlen(line)); continue; } @@ -102,9 +102,9 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result, starts_with(line, "Binary files")) { diff_is_binary = 1; before = 0; - the_hash_algo->update_fn(&ctx, pre_oid_str, + the_hash_algo->fn.update(&ctx, pre_oid_str, strlen(pre_oid_str)); - the_hash_algo->update_fn(&ctx, post_oid_str, + the_hash_algo->fn.update(&ctx, post_oid_str, strlen(post_oid_str)); if (stable) flush_one_hunk(result, &ctx); @@ -163,7 +163,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result, /* Add line to hash algo (possibly removing whitespace) */ len = verbatim ? strlen(line) : remove_space(line); patchlen += len; - the_hash_algo->update_fn(&ctx, line, len); + the_hash_algo->fn.update(&ctx, line, len); } if (!found_next) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index ab5b20e39c..b7aa6cc556 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -569,9 +569,9 @@ static void hmac_hash(unsigned char *out, /* RFC 2104 2. (1) */ memset(key, '\0', GIT_MAX_BLKSZ); if (the_hash_algo->blksz < key_len) { - the_hash_algo->init_fn(&ctx); - the_hash_algo->update_fn(&ctx, key_in, key_len); - the_hash_algo->final_fn(key, &ctx); + the_hash_algo->fn.init(&ctx); + the_hash_algo->fn.update(&ctx, key_in, key_len); + the_hash_algo->fn.final(key, &ctx); } else { memcpy(key, key_in, key_len); } @@ -583,16 +583,16 @@ static void hmac_hash(unsigned char *out, } /* RFC 2104 2. (3) & (4) */ - the_hash_algo->init_fn(&ctx); - the_hash_algo->update_fn(&ctx, k_ipad, sizeof(k_ipad)); - the_hash_algo->update_fn(&ctx, text, text_len); - the_hash_algo->final_fn(out, &ctx); + the_hash_algo->fn.init(&ctx); + the_hash_algo->fn.update(&ctx, k_ipad, sizeof(k_ipad)); + the_hash_algo->fn.update(&ctx, text, text_len); + the_hash_algo->fn.final(out, &ctx); /* RFC 2104 2. (6) & (7) */ - the_hash_algo->init_fn(&ctx); - the_hash_algo->update_fn(&ctx, k_opad, sizeof(k_opad)); - the_hash_algo->update_fn(&ctx, out, the_hash_algo->rawsz); - the_hash_algo->final_fn(out, &ctx); + the_hash_algo->fn.init(&ctx); + the_hash_algo->fn.update(&ctx, k_opad, sizeof(k_opad)); + the_hash_algo->fn.update(&ctx, out, the_hash_algo->rawsz); + the_hash_algo->fn.final(out, &ctx); } static char *prepare_push_cert_nonce(const char *path, timestamp_t stamp) diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 02b8d02f63..c263343531 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -67,7 +67,7 @@ static void *fill(int min) if (min > sizeof(buffer)) die("cannot fill %d bytes", min); if (offset) { - the_hash_algo->update_fn(&ctx, buffer, offset); + the_hash_algo->fn.update(&ctx, buffer, offset); memmove(buffer, buffer + offset, len); offset = 0; } @@ -667,12 +667,12 @@ int cmd_unpack_objects(int argc, /* We don't take any non-flag arguments now.. Maybe some day */ usage(unpack_usage); } - the_hash_algo->init_fn(&ctx); + the_hash_algo->fn.init(&ctx); unpack_all(); - the_hash_algo->update_fn(&ctx, buffer, offset); - the_hash_algo->init_fn(&tmp_ctx); - the_hash_algo->clone_fn(&tmp_ctx, &ctx); - the_hash_algo->final_oid_fn(&oid, &tmp_ctx); + the_hash_algo->fn.update(&ctx, buffer, offset); + the_hash_algo->fn.init(&tmp_ctx); + the_hash_algo->fn.clone(&tmp_ctx, &ctx); + the_hash_algo->fn.final_oid(&oid, &tmp_ctx); if (strict) { write_rest(); if (fsck_finish(&fsck_options)) diff --git a/bulk-checkin.c b/bulk-checkin.c index 2753d5bbe4..7fc545703a 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -193,7 +193,7 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, if (rsize < hsize) hsize = rsize; if (hsize) - the_hash_algo->update_fn(ctx, ibuf, hsize); + the_hash_algo->fn.update(ctx, ibuf, hsize); *already_hashed_to = offset; } s.next_in = ibuf; @@ -269,9 +269,9 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, header_len = format_object_header((char *)obuf, sizeof(obuf), OBJ_BLOB, size); - the_hash_algo->init_fn(&ctx); - the_hash_algo->update_fn(&ctx, obuf, header_len); - the_hash_algo->init_fn(&checkpoint.ctx); + the_hash_algo->fn.init(&ctx); + the_hash_algo->fn.update(&ctx, obuf, header_len); + the_hash_algo->fn.init(&checkpoint.ctx); /* Note: idx is non-NULL when we are writing */ if ((flags & HASH_WRITE_OBJECT) != 0) @@ -302,7 +302,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, if (lseek(fd, seekback, SEEK_SET) == (off_t) -1) return error("cannot seek back"); } - the_hash_algo->final_oid_fn(result_oid, &ctx); + the_hash_algo->fn.final_oid(result_oid, &ctx); if (!idx) return 0; diff --git a/csum-file.c b/csum-file.c index 107bc4c96f..05e20de441 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->unsafe_fn.update(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -73,7 +73,7 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, if (f->skip_hash) hashclr(f->buffer, f->algop); else - f->algop->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->unsafe_fn.final(f->buffer, &f->ctx); if (result) hashcpy(result, f->buffer, f->algop); @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->unsafe_fn.update(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -176,7 +176,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->skip_hash = 0; f->algop = the_hash_algo; - f->algop->unsafe_init_fn(&f->ctx); + f->algop->unsafe_fn.init(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -210,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->unsafe_fn.clone(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -221,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->unsafe_fn.clone(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } @@ -248,9 +248,9 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) if (total_len < algop->rawsz) return 0; /* say "too short"? */ - algop->unsafe_init_fn(&ctx); - algop->unsafe_update_fn(&ctx, data, data_len); - algop->unsafe_final_fn(got, &ctx); + algop->unsafe_fn.init(&ctx); + algop->unsafe_fn.update(&ctx, data, data_len); + algop->unsafe_fn.final(got, &ctx); return hasheq(got, data + data_len, algop); } diff --git a/diff.c b/diff.c index dceac20d18..85d776fa37 100644 --- a/diff.c +++ b/diff.c @@ -6413,8 +6413,8 @@ void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx) unsigned short carry = 0; int i; - the_hash_algo->final_fn(hash, ctx); - the_hash_algo->init_fn(ctx); + the_hash_algo->fn.final(hash, ctx); + the_hash_algo->fn.init(ctx); /* 20-byte sum, with carry */ for (i = 0; i < the_hash_algo->rawsz; ++i) { carry += result->hash[i] + hash[i]; @@ -6432,22 +6432,22 @@ static int patch_id_consume(void *priv, char *line, unsigned long len) return 0; new_len = remove_space(line, len); - the_hash_algo->update_fn(data->ctx, line, new_len); + the_hash_algo->fn.update(data->ctx, line, new_len); data->patchlen += new_len; return 0; } static void patch_id_add_string(git_hash_ctx *ctx, const char *str) { - the_hash_algo->update_fn(ctx, str, strlen(str)); + the_hash_algo->fn.update(ctx, str, strlen(str)); } static void patch_id_add_mode(git_hash_ctx *ctx, unsigned mode) { /* large enough for 2^32 in octal */ char buf[12]; int len = xsnprintf(buf, sizeof(buf), "%06o", mode); - the_hash_algo->update_fn(ctx, buf, len); + the_hash_algo->fn.update(ctx, buf, len); } /* returns 0 upon success, and writes result into oid */ @@ -6458,7 +6458,7 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid git_hash_ctx ctx; struct patch_id_t data; - the_hash_algo->init_fn(&ctx); + the_hash_algo->fn.init(&ctx); memset(&data, 0, sizeof(struct patch_id_t)); data.ctx = &ctx; oidclr(oid, the_repository->hash_algo); @@ -6491,9 +6491,9 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid len2 = remove_space(p->two->path, strlen(p->two->path)); patch_id_add_string(&ctx, "diff--git"); patch_id_add_string(&ctx, "a/"); - the_hash_algo->update_fn(&ctx, p->one->path, len1); + the_hash_algo->fn.update(&ctx, p->one->path, len1); patch_id_add_string(&ctx, "b/"); - the_hash_algo->update_fn(&ctx, p->two->path, len2); + the_hash_algo->fn.update(&ctx, p->two->path, len2); if (p->one->mode == 0) { patch_id_add_string(&ctx, "newfilemode"); @@ -6512,24 +6512,24 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid /* don't do anything since we're only populating header info */ } else if (diff_filespec_is_binary(options->repo, p->one) || diff_filespec_is_binary(options->repo, p->two)) { - the_hash_algo->update_fn(&ctx, oid_to_hex(&p->one->oid), + the_hash_algo->fn.update(&ctx, oid_to_hex(&p->one->oid), the_hash_algo->hexsz); - the_hash_algo->update_fn(&ctx, oid_to_hex(&p->two->oid), + the_hash_algo->fn.update(&ctx, oid_to_hex(&p->two->oid), the_hash_algo->hexsz); } else { if (p->one->mode == 0) { patch_id_add_string(&ctx, "---/dev/null"); patch_id_add_string(&ctx, "+++b/"); - the_hash_algo->update_fn(&ctx, p->two->path, len2); + the_hash_algo->fn.update(&ctx, p->two->path, len2); } else if (p->two->mode == 0) { patch_id_add_string(&ctx, "---a/"); - the_hash_algo->update_fn(&ctx, p->one->path, len1); + the_hash_algo->fn.update(&ctx, p->one->path, len1); patch_id_add_string(&ctx, "+++/dev/null"); } else { patch_id_add_string(&ctx, "---a/"); - the_hash_algo->update_fn(&ctx, p->one->path, len1); + the_hash_algo->fn.update(&ctx, p->one->path, len1); patch_id_add_string(&ctx, "+++b/"); - the_hash_algo->update_fn(&ctx, p->two->path, len2); + the_hash_algo->fn.update(&ctx, p->two->path, len2); } if (fill_mmfile(options->repo, &mf1, p->one) < 0 || diff --git a/hash.h b/hash.h index 756166ce5e..6107d5b47a 100644 --- a/hash.h +++ b/hash.h @@ -267,35 +267,24 @@ struct git_hash_algo { /* The block size of the hash. */ size_t blksz; - /* The hash initialization function. */ - git_hash_init_fn init_fn; + struct git_hash_algo_fns { + /* The hash initialization function. */ + git_hash_init_fn init; - /* The hash context cloning function. */ - git_hash_clone_fn clone_fn; + /* The hash context cloning function. */ + git_hash_clone_fn clone; - /* The hash update function. */ - git_hash_update_fn update_fn; + /* The hash update function. */ + git_hash_update_fn update; - /* The hash finalization function. */ - git_hash_final_fn final_fn; + /* The hash finalization function. */ + git_hash_final_fn final; - /* The hash finalization function for object IDs. */ - git_hash_final_oid_fn final_oid_fn; + /* The hash finalization function for object IDs. */ + git_hash_final_oid_fn final_oid; + } fn; - /* The non-cryptographic hash initialization function. */ - git_hash_init_fn unsafe_init_fn; - - /* The non-cryptographic hash context cloning function. */ - git_hash_clone_fn unsafe_clone_fn; - - /* The non-cryptographic hash update function. */ - git_hash_update_fn unsafe_update_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_fn unsafe_final_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_oid_fn unsafe_final_oid_fn; + struct git_hash_algo_fns unsafe_fn; /* The OID of the empty tree. */ const struct object_id *empty_tree; diff --git a/http-push.c b/http-push.c index 4d24e6b8d4..9659e777d1 100644 --- a/http-push.c +++ b/http-push.c @@ -773,9 +773,9 @@ static void handle_new_lock_ctx(struct xml_ctx *ctx, int tag_closed) } else if (!strcmp(ctx->name, DAV_ACTIVELOCK_TOKEN)) { lock->token = xstrdup(ctx->cdata); - the_hash_algo->init_fn(&hash_ctx); - the_hash_algo->update_fn(&hash_ctx, lock->token, strlen(lock->token)); - the_hash_algo->final_fn(lock_token_hash, &hash_ctx); + the_hash_algo->fn.init(&hash_ctx); + the_hash_algo->fn.update(&hash_ctx, lock->token, strlen(lock->token)); + the_hash_algo->fn.final(lock_token_hash, &hash_ctx); lock->tmpfile_suffix[0] = '_'; memcpy(lock->tmpfile_suffix + 1, hash_to_hex(lock_token_hash), the_hash_algo->hexsz); diff --git a/http.c b/http.c index 58242b9d2d..b3f31b52e6 100644 --- a/http.c +++ b/http.c @@ -2654,7 +2654,7 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb, freq->stream.next_out = expn; freq->stream.avail_out = sizeof(expn); freq->zret = git_inflate(&freq->stream, Z_SYNC_FLUSH); - the_hash_algo->update_fn(&freq->c, expn, + the_hash_algo->fn.update(&freq->c, expn, sizeof(expn) - freq->stream.avail_out); } while (freq->stream.avail_in && freq->zret == Z_OK); return nmemb; @@ -2713,7 +2713,7 @@ struct http_object_request *new_http_object_request(const char *base_url, git_inflate_init(&freq->stream); - the_hash_algo->init_fn(&freq->c); + the_hash_algo->fn.init(&freq->c); freq->url = get_remote_object_url(base_url, hex, 0); @@ -2749,7 +2749,7 @@ struct http_object_request *new_http_object_request(const char *base_url, git_inflate_end(&freq->stream); memset(&freq->stream, 0, sizeof(freq->stream)); git_inflate_init(&freq->stream); - the_hash_algo->init_fn(&freq->c); + the_hash_algo->fn.init(&freq->c); if (prev_posn>0) { prev_posn = 0; lseek(freq->localfile, 0, SEEK_SET); @@ -2820,7 +2820,7 @@ int finish_http_object_request(struct http_object_request *freq) return -1; } - the_hash_algo->final_oid_fn(&freq->real_oid, &freq->c); + the_hash_algo->fn.final_oid(&freq->real_oid, &freq->c); if (freq->zret != Z_STREAM_END) { unlink_or_warn(freq->tmpfile.buf); return -1; diff --git a/object-file.c b/object-file.c index b1a3463852..c8399745d8 100644 --- a/object-file.c +++ b/object-file.c @@ -211,16 +211,20 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .rawsz = 0, .hexsz = 0, .blksz = 0, - .init_fn = git_hash_unknown_init, - .clone_fn = git_hash_unknown_clone, - .update_fn = git_hash_unknown_update, - .final_fn = git_hash_unknown_final, - .final_oid_fn = git_hash_unknown_final_oid, - .unsafe_init_fn = git_hash_unknown_init, - .unsafe_clone_fn = git_hash_unknown_clone, - .unsafe_update_fn = git_hash_unknown_update, - .unsafe_final_fn = git_hash_unknown_final, - .unsafe_final_oid_fn = git_hash_unknown_final_oid, + .fn = { + .init = git_hash_unknown_init, + .clone = git_hash_unknown_clone, + .update = git_hash_unknown_update, + .final = git_hash_unknown_final, + .final_oid = git_hash_unknown_final_oid, + }, + .unsafe_fn = { + .init = git_hash_unknown_init, + .clone = git_hash_unknown_clone, + .update = git_hash_unknown_update, + .final = git_hash_unknown_final, + .final_oid = git_hash_unknown_final_oid, + }, .empty_tree = NULL, .empty_blob = NULL, .null_oid = NULL, @@ -231,16 +235,20 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .rawsz = GIT_SHA1_RAWSZ, .hexsz = GIT_SHA1_HEXSZ, .blksz = GIT_SHA1_BLKSZ, - .init_fn = git_hash_sha1_init, - .clone_fn = git_hash_sha1_clone, - .update_fn = git_hash_sha1_update, - .final_fn = git_hash_sha1_final, - .final_oid_fn = git_hash_sha1_final_oid, - .unsafe_init_fn = git_hash_sha1_init_unsafe, - .unsafe_clone_fn = git_hash_sha1_clone_unsafe, - .unsafe_update_fn = git_hash_sha1_update_unsafe, - .unsafe_final_fn = git_hash_sha1_final_unsafe, - .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, + .fn = { + .init = git_hash_sha1_init, + .clone = git_hash_sha1_clone, + .update = git_hash_sha1_update, + .final = git_hash_sha1_final, + .final_oid = git_hash_sha1_final_oid, + }, + .unsafe_fn = { + .init = git_hash_sha1_init_unsafe, + .clone = git_hash_sha1_clone_unsafe, + .update = git_hash_sha1_update_unsafe, + .final = git_hash_sha1_final_unsafe, + .final_oid = git_hash_sha1_final_oid_unsafe, + }, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, .null_oid = &null_oid_sha1, @@ -251,16 +259,20 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .rawsz = GIT_SHA256_RAWSZ, .hexsz = GIT_SHA256_HEXSZ, .blksz = GIT_SHA256_BLKSZ, - .init_fn = git_hash_sha256_init, - .clone_fn = git_hash_sha256_clone, - .update_fn = git_hash_sha256_update, - .final_fn = git_hash_sha256_final, - .final_oid_fn = git_hash_sha256_final_oid, - .unsafe_init_fn = git_hash_sha256_init, - .unsafe_clone_fn = git_hash_sha256_clone, - .unsafe_update_fn = git_hash_sha256_update, - .unsafe_final_fn = git_hash_sha256_final, - .unsafe_final_oid_fn = git_hash_sha256_final_oid, + .fn = { + .init = git_hash_sha256_init, + .clone = git_hash_sha256_clone, + .update = git_hash_sha256_update, + .final = git_hash_sha256_final, + .final_oid = git_hash_sha256_final_oid, + }, + .unsafe_fn = { + .init = git_hash_sha256_init, + .clone = git_hash_sha256_clone, + .update = git_hash_sha256_update, + .final = git_hash_sha256_final, + .final_oid = git_hash_sha256_final_oid, + }, .empty_tree = &empty_tree_oid_sha256, .empty_blob = &empty_blob_oid_sha256, .null_oid = &null_oid_sha256, @@ -1185,8 +1197,8 @@ int stream_object_signature(struct repository *r, const struct object_id *oid) hdrlen = format_object_header(hdr, sizeof(hdr), obj_type, size); /* Sha1.. */ - r->hash_algo->init_fn(&c); - r->hash_algo->update_fn(&c, hdr, hdrlen); + r->hash_algo->fn.init(&c); + r->hash_algo->fn.update(&c, hdr, hdrlen); for (;;) { char buf[1024 * 16]; ssize_t readlen = read_istream(st, buf, sizeof(buf)); @@ -1197,9 +1209,9 @@ int stream_object_signature(struct repository *r, const struct object_id *oid) } if (!readlen) break; - r->hash_algo->update_fn(&c, buf, readlen); + r->hash_algo->fn.update(&c, buf, readlen); } - r->hash_algo->final_oid_fn(&real_oid, &c); + r->hash_algo->fn.final_oid(&real_oid, &c); close_istream(st); return !oideq(oid, &real_oid) ? -1 : 0; } @@ -1943,10 +1955,10 @@ static void hash_object_body(const struct git_hash_algo *algo, git_hash_ctx *c, struct object_id *oid, char *hdr, int *hdrlen) { - algo->init_fn(c); - algo->update_fn(c, hdr, *hdrlen); - algo->update_fn(c, buf, len); - algo->final_oid_fn(oid, c); + algo->fn.init(c); + algo->fn.update(c, hdr, *hdrlen); + algo->fn.update(c, buf, len); + algo->fn.final_oid(oid, c); } static void write_object_file_prepare(const struct git_hash_algo *algo, @@ -2206,18 +2218,18 @@ static int start_loose_object_common(struct strbuf *tmp_file, git_deflate_init(stream, zlib_compression_level); stream->next_out = buf; stream->avail_out = buflen; - algo->init_fn(c); + algo->fn.init(c); if (compat && compat_c) - compat->init_fn(compat_c); + compat->fn.init(compat_c); /* Start to feed header to zlib stream */ stream->next_in = (unsigned char *)hdr; stream->avail_in = hdrlen; while (git_deflate(stream, 0) == Z_OK) ; /* nothing */ - algo->update_fn(c, hdr, hdrlen); + algo->fn.update(c, hdr, hdrlen); if (compat && compat_c) - compat->update_fn(compat_c, hdr, hdrlen); + compat->fn.update(compat_c, hdr, hdrlen); return fd; } @@ -2238,9 +2250,9 @@ static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c, int ret; ret = git_deflate(stream, flush ? Z_FINISH : 0); - algo->update_fn(c, in0, stream->next_in - in0); + algo->fn.update(c, in0, stream->next_in - in0); if (compat && compat_c) - compat->update_fn(compat_c, in0, stream->next_in - in0); + compat->fn.update(compat_c, in0, stream->next_in - in0); if (write_in_full(fd, compressed, stream->next_out - compressed) < 0) die_errno(_("unable to write loose object file")); stream->next_out = compressed; @@ -2267,9 +2279,9 @@ static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c, ret = git_deflate_end_gently(stream); if (ret != Z_OK) return ret; - algo->final_oid_fn(oid, c); + algo->fn.final_oid(oid, c); if (compat && compat_c) - compat->final_oid_fn(compat_oid, compat_c); + compat->fn.final_oid(compat_oid, compat_c); return Z_OK; } @@ -3027,8 +3039,8 @@ static int check_stream_oid(git_zstream *stream, unsigned long total_read; int status = Z_OK; - the_hash_algo->init_fn(&c); - the_hash_algo->update_fn(&c, hdr, stream->total_out); + the_hash_algo->fn.init(&c); + the_hash_algo->fn.update(&c, hdr, stream->total_out); /* * We already read some bytes into hdr, but the ones up to the NUL @@ -3048,7 +3060,7 @@ static int check_stream_oid(git_zstream *stream, if (size - total_read < stream->avail_out) stream->avail_out = size - total_read; status = git_inflate(stream, Z_FINISH); - the_hash_algo->update_fn(&c, buf, stream->next_out - buf); + the_hash_algo->fn.update(&c, buf, stream->next_out - buf); total_read += stream->next_out - buf; } git_inflate_end(stream); @@ -3063,7 +3075,7 @@ static int check_stream_oid(git_zstream *stream, return -1; } - the_hash_algo->final_oid_fn(&real_oid, &c); + the_hash_algo->fn.final_oid(&real_oid, &c); if (!oideq(expected_oid, &real_oid)) { error(_("hash mismatch for %s (expected %s)"), path, oid_to_hex(expected_oid)); diff --git a/pack-check.c b/pack-check.c index e883dae3f2..2c1c2c54fa 100644 --- a/pack-check.c +++ b/pack-check.c @@ -67,7 +67,7 @@ static int verify_packfile(struct repository *r, if (!is_pack_valid(p)) return error("packfile %s cannot be accessed", p->pack_name); - r->hash_algo->init_fn(&ctx); + r->hash_algo->fn.init(&ctx); do { unsigned long remaining; unsigned char *in = use_pack(p, w_curs, offset, &remaining); @@ -76,9 +76,9 @@ static int verify_packfile(struct repository *r, pack_sig_ofs = p->pack_size - r->hash_algo->rawsz; if (offset > pack_sig_ofs) remaining -= (unsigned int)(offset - pack_sig_ofs); - r->hash_algo->update_fn(&ctx, in, remaining); + r->hash_algo->fn.update(&ctx, in, remaining); } while (offset < pack_sig_ofs); - r->hash_algo->final_fn(hash, &ctx); + r->hash_algo->fn.final(hash, &ctx); pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL); if (!hasheq(hash, pack_sig, the_repository->hash_algo)) err = error("%s pack checksum mismatch", diff --git a/pack-write.c b/pack-write.c index 8c7dfddc5a..d202c6c9a4 100644 --- a/pack-write.c +++ b/pack-write.c @@ -392,8 +392,8 @@ void fixup_pack_header_footer(int pack_fd, char *buf; ssize_t read_result; - the_hash_algo->init_fn(&old_hash_ctx); - the_hash_algo->init_fn(&new_hash_ctx); + the_hash_algo->fn.init(&old_hash_ctx); + the_hash_algo->fn.init(&new_hash_ctx); if (lseek(pack_fd, 0, SEEK_SET) != 0) die_errno("Failed seeking to start of '%s'", pack_name); @@ -405,9 +405,9 @@ void fixup_pack_header_footer(int pack_fd, pack_name); if (lseek(pack_fd, 0, SEEK_SET) != 0) die_errno("Failed seeking to start of '%s'", pack_name); - the_hash_algo->update_fn(&old_hash_ctx, &hdr, sizeof(hdr)); + the_hash_algo->fn.update(&old_hash_ctx, &hdr, sizeof(hdr)); hdr.hdr_entries = htonl(object_count); - the_hash_algo->update_fn(&new_hash_ctx, &hdr, sizeof(hdr)); + the_hash_algo->fn.update(&new_hash_ctx, &hdr, sizeof(hdr)); write_or_die(pack_fd, &hdr, sizeof(hdr)); partial_pack_offset -= sizeof(hdr); @@ -422,7 +422,7 @@ void fixup_pack_header_footer(int pack_fd, break; if (n < 0) die_errno("Failed to checksum '%s'", pack_name); - the_hash_algo->update_fn(&new_hash_ctx, buf, n); + the_hash_algo->fn.update(&new_hash_ctx, buf, n); aligned_sz -= n; if (!aligned_sz) @@ -431,11 +431,11 @@ void fixup_pack_header_footer(int pack_fd, if (!partial_pack_hash) continue; - the_hash_algo->update_fn(&old_hash_ctx, buf, n); + the_hash_algo->fn.update(&old_hash_ctx, buf, n); partial_pack_offset -= n; if (partial_pack_offset == 0) { unsigned char hash[GIT_MAX_RAWSZ]; - the_hash_algo->final_fn(hash, &old_hash_ctx); + the_hash_algo->fn.final(hash, &old_hash_ctx); if (!hasheq(hash, partial_pack_hash, the_repository->hash_algo)) die("Unexpected checksum for %s " @@ -445,16 +445,16 @@ void fixup_pack_header_footer(int pack_fd, * pack, which also means making partial_pack_offset * big enough not to matter anymore. */ - the_hash_algo->init_fn(&old_hash_ctx); + the_hash_algo->fn.init(&old_hash_ctx); partial_pack_offset = ~partial_pack_offset; partial_pack_offset -= MSB(partial_pack_offset, 1); } } free(buf); if (partial_pack_hash) - the_hash_algo->final_fn(partial_pack_hash, &old_hash_ctx); - the_hash_algo->final_fn(new_pack_hash, &new_hash_ctx); + the_hash_algo->fn.final(partial_pack_hash, &old_hash_ctx); + the_hash_algo->fn.final(new_pack_hash, &new_hash_ctx); write_or_die(pack_fd, new_pack_hash, the_hash_algo->rawsz); fsync_component_or_die(FSYNC_COMPONENT_PACK, pack_fd, pack_name); } diff --git a/read-cache.c b/read-cache.c index 01d0b3ad22..5170b07a33 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1736,9 +1736,9 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size) if (oideq(&oid, null_oid())) return 0; - the_hash_algo->init_fn(&c); - the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz); - the_hash_algo->final_fn(hash, &c); + the_hash_algo->fn.init(&c); + the_hash_algo->fn.update(&c, hdr, size - the_hash_algo->rawsz); + the_hash_algo->fn.final(hash, &c); if (!hasheq(hash, start, the_repository->hash_algo)) return error(_("bad index file sha1 signature")); return 0; @@ -2574,8 +2574,8 @@ static int write_index_ext_header(struct hashfile *f, if (eoie_f) { ext = htonl(ext); sz = htonl(sz); - the_hash_algo->update_fn(eoie_f, &ext, sizeof(ext)); - the_hash_algo->update_fn(eoie_f, &sz, sizeof(sz)); + the_hash_algo->fn.update(eoie_f, &ext, sizeof(ext)); + the_hash_algo->fn.update(eoie_f, &sz, sizeof(sz)); } return 0; } @@ -2977,7 +2977,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, */ if (offset && record_eoie()) { CALLOC_ARRAY(eoie_c, 1); - the_hash_algo->init_fn(eoie_c); + the_hash_algo->fn.init(eoie_c); } /* @@ -3616,7 +3616,7 @@ static size_t read_eoie_extension(const char *mmap, size_t mmap_size) * "REUC" + <binary representation of M>) */ src_offset = offset; - the_hash_algo->init_fn(&c); + the_hash_algo->fn.init(&c); while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) { /* After an array of active_nr index entries, * there can be arbitrary number of extended @@ -3632,12 +3632,12 @@ static size_t read_eoie_extension(const char *mmap, size_t mmap_size) if (src_offset + 8 + extsize < src_offset) return 0; - the_hash_algo->update_fn(&c, mmap + src_offset, 8); + the_hash_algo->fn.update(&c, mmap + src_offset, 8); src_offset += 8; src_offset += extsize; } - the_hash_algo->final_fn(hash, &c); + the_hash_algo->fn.final(hash, &c); if (!hasheq(hash, (const unsigned char *)index, the_repository->hash_algo)) return 0; @@ -3658,7 +3658,7 @@ static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, strbuf_add(sb, &buffer, sizeof(uint32_t)); /* hash */ - the_hash_algo->final_fn(hash, eoie_context); + the_hash_algo->fn.final(hash, eoie_context); strbuf_add(sb, hash, the_hash_algo->rawsz); } diff --git a/rerere.c b/rerere.c index d01e98bf65..ef3b58b5c7 100644 --- a/rerere.c +++ b/rerere.c @@ -395,10 +395,10 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io, strbuf_addbuf(out, &two); rerere_strbuf_putconflict(out, '>', marker_size); if (ctx) { - the_hash_algo->update_fn(ctx, one.buf ? + the_hash_algo->fn.update(ctx, one.buf ? one.buf : "", one.len + 1); - the_hash_algo->update_fn(ctx, two.buf ? + the_hash_algo->fn.update(ctx, two.buf ? two.buf : "", two.len + 1); } @@ -435,7 +435,7 @@ static int handle_path(unsigned char *hash, struct rerere_io *io, int marker_siz struct strbuf buf = STRBUF_INIT, out = STRBUF_INIT; int has_conflicts = 0; if (hash) - the_hash_algo->init_fn(&ctx); + the_hash_algo->fn.init(&ctx); while (!io->getline(&buf, io)) { if (is_cmarker(buf.buf, '<', marker_size)) { @@ -452,7 +452,7 @@ static int handle_path(unsigned char *hash, struct rerere_io *io, int marker_siz strbuf_release(&out); if (hash) - the_hash_algo->final_fn(hash, &ctx); + the_hash_algo->fn.final(hash, &ctx); return has_conflicts; } diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c index 7de822af51..1fd8cdac79 100644 --- a/t/helper/test-hash-speed.c +++ b/t/helper/test-hash-speed.c @@ -5,9 +5,9 @@ static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx *ctx, uint8_t *final, const void *p, size_t len) { - algo->init_fn(ctx); - algo->update_fn(ctx, p, len); - algo->final_fn(final, ctx); + algo->fn.init(ctx); + algo->fn.update(ctx, p, len); + algo->fn.final(final, ctx); } int cmd__hash_speed(int ac, const char **av) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index 45d829c908..3e740bc4fb 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -27,7 +27,7 @@ int cmd_hash_impl(int ac, const char **av, int algo) die("OOPS"); } - algop->init_fn(&ctx); + algop->fn.init(&ctx); while (1) { ssize_t sz, this_sz; @@ -46,9 +46,9 @@ int cmd_hash_impl(int ac, const char **av, int algo) } if (this_sz == 0) break; - algop->update_fn(&ctx, buffer, this_sz); + algop->fn.update(&ctx, buffer, this_sz); } - algop->final_fn(hash, &ctx); + algop->fn.final(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c index e62647019b..9c1c9bf70b 100644 --- a/t/unit-tests/t-hash.c +++ b/t/unit-tests/t-hash.c @@ -15,9 +15,9 @@ static void check_hash_data(const void *data, size_t data_length, unsigned char hash[GIT_MAX_HEXSZ]; const struct git_hash_algo *algop = &hash_algos[i]; - algop->init_fn(&ctx); - algop->update_fn(&ctx, data, data_length); - algop->final_fn(hash, &ctx); + algop->fn.init(&ctx); + algop->fn.update(&ctx, data, data_length); + algop->fn.final(hash, &ctx); if (!check_str(hash_to_hex_algop(hash, algop), expected_hashes[i - 1])) test_msg("result does not match with the expected for %s\n", hash_algos[i].name); diff --git a/trace2/tr2_sid.c b/trace2/tr2_sid.c index 09c4ef0d17..a7b17abe2b 100644 --- a/trace2/tr2_sid.c +++ b/trace2/tr2_sid.c @@ -45,9 +45,9 @@ static void tr2_sid_append_my_sid_component(void) if (xgethostname(hostname, sizeof(hostname))) strbuf_add(&tr2sid_buf, "Localhost", 9); else { - algo->init_fn(&ctx); - algo->update_fn(&ctx, hostname, strlen(hostname)); - algo->final_fn(hash, &ctx); + algo->fn.init(&ctx); + algo->fn.update(&ctx, hostname, strlen(hostname)); + algo->fn.final(hash, &ctx); hash_to_hex_algop_r(hex, hash, algo); strbuf_addch(&tr2sid_buf, 'H'); strbuf_add(&tr2sid_buf, hex, 8); ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2024-11-22 8:25 ` Jeff King @ 2024-11-22 20:37 ` brian m. carlson 0 siblings, 0 replies; 60+ messages in thread From: brian m. carlson @ 2024-11-22 20:37 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 1827 bytes --] On 2024-11-22 at 08:25:23, Jeff King wrote: > OK, here's a quick and dirty conversion that applies on top of Taylor's > patches 1 and 2 (and then patches 3 and on would need to adjust to the > new world order). > > There are quite a lot of spots that needed to be adjusted, but the most > part it was just s/->\([a-z]\)_fn/->fn.\1/. (I'll leave adjusting the > unsafe_ variants as an exercise for the reader). Yeah, I saw that there were about 50 callers of the hash functions, which on one hand isn't too many to change, but on the other hand also surprised me. I don't remember there being that many callers when I did the SHA-256 work, but it could be that I'm remembering poorly. > The naming convention was just what I made up, but one nice effect is > that it replaces "_fn" with "fn.", so the line lengths remain the same. > :) It does mean there's a function called "final", which is a keyword in > C++. I don't know if that matters to us or not (I feel like we were > trying to avoid those, but it appears as a function elsewhere, too). It doesn't. I think we had one contributor trying to make the code compile as C++ at one point, but it definitely doesn't now. A quick `git grep` show that we have lots of variables named `new` as well. (In addition, the implicit cast of `void *`, such as from `malloc`, to a pointer of any type is not allowed without a cast in C++, so that would need to be handled, too.) > So I dunno. It is a one-time pain, but I think the result harmonizes the > type system with the concept better. Yeah, I agree. It definitely looks like it could mostly be done with a Perl or Ruby script, even if slightly inconvenient. And if there is a conflict, it should be easy to fix up. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2024-11-21 9:37 ` Jeff King 2024-11-22 0:39 ` brian m. carlson @ 2025-01-10 21:38 ` Taylor Blau 2025-01-11 2:45 ` Jeff King 1 sibling, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-10 21:38 UTC (permalink / raw) To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson On Thu, Nov 21, 2024 at 04:37:31AM -0500, Jeff King wrote: > If we don't care about the speed of this function, then an > implementation like: > > for (i = 0; i < GIT_HASH_NALGOS; i++) { > if (p == &hash_algos[i] || p == hash_algos[i]->unsafe) > return i; > } > return GIT_HASH_UNKNOWN; > > would work. I'm not sure if that would be measurable. I was surprised at > the number of places that hash_algo_by_ptr() is called. Many low-level > oid functions need it because we store the integer id there rather than > a direct pointer (so oidread(), oidclr(), oid_object_info_extended(), > and so on). But I'd also expect the loop above to be pretty fast. So I > dunno. Concerns about the speed aside (I agree that the for loop is likely to be very fast, and will probably get unrolled by modern compilers), this looks good to me with one small tweak. We can't use `hash_algos[i]->unsafe` directly it might be NULL, so the function as above would change the behavior of hash_algo_by_ptr() slightly when provided NULL (it would return SHA-256 instead of UNKNOWN). So I think you'd want to write the loop like: size_t i; for (i = 0; i < GIT_HASH_NALGOS) { const struct git_hash_algo *algop = &hash_algos[i]; if (p == algop || (algop->unsafe && p == algop->unsafe)) return i; } return GIT_HASH_UNKNOWN; Thanks, Taylor ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` 2025-01-10 21:38 ` Taylor Blau @ 2025-01-11 2:45 ` Jeff King 0 siblings, 0 replies; 60+ messages in thread From: Jeff King @ 2025-01-11 2:45 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson On Fri, Jan 10, 2025 at 04:38:56PM -0500, Taylor Blau wrote: > On Thu, Nov 21, 2024 at 04:37:31AM -0500, Jeff King wrote: > > If we don't care about the speed of this function, then an > > implementation like: > > > > for (i = 0; i < GIT_HASH_NALGOS; i++) { > > if (p == &hash_algos[i] || p == hash_algos[i]->unsafe) > > return i; > > } > > return GIT_HASH_UNKNOWN; > > > > would work. I'm not sure if that would be measurable. I was surprised at > > the number of places that hash_algo_by_ptr() is called. Many low-level > > oid functions need it because we store the integer id there rather than > > a direct pointer (so oidread(), oidclr(), oid_object_info_extended(), > > and so on). But I'd also expect the loop above to be pretty fast. So I > > dunno. > > Concerns about the speed aside (I agree that the for loop is likely to > be very fast, and will probably get unrolled by modern compilers), this > looks good to me with one small tweak. It should definitely be unrolled. The big change against the existing code is that it has branches. But again, I'm not sure how much the performance matters here (I would have naively said not at all, but it does get called in lots of low-level spots). > We can't use `hash_algos[i]->unsafe` directly it might be NULL, so the > function as above would change the behavior of hash_algo_by_ptr() > slightly when provided NULL (it would return SHA-256 instead of > UNKNOWN). > > So I think you'd want to write the loop like: > > size_t i; > > for (i = 0; i < GIT_HASH_NALGOS) { > const struct git_hash_algo *algop = &hash_algos[i]; > > if (p == algop || (algop->unsafe && p == algop->unsafe)) > return i; > } > > return GIT_HASH_UNKNOWN; Yes, that works. Or maybe even just: if (!algop) return GIT_HASH_UNKNOWN; at the top to cover the special case. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 4/6] csum-file.c: use unsafe_hash_algo() 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (2 preceding siblings ...) 2024-11-20 19:13 ` [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` Taylor Blau @ 2024-11-20 19:13 ` Taylor Blau 2024-11-20 19:13 ` [PATCH 5/6] t/helper/test-hash.c: " Taylor Blau ` (4 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson Instead of calling the unsafe_ hash function variants directly, make use of the shared 'algop' pointer by initializing it to: f->algop = unsafe_hash_algo(the_hash_algo); , thus making all calls use the unsafe variants directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/csum-file.c b/csum-file.c index 107bc4c96f9..fd51f5e432e 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -73,7 +73,7 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, if (f->skip_hash) hashclr(f->buffer, f->algop); else - f->algop->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->final_fn(f->buffer, &f->ctx); if (result) hashcpy(result, f->buffer, f->algop); @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -175,8 +175,8 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->do_crc = 0; f->skip_hash = 0; - f->algop = the_hash_algo; - f->algop->unsafe_init_fn(&f->ctx); + f->algop = unsafe_hash_algo(the_hash_algo); + f->algop->init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -210,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -221,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } @@ -242,15 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - const struct git_hash_algo *algop = the_hash_algo; + const struct git_hash_algo *algop = unsafe_hash_algo(the_hash_algo); size_t data_len = total_len - algop->rawsz; if (total_len < algop->rawsz) return 0; /* say "too short"? */ - algop->unsafe_init_fn(&ctx); - algop->unsafe_update_fn(&ctx, data, data_len); - algop->unsafe_final_fn(got, &ctx); + algop->init_fn(&ctx); + algop->update_fn(&ctx, data, data_len); + algop->final_fn(got, &ctx); return hasheq(got, data + data_len, algop); } -- 2.47.0.237.gc601277f4c4 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 5/6] t/helper/test-hash.c: use unsafe_hash_algo() 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (3 preceding siblings ...) 2024-11-20 19:13 ` [PATCH 4/6] csum-file.c: use unsafe_hash_algo() Taylor Blau @ 2024-11-20 19:13 ` Taylor Blau 2024-11-20 19:13 ` [PATCH 6/6] hash.h: drop unsafe_ function variants Taylor Blau ` (3 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index d0ee668df95..aa82638c621 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -9,6 +9,8 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) int binary = 0; char *buffer; const struct git_hash_algo *algop = &hash_algos[algo]; + if (unsafe) + algop = unsafe_hash_algo(algop); if (ac == 2) { if (!strcmp(av[1], "-b")) @@ -27,10 +29,7 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) die("OOPS"); } - if (unsafe) - algop->unsafe_init_fn(&ctx); - else - algop->init_fn(&ctx); + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -49,15 +48,9 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) } if (this_sz == 0) break; - if (unsafe) - algop->unsafe_update_fn(&ctx, buffer, this_sz); - else - algop->update_fn(&ctx, buffer, this_sz); + algop->update_fn(&ctx, buffer, this_sz); } - if (unsafe) - algop->unsafe_final_fn(hash, &ctx); - else - algop->final_fn(hash, &ctx); + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); -- 2.47.0.237.gc601277f4c4 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 6/6] hash.h: drop unsafe_ function variants 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (4 preceding siblings ...) 2024-11-20 19:13 ` [PATCH 5/6] t/helper/test-hash.c: " Taylor Blau @ 2024-11-20 19:13 ` Taylor Blau 2024-11-21 9:41 ` Jeff King 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (2 subsequent siblings) 8 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2024-11-20 19:13 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson Now that all callers have been converted from: the_hash_algo->unsafe_init_fn(); to unsafe_hash_algo(the_hash_algo)->unsafe_init_fn(); and similar, we can remove the scaffolding for the unsafe_ function variants and force callers to use the new unsafe_hash_algo() mechanic instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 15 --------------- object-file.c | 15 --------------- 2 files changed, 30 deletions(-) diff --git a/hash.h b/hash.h index 23cf6876e50..6dcbf6ab835 100644 --- a/hash.h +++ b/hash.h @@ -282,21 +282,6 @@ struct git_hash_algo { /* The hash finalization function for object IDs. */ git_hash_final_oid_fn final_oid_fn; - /* The non-cryptographic hash initialization function. */ - git_hash_init_fn unsafe_init_fn; - - /* The non-cryptographic hash context cloning function. */ - git_hash_clone_fn unsafe_clone_fn; - - /* The non-cryptographic hash update function. */ - git_hash_update_fn unsafe_update_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_fn unsafe_final_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_oid_fn unsafe_final_oid_fn; - /* The OID of the empty tree. */ const struct object_id *empty_tree; diff --git a/object-file.c b/object-file.c index fddcdbe9ba6..1040a5408f2 100644 --- a/object-file.c +++ b/object-file.c @@ -232,11 +232,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_unknown_update, .final_fn = git_hash_unknown_final, .final_oid_fn = git_hash_unknown_final_oid, - .unsafe_init_fn = git_hash_unknown_init, - .unsafe_clone_fn = git_hash_unknown_clone, - .unsafe_update_fn = git_hash_unknown_update, - .unsafe_final_fn = git_hash_unknown_final, - .unsafe_final_oid_fn = git_hash_unknown_final_oid, .empty_tree = NULL, .empty_blob = NULL, .null_oid = NULL, @@ -252,11 +247,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha1_update, .final_fn = git_hash_sha1_final, .final_oid_fn = git_hash_sha1_final_oid, - .unsafe_init_fn = git_hash_sha1_init_unsafe, - .unsafe_clone_fn = git_hash_sha1_clone_unsafe, - .unsafe_update_fn = git_hash_sha1_update_unsafe, - .unsafe_final_fn = git_hash_sha1_final_unsafe, - .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, @@ -273,11 +263,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha256_update, .final_fn = git_hash_sha256_final, .final_oid_fn = git_hash_sha256_final_oid, - .unsafe_init_fn = git_hash_sha256_init, - .unsafe_clone_fn = git_hash_sha256_clone, - .unsafe_update_fn = git_hash_sha256_update, - .unsafe_final_fn = git_hash_sha256_final, - .unsafe_final_oid_fn = git_hash_sha256_final_oid, .empty_tree = &empty_tree_oid_sha256, .empty_blob = &empty_blob_oid_sha256, .null_oid = &null_oid_sha256, -- 2.47.0.237.gc601277f4c4 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 6/6] hash.h: drop unsafe_ function variants 2024-11-20 19:13 ` [PATCH 6/6] hash.h: drop unsafe_ function variants Taylor Blau @ 2024-11-21 9:41 ` Jeff King 0 siblings, 0 replies; 60+ messages in thread From: Jeff King @ 2024-11-21 9:41 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson On Wed, Nov 20, 2024 at 02:13:59PM -0500, Taylor Blau wrote: > Now that all callers have been converted from: > > the_hash_algo->unsafe_init_fn(); > > to > > unsafe_hash_algo(the_hash_algo)->unsafe_init_fn(); > > and similar, we can remove the scaffolding for the unsafe_ function > variants and force callers to use the new unsafe_hash_algo() mechanic > instead. Nice. Especially for sha256, which does not even need to care about this unsafe thing at all (so in 2099, when we finally remove sha1 support, this whole system can go away!). I think this also opens up alternatives for how we conditionally compile things. E.g., if you have no *_SHA1_UNSAFE macro defined, we could avoid defining sha1_unsafe_algo at all, and just leave it as NULL. I can't think of a significant enough advantage to merit the work in converting to that, though, so it's probably not worth doing unless we later decide it would make things simpler for some reason. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (5 preceding siblings ...) 2024-11-20 19:13 ` [PATCH 6/6] hash.h: drop unsafe_ function variants Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau ` (9 more replies) 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau 8 siblings, 10 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt (This series is rebased on 'master', which is 14650065b7 (RelNotes/2.48.0: fix typos etc., 2025-01-07) at the time of writing). The bulk of this series is unchanged since last time, but a new seventh patch that further hardens the hashfile_checkpoint callers on top of Patrick's recent series[1]. The original cover letter is as follows: ------------ This series implements an idea discussed in [2] which suggests that we introduce a way to access a wrapped version of a 'struct git_hash_algo' which represents the unsafe variant of that algorithm, rather than having individual unsafe_ functions (like unsafe_init_fn() versus init_fn(), etc.). This approach is relatively straightforward to implement, and removes a significant deficiency in the original implementation of unsafe/non-cryptographic hash functions by making it impossible to switch between safe- and unsafe variants of hash functions. It also cleans up the sha1-unsafe test helper's implementation by removing a large number of "if (unsafe)"-style conditionals. The series is laid out as follows: * The first two patches prepare the hashfile API for the upcoming change: csum-file: store the hash algorithm as a struct field csum-file.c: extract algop from hashfile_checksum_valid() * The next patch implements the new 'unsafe_hash_algo()' function at the heart of this series' approach: hash.h: introduce `unsafe_hash_algo()` * The next two patches convert existing callers to use the new 'unsafe_hash_algo()' function, instead of switching between safe and unsafe_ variants of individual functions: csum-file.c: use unsafe_hash_algo() t/helper/test-hash.c: use unsafe_hash_algo() * The final patch drops the unsafe_ function variants following all callers being converted to use the new pattern: hash.h: drop unsafe_ function variants Thanks in advance for your review! [1]: https://lore.kernel.org/git/20241230-pks-meson-sha1-unsafe-v1-0-efb276e171f5@pks.im/ [2]: https://lore.kernel.org/git/20241107013915.GA961214@coredump.intra.peff.net/ Taylor Blau (8): t/helper/test-tool: implement sha1-unsafe helper csum-file: store the hash algorithm as a struct field csum-file.c: extract algop from hashfile_checksum_valid() hash.h: introduce `unsafe_hash_algo()` csum-file.c: use unsafe_hash_algo() t/helper/test-hash.c: use unsafe_hash_algo() csum-file: introduce hashfile_checkpoint_init() hash.h: drop unsafe_ function variants builtin/fast-import.c | 2 +- bulk-checkin.c | 9 ++++++--- csum-file.c | 40 +++++++++++++++++++++++++--------------- csum-file.h | 2 ++ hash.h | 20 +++++--------------- object-file.c | 41 ++++++++++++++++++++++++++--------------- t/helper/test-hash.c | 4 +++- t/helper/test-sha1.c | 7 ++++++- t/helper/test-sha1.sh | 38 ++++++++++++++++++++++---------------- t/helper/test-sha256.c | 2 +- t/helper/test-tool.c | 1 + t/helper/test-tool.h | 3 ++- 12 files changed, 100 insertions(+), 69 deletions(-) Range-diff against v1: 2: d8c1fc78b57 ! 1: 4c1523a04f1 t/helper/test-tool: implement sha1-unsafe helper @@ Metadata ## Commit message ## t/helper/test-tool: implement sha1-unsafe helper - Add a new helper similar to 't/helper/test-tool sha1' called instead - "sha1-unsafe" which uses the unsafe variant of Git's SHA-1 wrappers. + With the new "unsafe" SHA-1 build knob, it is convenient to have a + test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, + similar to 't/helper/test-tool sha1'. - While we're at it, modify the test-sha1.sh script to exercise both - the sha1 and sha1-unsafe test tools to ensure that both produce the - expected hash values. + Implement that helper by altering the implementation of that test-tool + (in cmd_hash_impl(), which is generic and parameterized over different + hash functions) to conditionally run the unsafe variants of the chosen + hash function, and expose the new behavior via a new 'sha1-unsafe' test + helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> + ## t/helper/test-hash.c ## +@@ + #include "test-tool.h" + #include "hex.h" + +-int cmd_hash_impl(int ac, const char **av, int algo) ++int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) + { + git_hash_ctx ctx; + unsigned char hash[GIT_MAX_HEXSZ]; +@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo) + die("OOPS"); + } + +- algop->init_fn(&ctx); ++ if (unsafe) ++ algop->unsafe_init_fn(&ctx); ++ else ++ algop->init_fn(&ctx); + + while (1) { + ssize_t sz, this_sz; +@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo) + } + if (this_sz == 0) + break; +- algop->update_fn(&ctx, buffer, this_sz); ++ if (unsafe) ++ algop->unsafe_update_fn(&ctx, buffer, this_sz); ++ else ++ algop->update_fn(&ctx, buffer, this_sz); + } +- algop->final_fn(hash, &ctx); ++ if (unsafe) ++ algop->unsafe_final_fn(hash, &ctx); ++ else ++ algop->final_fn(hash, &ctx); + + if (binary) + fwrite(hash, 1, algop->rawsz, stdout); + ## t/helper/test-sha1.c ## +@@ + + int cmd__sha1(int ac, const char **av) + { +- return cmd_hash_impl(ac, av, GIT_HASH_SHA1); ++ return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0); + } + + int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) @@ t/helper/test-sha1.c: int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) #endif return 1; @@ t/helper/test-sha1.sh da39a3ee5e6b4b0d3255bfef95601890afd80709 0 3f786850e387550fdab836ed7e6dc881de23001b 0 a + ## t/helper/test-sha256.c ## +@@ + + int cmd__sha256(int ac, const char **av) + { +- return cmd_hash_impl(ac, av, GIT_HASH_SHA256); ++ return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0); + } + ## t/helper/test-tool.c ## @@ t/helper/test-tool.c: static struct test_cmd cmds[] = { { "serve-v2", cmd__serve_v2 }, @@ t/helper/test-tool.h: int cmd__scrap_cache_tree(int argc, const char **argv); int cmd__sha256(int argc, const char **argv); int cmd__sigchain(int argc, const char **argv); int cmd__simple_ipc(int argc, const char **argv); +@@ t/helper/test-tool.h: int cmd__windows_named_pipe(int argc, const char **argv); + #endif + int cmd__write_cache(int argc, const char **argv); + +-int cmd_hash_impl(int ac, const char **av, int algo); ++int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); + + #endif 3: 380133a1142 = 2: 99cc44895b5 csum-file: store the hash algorithm as a struct field 4: e5076f003bf = 3: 1ffab2f8289 csum-file.c: extract algop from hashfile_checksum_valid() 5: 17f92dba34b = 4: 99dcbe2e716 hash.h: introduce `unsafe_hash_algo()` 6: 0d8e12599e2 = 5: 2dcc2aa6803 csum-file.c: use unsafe_hash_algo() 7: a49a41703e2 = 6: a2b9ef03080 t/helper/test-hash.c: use unsafe_hash_algo() 1: 0e2fcee6894 ! 7: 94c07fd8a55 t/helper/test-sha1: prepare for an unsafe mode @@ Metadata Author: Taylor Blau <me@ttaylorr.com> ## Commit message ## - t/helper/test-sha1: prepare for an unsafe mode + csum-file: introduce hashfile_checkpoint_init() - With the new "unsafe" SHA-1 build knob, it would be convenient to have - a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, - similar to 't/helper/test-tool sha1'. + In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1 + backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with + unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to + initialize a hashfile_checkpoint with the same hash function + implementation as is used by the hashfile it is used to checkpoint. - Prepare for such a helper by altering the implementation of that - test-tool (in cmd_hash_impl(), which is generic and parameterized over - different hash functions) to conditionally run the unsafe variants of - the chosen hash function. + While both 106140a99f and 9218c0bfe1 work around the immediate crash, + changing the hash function implementation within the hashfile API to, + for example, the non-unsafe variant would re-introduce the crash. This + is a result of the tight coupling between initializing hashfiles and + hashfile_checkpoints. - The following commit will add a new test-tool which makes use of this - new parameter. + Introduce and use a new function which ensures that both parts of a + hashfile and hashfile_checkpoint pair use the same hash function + implementation to avoid such crashes. + A few things worth noting: + + - In the change to builtin/fast-import.c::stream_blob(), we can see + that by removing the explicit reference to + 'the_hash_algo->unsafe_init_fn()', we are hardened against the + hashfile API changing away from the_hash_algo (or its unsafe + variant) in the future. + + - The bulk-checkin code no longer needs to explicitly zero-initialize + the hashfile_checkpoint, since it is now done as a result of calling + 'hashfile_checkpoint_init()'. + + - Also in the bulk-checkin code, we add an additional call to + prepare_to_stream() outside of the main loop in order to initialize + 'state->f' so we know which hash function implementation to use when + calling 'hashfile_checkpoint_init()'. + + This is OK, since subsequent 'prepare_to_stream()' calls are noops. + However, we only need to call 'prepare_to_stream()' when we have the + HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling + 'prepare_to_stream()' does not assign 'state->f', so we have nothing + to initialize. + + - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are + appropriately guarded. + + Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> - ## t/helper/test-hash.c ## -@@ - #include "test-tool.h" - #include "hex.h" + ## builtin/fast-import.c ## +@@ builtin/fast-import.c: static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) + || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) + cycle_packfile(); --int cmd_hash_impl(int ac, const char **av, int algo) -+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) - { +- the_hash_algo->unsafe_init_fn(&checkpoint.ctx); ++ hashfile_checkpoint_init(pack_file, &checkpoint); + hashfile_checkpoint(pack_file, &checkpoint); + offset = checkpoint.offset; + + + ## bulk-checkin.c ## +@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; - unsigned char hash[GIT_MAX_HEXSZ]; -@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo) - die("OOPS"); - } + unsigned char obuf[16384]; + unsigned header_len; +- struct hashfile_checkpoint checkpoint = {0}; ++ struct hashfile_checkpoint checkpoint; + struct pack_idx_entry *idx = NULL; -- algop->init_fn(&ctx); -+ if (unsafe) -+ algop->unsafe_init_fn(&ctx); -+ else -+ algop->init_fn(&ctx); + seekback = lseek(fd, 0, SEEK_CUR); +@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, + OBJ_BLOB, size); + the_hash_algo->init_fn(&ctx); + the_hash_algo->update_fn(&ctx, obuf, header_len); +- the_hash_algo->unsafe_init_fn(&checkpoint.ctx); + + /* Note: idx is non-NULL when we are writing */ +- if ((flags & HASH_WRITE_OBJECT) != 0) ++ if ((flags & HASH_WRITE_OBJECT) != 0) { + CALLOC_ARRAY(idx, 1); + ++ prepare_to_stream(state, flags); ++ hashfile_checkpoint_init(state->f, &checkpoint); ++ } ++ + already_hashed_to = 0; while (1) { - ssize_t sz, this_sz; -@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo) - } - if (this_sz == 0) - break; -- algop->update_fn(&ctx, buffer, this_sz); -+ if (unsafe) -+ algop->unsafe_update_fn(&ctx, buffer, this_sz); -+ else -+ algop->update_fn(&ctx, buffer, this_sz); - } -- algop->final_fn(hash, &ctx); -+ if (unsafe) -+ algop->unsafe_final_fn(hash, &ctx); -+ else -+ algop->final_fn(hash, &ctx); - - if (binary) - fwrite(hash, 1, algop->rawsz, stdout); - ## t/helper/test-sha1.c ## -@@ - - int cmd__sha1(int ac, const char **av) - { -- return cmd_hash_impl(ac, av, GIT_HASH_SHA1); -+ return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0); + ## csum-file.c ## +@@ csum-file.c: struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp + return hashfd_internal(fd, name, tp, 8 * 1024); } - int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) - - ## t/helper/test-sha256.c ## -@@ - - int cmd__sha256(int ac, const char **av) ++void hashfile_checkpoint_init(struct hashfile *f, ++ struct hashfile_checkpoint *checkpoint) ++{ ++ memset(checkpoint, 0, sizeof(*checkpoint)); ++ f->algop->init_fn(&checkpoint->ctx); ++} ++ + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) { -- return cmd_hash_impl(ac, av, GIT_HASH_SHA256); -+ return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0); - } + hashflush(f); - ## t/helper/test-tool.h ## -@@ t/helper/test-tool.h: int cmd__windows_named_pipe(int argc, const char **argv); - #endif - int cmd__write_cache(int argc, const char **argv); + ## csum-file.h ## +@@ csum-file.h: struct hashfile_checkpoint { + git_hash_ctx ctx; + }; --int cmd_hash_impl(int ac, const char **av, int algo); -+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); ++void hashfile_checkpoint_init(struct hashfile *, struct hashfile_checkpoint *); + void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *); + int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *); - #endif 8: 4081ad08549 = 8: f5579883816 hash.h: drop unsafe_ function variants base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 1/8] t/helper/test-tool: implement sha1-unsafe helper 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau ` (8 subsequent siblings) 9 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt With the new "unsafe" SHA-1 build knob, it is convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Implement that helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function, and expose the new behavior via a new 'sha1-unsafe' test helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++++++++++---- t/helper/test-sha1.c | 7 ++++++- t/helper/test-sha1.sh | 38 ++++++++++++++++++++++---------------- t/helper/test-sha256.c | 2 +- t/helper/test-tool.c | 1 + t/helper/test-tool.h | 3 ++- 6 files changed, 45 insertions(+), 23 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index 45d829c908f..d0ee668df95 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -1,7 +1,7 @@ #include "test-tool.h" #include "hex.h" -int cmd_hash_impl(int ac, const char **av, int algo) +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) { git_hash_ctx ctx; unsigned char hash[GIT_MAX_HEXSZ]; @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) die("OOPS"); } - algop->init_fn(&ctx); + if (unsafe) + algop->unsafe_init_fn(&ctx); + else + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) } if (this_sz == 0) break; - algop->update_fn(&ctx, buffer, this_sz); + if (unsafe) + algop->unsafe_update_fn(&ctx, buffer, this_sz); + else + algop->update_fn(&ctx, buffer, this_sz); } - algop->final_fn(hash, &ctx); + if (unsafe) + algop->unsafe_final_fn(hash, &ctx); + else + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c index e60d000c039..349540c4df8 100644 --- a/t/helper/test-sha1.c +++ b/t/helper/test-sha1.c @@ -3,7 +3,7 @@ int cmd__sha1(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA1); + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0); } int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) @@ -13,3 +13,8 @@ int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) #endif return 1; } + +int cmd__sha1_unsafe(int ac, const char **av) +{ + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 1); +} diff --git a/t/helper/test-sha1.sh b/t/helper/test-sha1.sh index 84594885c70..bf387d3db14 100755 --- a/t/helper/test-sha1.sh +++ b/t/helper/test-sha1.sh @@ -3,25 +3,31 @@ dd if=/dev/zero bs=1048576 count=100 2>/dev/null | /usr/bin/time t/helper/test-tool sha1 >/dev/null +dd if=/dev/zero bs=1048576 count=100 2>/dev/null | +/usr/bin/time t/helper/test-tool sha1-unsafe >/dev/null + while read expect cnt pfx do case "$expect" in '#'*) continue ;; esac - actual=$( - { - test -z "$pfx" || echo "$pfx" - dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null | - perl -pe 'y/\000/g/' - } | ./t/helper/test-tool sha1 $cnt - ) - if test "$expect" = "$actual" - then - echo "OK: $expect $cnt $pfx" - else - echo >&2 "OOPS: $cnt" - echo >&2 "expect: $expect" - echo >&2 "actual: $actual" - exit 1 - fi + for sha1 in sha1 sha1-unsafe + do + actual=$( + { + test -z "$pfx" || echo "$pfx" + dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null | + perl -pe 'y/\000/g/' + } | ./t/helper/test-tool $sha1 $cnt + ) + if test "$expect" = "$actual" + then + echo "OK ($sha1): $expect $cnt $pfx" + else + echo >&2 "OOPS ($sha1): $cnt" + echo >&2 "expect ($sha1): $expect" + echo >&2 "actual ($sha1): $actual" + exit 1 + fi + done done <<EOF da39a3ee5e6b4b0d3255bfef95601890afd80709 0 3f786850e387550fdab836ed7e6dc881de23001b 0 a diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c index 2fb20438f3c..7fd0aa1fcd3 100644 --- a/t/helper/test-sha256.c +++ b/t/helper/test-sha256.c @@ -3,5 +3,5 @@ int cmd__sha256(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA256); + return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0); } diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 4a7aa993ba9..958452ef12e 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -70,6 +70,7 @@ static struct test_cmd cmds[] = { { "serve-v2", cmd__serve_v2 }, { "sha1", cmd__sha1 }, { "sha1-is-sha1dc", cmd__sha1_is_sha1dc }, + { "sha1-unsafe", cmd__sha1_unsafe }, { "sha256", cmd__sha256 }, { "sigchain", cmd__sigchain }, { "simple-ipc", cmd__simple_ipc }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 21802ac27da..24149edd414 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -63,6 +63,7 @@ int cmd__scrap_cache_tree(int argc, const char **argv); int cmd__serve_v2(int argc, const char **argv); int cmd__sha1(int argc, const char **argv); int cmd__sha1_is_sha1dc(int argc, const char **argv); +int cmd__sha1_unsafe(int argc, const char **argv); int cmd__sha256(int argc, const char **argv); int cmd__sigchain(int argc, const char **argv); int cmd__simple_ipc(int argc, const char **argv); @@ -81,6 +82,6 @@ int cmd__windows_named_pipe(int argc, const char **argv); #endif int cmd__write_cache(int argc, const char **argv); -int cmd_hash_impl(int ac, const char **av, int algo); +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); #endif -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau 2025-01-08 19:14 ` [PATCH v2 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-16 11:48 ` Patrick Steinhardt 2025-01-08 19:14 ` [PATCH v2 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau ` (7 subsequent siblings) 9 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Throughout the hashfile API, we rely on a reference to 'the_hash_algo', and call its _usnafe function variants directly. Prepare for a future change where we may use a different 'git_hash_algo' pointer (instead of just relying on 'the_hash_algo' throughout) by making the 'git_hash_algo' pointer a member of the 'hashfile' structure itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 20 +++++++++++--------- csum-file.h | 1 + 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/csum-file.c b/csum-file.c index 5716016e12e..b28cd047e3f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -71,14 +71,14 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, hashflush(f); if (f->skip_hash) - hashclr(f->buffer, the_repository->hash_algo); + hashclr(f->buffer, f->algop); else - the_hash_algo->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->unsafe_final_fn(f->buffer, &f->ctx); if (result) - hashcpy(result, f->buffer, the_repository->hash_algo); + hashcpy(result, f->buffer, f->algop); if (flags & CSUM_HASH_IN_STREAM) - flush(f, f->buffer, the_hash_algo->rawsz); + flush(f, f->buffer, f->algop->rawsz); if (flags & CSUM_FSYNC) fsync_component_or_die(component, f->fd, f->name); if (flags & CSUM_CLOSE) { @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->unsafe_update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -174,7 +174,9 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->name = name; f->do_crc = 0; f->skip_hash = 0; - the_hash_algo->unsafe_init_fn(&f->ctx); + + f->algop = the_hash_algo; + f->algop->unsafe_init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -208,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - the_hash_algo->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -219,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - the_hash_algo->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } diff --git a/csum-file.h b/csum-file.h index 7c73da0a40a..2b45f4673a2 100644 --- a/csum-file.h +++ b/csum-file.h @@ -20,6 +20,7 @@ struct hashfile { size_t buffer_len; unsigned char *buffer; unsigned char *check_buffer; + const struct git_hash_algo *algop; /** * If non-zero, skip_hash indicates that we should -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field 2025-01-08 19:14 ` [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau @ 2025-01-16 11:48 ` Patrick Steinhardt 2025-01-17 21:17 ` Taylor Blau 0 siblings, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-01-16 11:48 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Elijah Newren On Wed, Jan 08, 2025 at 02:14:35PM -0500, Taylor Blau wrote: > Throughout the hashfile API, we rely on a reference to 'the_hash_algo', > and call its _usnafe function variants directly. s/usnafe/unsafe/ > Prepare for a future change where we may use a different 'git_hash_algo' > pointer (instead of just relying on 'the_hash_algo' throughout) by > making the 'git_hash_algo' pointer a member of the 'hashfile' structure > itself. Makes sense, and it's also a good step for libification. I wonder: does it mean that we can also get rid of `USE_THE_REPOSITORY_VARIABLE`, or do we still depend on it in this file? The answer is yes, as we only reduce the sites where we use `the_hash_algo`, but don't remove it altogether. That would require the caller to provide the hash algo to us. Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field 2025-01-16 11:48 ` Patrick Steinhardt @ 2025-01-17 21:17 ` Taylor Blau 0 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 21:17 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano, Elijah Newren On Thu, Jan 16, 2025 at 12:48:58PM +0100, Patrick Steinhardt wrote: > On Wed, Jan 08, 2025 at 02:14:35PM -0500, Taylor Blau wrote: > > Throughout the hashfile API, we rely on a reference to 'the_hash_algo', > > and call its _usnafe function variants directly. > > s/usnafe/unsafe/ Oops. Thanks for spotting! > > Prepare for a future change where we may use a different 'git_hash_algo' > > pointer (instead of just relying on 'the_hash_algo' throughout) by > > making the 'git_hash_algo' pointer a member of the 'hashfile' structure > > itself. > > Makes sense, and it's also a good step for libification. I wonder: does > it mean that we can also get rid of `USE_THE_REPOSITORY_VARIABLE`, or do > we still depend on it in this file? The answer is yes, as we only reduce > the sites where we use `the_hash_algo`, but don't remove it altogether. > That would require the caller to provide the hash algo to us. Yeah, I agree that we're not quite there yet, but likewise that this patch is a step in that direction. Thanks, Taylor ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 3/8] csum-file.c: extract algop from hashfile_checksum_valid() 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau 2025-01-08 19:14 ` [PATCH v2 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau 2025-01-08 19:14 ` [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau ` (6 subsequent siblings) 9 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Perform a similar transformation as in the previous commit, but focused instead on hashfile_checksum_valid(). This function does not work with a hashfile structure itself, and instead validates the raw contents of a file written using the hashfile API. We'll want to be prepared for a similar change to this function in the future, so prepare ourselves for that by extracting 'the_hash_algo' into its own field for use within this function. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/csum-file.c b/csum-file.c index b28cd047e3f..7a71121e340 100644 --- a/csum-file.c +++ b/csum-file.c @@ -242,14 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - size_t data_len = total_len - the_hash_algo->rawsz; + const struct git_hash_algo *algop = the_hash_algo; + size_t data_len = total_len - algop->rawsz; - if (total_len < the_hash_algo->rawsz) + if (total_len < algop->rawsz) return 0; /* say "too short"? */ - the_hash_algo->unsafe_init_fn(&ctx); - the_hash_algo->unsafe_update_fn(&ctx, data, data_len); - the_hash_algo->unsafe_final_fn(got, &ctx); + algop->unsafe_init_fn(&ctx); + algop->unsafe_update_fn(&ctx, data, data_len); + algop->unsafe_final_fn(got, &ctx); - return hasheq(got, data + data_len, the_repository->hash_algo); + return hasheq(got, data + data_len, algop); } -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (2 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-16 11:49 ` Patrick Steinhardt 2025-01-08 19:14 ` [PATCH v2 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau ` (5 subsequent siblings) 9 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants, 2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing functions by introducing new functions like "unsafe_init_fn()" and so on. This approach has a major shortcoming that callers must remember to consistently use one variant or the other. Failing to consistently use (or not use) the unsafe variants can lead to crashes at best, or subtle memory corruption issues at worst. In the hashfile API, this isn't difficult to achieve, but verifying that all callers consistently use the unsafe variants is somewhat of a chore given how spread out all of the callers are. In the sha1 and sha1-unsafe test helpers, all of the calls to various hash functions are guarded by an "if (unsafe)" conditional, which is repetitive and cumbersome. Address these issues by introducing a new pattern whereby one 'git_hash_algo' can return a pointer to another 'git_hash_algo' that represents the unsafe version of itself. So instead of having something like: if (unsafe) the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); else the_hash_algo->unsafe_init_fn(...); the_hash_algo->unsafe_update_fn(...); the_hash_algo->unsafe_final_fn(...); we can instead write: struct git_hash_algo *algop = the_hash_algo; if (unsafe) algop = unsafe_hash_algo(algop); the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to hold onto a 'struct git_hash_algo' pointer that is initialized once. Similarly, while there currently is still a way to "mix" safe and unsafe functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 5 +++++ object-file.c | 26 ++++++++++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/hash.h b/hash.h index 756166ce5e8..23cf6876e50 100644 --- a/hash.h +++ b/hash.h @@ -305,6 +305,9 @@ struct git_hash_algo { /* The all-zeros OID. */ const struct object_id *null_oid; + + /* The unsafe variant of this hash function, if one exists. */ + const struct git_hash_algo *unsafe; }; extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; @@ -323,6 +326,8 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) return p - hash_algos; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop); + const struct object_id *null_oid(void); static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop) diff --git a/object-file.c b/object-file.c index 5b792b3dd42..43efa4ca361 100644 --- a/object-file.c +++ b/object-file.c @@ -202,6 +202,22 @@ static void git_hash_unknown_final_oid(struct object_id *oid UNUSED, BUG("trying to finalize unknown hash"); } +static const struct git_hash_algo sha1_unsafe_algo = { + .name = "sha1", + .format_id = GIT_SHA1_FORMAT_ID, + .rawsz = GIT_SHA1_RAWSZ, + .hexsz = GIT_SHA1_HEXSZ, + .blksz = GIT_SHA1_BLKSZ, + .init_fn = git_hash_sha1_init_unsafe, + .clone_fn = git_hash_sha1_clone_unsafe, + .update_fn = git_hash_sha1_update_unsafe, + .final_fn = git_hash_sha1_final_unsafe, + .final_oid_fn = git_hash_sha1_final_oid_unsafe, + .empty_tree = &empty_tree_oid, + .empty_blob = &empty_blob_oid, + .null_oid = &null_oid_sha1, +}; + const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { { .name = NULL, @@ -239,6 +255,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .unsafe_update_fn = git_hash_sha1_update_unsafe, .unsafe_final_fn = git_hash_sha1_final_unsafe, .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, + .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, .null_oid = &null_oid_sha1, @@ -305,6 +322,15 @@ int hash_algo_by_length(int len) return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop) +{ + /* If we have a faster "unsafe" implementation, use that. */ + if (algop->unsafe) + return algop->unsafe; + /* Otherwise use the default one. */ + return algop; +} + /* * This is meant to hold a *small* number of objects that you would * want repo_read_object_file() to be able to return, but yet you do not want -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` 2025-01-08 19:14 ` [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau @ 2025-01-16 11:49 ` Patrick Steinhardt 2025-01-17 21:18 ` Taylor Blau 0 siblings, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-01-16 11:49 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Elijah Newren On Wed, Jan 08, 2025 at 02:14:42PM -0500, Taylor Blau wrote: > Address these issues by introducing a new pattern whereby one > 'git_hash_algo' can return a pointer to another 'git_hash_algo' that > represents the unsafe version of itself. So instead of having something > like: > > if (unsafe) > the_hash_algo->init_fn(...); > the_hash_algo->update_fn(...); > the_hash_algo->final_fn(...); > else > the_hash_algo->unsafe_init_fn(...); > the_hash_algo->unsafe_update_fn(...); > the_hash_algo->unsafe_final_fn(...); > > we can instead write: > > struct git_hash_algo *algop = the_hash_algo; > if (unsafe) > algop = unsafe_hash_algo(algop); > > the_hash_algo->init_fn(...); > the_hash_algo->update_fn(...); > the_hash_algo->final_fn(...); This should all be `algop->init_fn(...)` and so on, right? > This removes the existing shortcoming by no longer forcing the caller to > "remember" which variant of the hash functions it wants to call, only to > hold onto a 'struct git_hash_algo' pointer that is initialized once. We still have to remember something, but now we only have to remember the pointer to the hash algorithm itself, which definitely is less tedious. For what it's worth, this prompted me to have another look at my proposal to stop having to use the hash algo altogether for `update()` et al, and now have a version that works. It builds on top of your patch series, which still is a step into the right direction. So I'll send it once your series is being merged down. Thanks! Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` 2025-01-16 11:49 ` Patrick Steinhardt @ 2025-01-17 21:18 ` Taylor Blau 0 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 21:18 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano, Elijah Newren On Thu, Jan 16, 2025 at 12:49:56PM +0100, Patrick Steinhardt wrote: > > we can instead write: > > > > struct git_hash_algo *algop = the_hash_algo; > > if (unsafe) > > algop = unsafe_hash_algo(algop); > > > > the_hash_algo->init_fn(...); > > the_hash_algo->update_fn(...); > > the_hash_algo->final_fn(...); > > This should all be `algop->init_fn(...)` and so on, right? Most definitely :-). > For what it's worth, this prompted me to have another look at my > proposal to stop having to use the hash algo altogether for `update()` > et al, and now have a version that works. It builds on top of your patch > series, which still is a step into the right direction. So I'll send it > once your series is being merged down. Exciting! I'm looking forward to reviewing it. Thanks, Taylor ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 5/8] csum-file.c: use unsafe_hash_algo() 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (3 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 6/8] t/helper/test-hash.c: " Taylor Blau ` (4 subsequent siblings) 9 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Instead of calling the unsafe_ hash function variants directly, make use of the shared 'algop' pointer by initializing it to: f->algop = unsafe_hash_algo(the_hash_algo); , thus making all calls use the unsafe variants directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/csum-file.c b/csum-file.c index 7a71121e340..ebffc80ef71 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -73,7 +73,7 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, if (f->skip_hash) hashclr(f->buffer, f->algop); else - f->algop->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->final_fn(f->buffer, &f->ctx); if (result) hashcpy(result, f->buffer, f->algop); @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -175,8 +175,8 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->do_crc = 0; f->skip_hash = 0; - f->algop = the_hash_algo; - f->algop->unsafe_init_fn(&f->ctx); + f->algop = unsafe_hash_algo(the_hash_algo); + f->algop->init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -210,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -221,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } @@ -242,15 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - const struct git_hash_algo *algop = the_hash_algo; + const struct git_hash_algo *algop = unsafe_hash_algo(the_hash_algo); size_t data_len = total_len - algop->rawsz; if (total_len < algop->rawsz) return 0; /* say "too short"? */ - algop->unsafe_init_fn(&ctx); - algop->unsafe_update_fn(&ctx, data, data_len); - algop->unsafe_final_fn(got, &ctx); + algop->init_fn(&ctx); + algop->update_fn(&ctx, data, data_len); + algop->final_fn(got, &ctx); return hasheq(got, data + data_len, algop); } -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 6/8] t/helper/test-hash.c: use unsafe_hash_algo() 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (4 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau ` (3 subsequent siblings) 9 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index d0ee668df95..aa82638c621 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -9,6 +9,8 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) int binary = 0; char *buffer; const struct git_hash_algo *algop = &hash_algos[algo]; + if (unsafe) + algop = unsafe_hash_algo(algop); if (ac == 2) { if (!strcmp(av[1], "-b")) @@ -27,10 +29,7 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) die("OOPS"); } - if (unsafe) - algop->unsafe_init_fn(&ctx); - else - algop->init_fn(&ctx); + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -49,15 +48,9 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) } if (this_sz == 0) break; - if (unsafe) - algop->unsafe_update_fn(&ctx, buffer, this_sz); - else - algop->update_fn(&ctx, buffer, this_sz); + algop->update_fn(&ctx, buffer, this_sz); } - if (unsafe) - algop->unsafe_final_fn(hash, &ctx); - else - algop->final_fn(hash, &ctx); + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (5 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 6/8] t/helper/test-hash.c: " Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-10 10:37 ` Jeff King 2025-01-08 19:14 ` [PATCH v2 8/8] hash.h: drop unsafe_ function variants Taylor Blau ` (2 subsequent siblings) 9 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1 backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to initialize a hashfile_checkpoint with the same hash function implementation as is used by the hashfile it is used to checkpoint. While both 106140a99f and 9218c0bfe1 work around the immediate crash, changing the hash function implementation within the hashfile API to, for example, the non-unsafe variant would re-introduce the crash. This is a result of the tight coupling between initializing hashfiles and hashfile_checkpoints. Introduce and use a new function which ensures that both parts of a hashfile and hashfile_checkpoint pair use the same hash function implementation to avoid such crashes. A few things worth noting: - In the change to builtin/fast-import.c::stream_blob(), we can see that by removing the explicit reference to 'the_hash_algo->unsafe_init_fn()', we are hardened against the hashfile API changing away from the_hash_algo (or its unsafe variant) in the future. - The bulk-checkin code no longer needs to explicitly zero-initialize the hashfile_checkpoint, since it is now done as a result of calling 'hashfile_checkpoint_init()'. - Also in the bulk-checkin code, we add an additional call to prepare_to_stream() outside of the main loop in order to initialize 'state->f' so we know which hash function implementation to use when calling 'hashfile_checkpoint_init()'. This is OK, since subsequent 'prepare_to_stream()' calls are noops. However, we only need to call 'prepare_to_stream()' when we have the HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling 'prepare_to_stream()' does not assign 'state->f', so we have nothing to initialize. - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are appropriately guarded. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/fast-import.c | 2 +- bulk-checkin.c | 9 ++++++--- csum-file.c | 7 +++++++ csum-file.h | 1 + 4 files changed, 15 insertions(+), 4 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 0f86392761a..4a6c7ab52ac 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -1106,7 +1106,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) cycle_packfile(); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); + hashfile_checkpoint_init(pack_file, &checkpoint); hashfile_checkpoint(pack_file, &checkpoint); offset = checkpoint.offset; diff --git a/bulk-checkin.c b/bulk-checkin.c index 433070a3bda..892176d23d2 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -261,7 +261,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; - struct hashfile_checkpoint checkpoint = {0}; + struct hashfile_checkpoint checkpoint; struct pack_idx_entry *idx = NULL; seekback = lseek(fd, 0, SEEK_CUR); @@ -272,12 +272,15 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, OBJ_BLOB, size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); /* Note: idx is non-NULL when we are writing */ - if ((flags & HASH_WRITE_OBJECT) != 0) + if ((flags & HASH_WRITE_OBJECT) != 0) { CALLOC_ARRAY(idx, 1); + prepare_to_stream(state, flags); + hashfile_checkpoint_init(state->f, &checkpoint); + } + already_hashed_to = 0; while (1) { diff --git a/csum-file.c b/csum-file.c index ebffc80ef71..232121f415f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -206,6 +206,13 @@ struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp return hashfd_internal(fd, name, tp, 8 * 1024); } +void hashfile_checkpoint_init(struct hashfile *f, + struct hashfile_checkpoint *checkpoint) +{ + memset(checkpoint, 0, sizeof(*checkpoint)); + f->algop->init_fn(&checkpoint->ctx); +} + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) { hashflush(f); diff --git a/csum-file.h b/csum-file.h index 2b45f4673a2..b7475f16c20 100644 --- a/csum-file.h +++ b/csum-file.h @@ -36,6 +36,7 @@ struct hashfile_checkpoint { git_hash_ctx ctx; }; +void hashfile_checkpoint_init(struct hashfile *, struct hashfile_checkpoint *); void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *); int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *); -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-08 19:14 ` [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau @ 2025-01-10 10:37 ` Jeff King 2025-01-10 21:50 ` Taylor Blau 0 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-01-10 10:37 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Wed, Jan 08, 2025 at 02:14:51PM -0500, Taylor Blau wrote: > Introduce and use a new function which ensures that both parts of a > hashfile and hashfile_checkpoint pair use the same hash function > implementation to avoid such crashes. That makes sense. This should have been encapsulated all along, just like the actual hash initialization happens inside hashfile_init(). A hashfile_checkpoint is sort of inherently tied to a hashfile, right? I mean, it is recording an offset that only makes sense in the context of the parent hashfile. And that is only more true after the unsafe-hash patches, because now it needs to use the "algop" pointer from the parent hashfile (though for now we expect all hashfiles to use the same unsafe-algop, in theory we could use different checksums for each file). So in the new constructor: > +void hashfile_checkpoint_init(struct hashfile *f, > + struct hashfile_checkpoint *checkpoint) > +{ > + memset(checkpoint, 0, sizeof(*checkpoint)); > + f->algop->init_fn(&checkpoint->ctx); > +} ...should we actually record "f" itself? And then in the existing functions: > void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) ...they'd no longer need to take the extra parameter. It creates a lifetime dependency of the checkpoint struct on the "f" it is checkpointing, but I think that is naturally modeling the domain. A semi-related thing I wondered about: do we need a destructor/release function of some kind? Long ago when this checkpoint code was added, a memcpy() of the sha_ctx struct was sufficient. But these days we use clone_fn(), which may call openssl_SHA1_Clone(), which does EVP_MD_CTX_copy_ex() under the hood. Do we have any promise that this doesn't allocate any resources that might need a call to _Final() to release (or I guess the more efficient way is directly EVP_MD_CTX_free() under the hood). My reading of the openssl manpages suggests that we should be doing that, or we may see leaks. But it may also be the case that it doesn't happen to trigger for their implementation. At any rate, we do not seem to have such a cleanup function. So it is certainly an orthogonal issue to your series. I wondered about it here because if we did have one, it would be necessary to clean up checkpoint before the hashfile due to the lifetime dependency I mentioned above. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-10 10:37 ` Jeff King @ 2025-01-10 21:50 ` Taylor Blau 2025-01-17 21:30 ` Taylor Blau 0 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-10 21:50 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Fri, Jan 10, 2025 at 05:37:56AM -0500, Jeff King wrote: > So in the new constructor: > > > +void hashfile_checkpoint_init(struct hashfile *f, > > + struct hashfile_checkpoint *checkpoint) > > +{ > > + memset(checkpoint, 0, sizeof(*checkpoint)); > > + f->algop->init_fn(&checkpoint->ctx); > > +} > > ...should we actually record "f" itself? And then in the existing > functions: > > > void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) > > ...they'd no longer need to take the extra parameter. > > It creates a lifetime dependency of the checkpoint struct on the "f" it > is checkpointing, but I think that is naturally modeling the domain. Thanks, I really like these suggestions. I adjusted the series accordingly to do this cleanup in two patches (one for hashfile_checkpoint(), another for hashfile_truncate()) after the patch introducing hashfile_checkpoint_init(). > A semi-related thing I wondered about: do we need a destructor/release > function of some kind? Long ago when this checkpoint code was added, a > memcpy() of the sha_ctx struct was sufficient. But these days we use > clone_fn(), which may call openssl_SHA1_Clone(), which does > EVP_MD_CTX_copy_ex() under the hood. Do we have any promise that this > doesn't allocate any resources that might need a call to _Final() to > release (or I guess the more efficient way is directly EVP_MD_CTX_free() > under the hood). > > My reading of the openssl manpages suggests that we should be doing > that, or we may see leaks. But it may also be the case that it doesn't > happen to trigger for their implementation. > > At any rate, we do not seem to have such a cleanup function. So it is > certainly an orthogonal issue to your series. I wondered about it here > because if we did have one, it would be necessary to clean up checkpoint > before the hashfile due to the lifetime dependency I mentioned above. I like the idea of a cleanup function, but let's do so in a separate series. Thanks, Taylor ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-10 21:50 ` Taylor Blau @ 2025-01-17 21:30 ` Taylor Blau 2025-01-18 12:15 ` Jeff King 0 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-17 21:30 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Fri, Jan 10, 2025 at 04:50:25PM -0500, Taylor Blau wrote: > On Fri, Jan 10, 2025 at 05:37:56AM -0500, Jeff King wrote: > > So in the new constructor: > > > > > +void hashfile_checkpoint_init(struct hashfile *f, > > > + struct hashfile_checkpoint *checkpoint) > > > +{ > > > + memset(checkpoint, 0, sizeof(*checkpoint)); > > > + f->algop->init_fn(&checkpoint->ctx); > > > +} > > > > ...should we actually record "f" itself? And then in the existing > > functions: > > > > > void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) > > > > ...they'd no longer need to take the extra parameter. > > > > It creates a lifetime dependency of the checkpoint struct on the "f" it > > is checkpointing, but I think that is naturally modeling the domain. > > Thanks, I really like these suggestions. I adjusted the series > accordingly to do this cleanup in two patches (one for > hashfile_checkpoint(), another for hashfile_truncate()) after the patch > introducing hashfile_checkpoint_init(). Hmm. I'm not sure that I like this as much as I thought I did. I agree with you that ultimately the hashfile_checkpoint is (or should be) tied to the lifetime of the hashfile that it is checkpointing underneath. But in practice things are a little funky. Let's suppose I did something like the following: --- 8< --- diff --git a/csum-file.c b/csum-file.c index ebffc80ef7..47b8317a1f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -206,6 +206,15 @@ struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp return hashfd_internal(fd, name, tp, 8 * 1024); } +void hashfile_checkpoint_init(struct hashfile *f, + struct hashfile_checkpoint *checkpoint) +{ + memset(checkpoint, 0, sizeof(*checkpoint)); + + checkpoint->f = f; + checkpoint->f->algop->init_fn(&checkpoint->ctx); +} + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) { hashflush(f); diff --git a/csum-file.h b/csum-file.h index 2b45f4673a..8016509c71 100644 --- a/csum-file.h +++ b/csum-file.h @@ -34,8 +34,10 @@ struct hashfile { struct hashfile_checkpoint { off_t offset; git_hash_ctx ctx; + struct hashfile *f; }; +void hashfile_checkpoint_init(struct hashfile *, struct hashfile_checkpoint *); void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *); int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *); --- >8 --- , where I'm eliding the trivial changes necessary to make this work at the two callers. Let's look a little closer at the bulk-checkin caller. If I do this on top: --- 8< --- diff --git a/bulk-checkin.c b/bulk-checkin.c index 433070a3bd..892176d23d 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -261,7 +261,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; - struct hashfile_checkpoint checkpoint = {0}; + struct hashfile_checkpoint checkpoint; struct pack_idx_entry *idx = NULL; seekback = lseek(fd, 0, SEEK_CUR); @@ -272,12 +272,15 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, OBJ_BLOB, size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); /* Note: idx is non-NULL when we are writing */ - if ((flags & HASH_WRITE_OBJECT) != 0) + if ((flags & HASH_WRITE_OBJECT) != 0) { CALLOC_ARRAY(idx, 1); + prepare_to_stream(state, flags); + hashfile_checkpoint_init(state->f, &checkpoint); + } + already_hashed_to = 0; while (1) { --- >8 --- then test 14 in t1050-large.sh fails because of a segfault in 'git add'. If we compile with SANITIZE=address, we can see that there's a use-after-free in hashflush(), which is called by hashfile_checkpoint(). That is a result of the max pack-size code. So we could try something like: --- 8< --- diff --git a/bulk-checkin.c b/bulk-checkin.c index 7b8a6eb2df..9dc114d132 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -261,7 +261,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; - struct hashfile_checkpoint checkpoint; + struct hashfile_checkpoint checkpoint = { 0 }; struct pack_idx_entry *idx = NULL; seekback = lseek(fd, 0, SEEK_CUR); @@ -274,17 +274,14 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, the_hash_algo->update_fn(&ctx, obuf, header_len); /* Note: idx is non-NULL when we are writing */ - if ((flags & HASH_WRITE_OBJECT) != 0) { + if ((flags & HASH_WRITE_OBJECT) != 0) CALLOC_ARRAY(idx, 1); - - prepare_to_stream(state, flags); - hashfile_checkpoint_init(state->f, &checkpoint); - } - already_hashed_to = 0; while (1) { prepare_to_stream(state, flags); + if (checkpoint.f != state->f) + hashfile_checkpoint_init(state->f, &checkpoint); if (idx) { hashfile_checkpoint(&checkpoint); idx->offset = state->offset; --- >8 --- which would do the trick, but it feels awfully hacky to have the "if (checkpoint.f != state->f)" check in there, since that feels too intimately tied to the implementation of the hashfile_checkpoint API for my comfort. It would be nice if we could make the checkpoint only declared within the loop body itself, but we can't because we need to call hashfile_truncate() outside of the loop. Anyway, that's all to say that I think that while this is probably doable in theory, in practice it's kind of a mess, at least currently. I would rather see if there are other ways to clean up the deflate_blob_to_pack() function first in a way that made this change less awkward. I think the most reasonable course here would be to pursue a minimal change like the one presented here and then think about further clean up as a separate step. Thanks, Taylor ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-17 21:30 ` Taylor Blau @ 2025-01-18 12:15 ` Jeff King 0 siblings, 0 replies; 60+ messages in thread From: Jeff King @ 2025-01-18 12:15 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Fri, Jan 17, 2025 at 04:30:05PM -0500, Taylor Blau wrote: > diff --git a/bulk-checkin.c b/bulk-checkin.c > index 433070a3bd..892176d23d 100644 > --- a/bulk-checkin.c > +++ b/bulk-checkin.c > @@ -261,7 +261,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, > git_hash_ctx ctx; > unsigned char obuf[16384]; > unsigned header_len; > - struct hashfile_checkpoint checkpoint = {0}; > + struct hashfile_checkpoint checkpoint; > struct pack_idx_entry *idx = NULL; > > seekback = lseek(fd, 0, SEEK_CUR); > @@ -272,12 +272,15 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, > OBJ_BLOB, size); > the_hash_algo->init_fn(&ctx); > the_hash_algo->update_fn(&ctx, obuf, header_len); > - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); > > /* Note: idx is non-NULL when we are writing */ > - if ((flags & HASH_WRITE_OBJECT) != 0) > + if ((flags & HASH_WRITE_OBJECT) != 0) { > CALLOC_ARRAY(idx, 1); > > + prepare_to_stream(state, flags); > + hashfile_checkpoint_init(state->f, &checkpoint); > + } > + > already_hashed_to = 0; > > while (1) { Yeah, that's ugly. We are potentially throwing away the hashfile that the checkpoint was created for. That makes my instinct to push the checkpoint down into the loop where we we might restart a new pack, like this (and like you suggested below): diff --git a/bulk-checkin.c b/bulk-checkin.c index 433070a3bd..efa59074fb 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -261,7 +261,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; - struct hashfile_checkpoint checkpoint = {0}; struct pack_idx_entry *idx = NULL; seekback = lseek(fd, 0, SEEK_CUR); @@ -281,8 +280,10 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, already_hashed_to = 0; while (1) { + struct hashfile_checkpoint checkpoint = {0}; prepare_to_stream(state, flags); if (idx) { + hashfile_checkpoint_init(state->f, &checkpoint); hashfile_checkpoint(state->f, &checkpoint); idx->offset = state->offset; crc32_begin(state->f); but that doesn't work, because the checkpoint is also used later for the already_written() check: if (already_written(state, result_oid)) { hashfile_truncate(state->f, &checkpoint); state->offset = checkpoint.offset; free(idx); } else That made me wonder if there is a bug lurking there. What if we found the pack was too big, truncated to our checkpoint, and then opened a new pack? Then the original checkpoint would now be bogus! It would mention an offset in the original packfile which doesn't make any sense with what we have open. But I think this is OK, because we can only leave the loop when stream_blob_to_pack() returns, and we always establish a new checkpoint before then. So I do think that moving the initialization of the checkpoint into the loop, but _not_ moving the variable would work the same way it does now (i.e., what you suggested below). But I admit that the way this loop works kind of makes my head spin. It can really only ever run twice, but it is hard to see: we break out if stream_blob_to_pack() returns success. And it will only return error if we would bust the packsize limit (all other errors cause us to die()). And only if we would bust the limit _and_ we are not the only object in the pack. And since we start a new pack if we loop, that will never be true on the second iteration; we'll always either die() or return success. I do think it would be much easier to read with a single explicit retry: if (checkpoint_and_try_to_stream() < 0) { /* we busted the limit; make a new pack and try again */ hashfile_truncate(); etc... if (checkpoint_and_try_to_stream() < 0) BUG("yikes, we should not fail a second time!"); } where checkpoint_and_try_to_stream() is the first half of the loop, up to the stream_blob_to_pack() call. Anyway, that is all outside of your patch, and relevant only because _if_ we untangled it a bit more, it might make the checkpoint lifetime a bit more obvious and less scary to refactor. But it does imply to me that the data dependency introduced by my suggestion is not always so straight-forward as I thought it would be, and we should probably punt on it for your series. > which would do the trick, but it feels awfully hacky to have the "if > (checkpoint.f != state->f)" check in there, since that feels too > intimately tied to the implementation of the hashfile_checkpoint API for > my comfort. I think you could unconditionally checkpoint at that part; we're about to do a write, so we want to store the state before the write in case we need to roll back. > Anyway, that's all to say that I think that while this is probably > doable in theory, in practice it's kind of a mess, at least currently. > I would rather see if there are other ways to clean up the > deflate_blob_to_pack() function first in a way that made this change > less awkward. Yeah, I actually wrote what I wrote above before reading this far down in your email, but we arrived at the exact same conclusion. ;) Hopefully what I wrote might give some pointers if somebody wants to refactor later. > I think the most reasonable course here would be to pursue a minimal > change like the one presented here and then think about further clean up > as a separate step. Yep. Thanks for looking into it. -Peff ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 8/8] hash.h: drop unsafe_ function variants 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (6 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau @ 2025-01-08 19:14 ` Taylor Blau 2025-01-10 10:41 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King 2025-01-11 0:14 ` Junio C Hamano 9 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-08 19:14 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Now that all callers have been converted from: the_hash_algo->unsafe_init_fn(); to unsafe_hash_algo(the_hash_algo)->unsafe_init_fn(); and similar, we can remove the scaffolding for the unsafe_ function variants and force callers to use the new unsafe_hash_algo() mechanic instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 15 --------------- object-file.c | 15 --------------- 2 files changed, 30 deletions(-) diff --git a/hash.h b/hash.h index 23cf6876e50..6dcbf6ab835 100644 --- a/hash.h +++ b/hash.h @@ -282,21 +282,6 @@ struct git_hash_algo { /* The hash finalization function for object IDs. */ git_hash_final_oid_fn final_oid_fn; - /* The non-cryptographic hash initialization function. */ - git_hash_init_fn unsafe_init_fn; - - /* The non-cryptographic hash context cloning function. */ - git_hash_clone_fn unsafe_clone_fn; - - /* The non-cryptographic hash update function. */ - git_hash_update_fn unsafe_update_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_fn unsafe_final_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_oid_fn unsafe_final_oid_fn; - /* The OID of the empty tree. */ const struct object_id *empty_tree; diff --git a/object-file.c b/object-file.c index 43efa4ca361..c4b42dd4be9 100644 --- a/object-file.c +++ b/object-file.c @@ -230,11 +230,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_unknown_update, .final_fn = git_hash_unknown_final, .final_oid_fn = git_hash_unknown_final_oid, - .unsafe_init_fn = git_hash_unknown_init, - .unsafe_clone_fn = git_hash_unknown_clone, - .unsafe_update_fn = git_hash_unknown_update, - .unsafe_final_fn = git_hash_unknown_final, - .unsafe_final_oid_fn = git_hash_unknown_final_oid, .empty_tree = NULL, .empty_blob = NULL, .null_oid = NULL, @@ -250,11 +245,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha1_update, .final_fn = git_hash_sha1_final, .final_oid_fn = git_hash_sha1_final_oid, - .unsafe_init_fn = git_hash_sha1_init_unsafe, - .unsafe_clone_fn = git_hash_sha1_clone_unsafe, - .unsafe_update_fn = git_hash_sha1_update_unsafe, - .unsafe_final_fn = git_hash_sha1_final_unsafe, - .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, @@ -271,11 +261,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha256_update, .final_fn = git_hash_sha256_final, .final_oid_fn = git_hash_sha256_final_oid, - .unsafe_init_fn = git_hash_sha256_init, - .unsafe_clone_fn = git_hash_sha256_clone, - .unsafe_update_fn = git_hash_sha256_update, - .unsafe_final_fn = git_hash_sha256_final, - .unsafe_final_oid_fn = git_hash_sha256_final_oid, .empty_tree = &empty_tree_oid_sha256, .empty_blob = &empty_blob_oid_sha256, .null_oid = &null_oid_sha256, -- 2.48.0.rc2.33.gaab3d23ed4c ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (7 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 8/8] hash.h: drop unsafe_ function variants Taylor Blau @ 2025-01-10 10:41 ` Jeff King 2025-01-10 21:29 ` Taylor Blau 2025-01-11 0:14 ` Junio C Hamano 9 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-01-10 10:41 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Wed, Jan 08, 2025 at 02:14:29PM -0500, Taylor Blau wrote: > (This series is rebased on 'master', which is 14650065b7 > (RelNotes/2.48.0: fix typos etc., 2025-01-07) at the time of writing). > > The bulk of this series is unchanged since last time, but a new seventh > patch that further hardens the hashfile_checkpoint callers on top of > Patrick's recent series[1]. I think that new patch is a definite improvement, though I left some comments there. The changes in patch 1 look fine to me (I still think a generic "test-tool hash" would make more sense, but I'm OK punting on that for now). I didn't see any response to the review in round 1 about the pointer dangers in patch 3. What do you think of using a separate git_hash_algo_fns struct, with the one-time conversion I showed in the subthread of: https://lore.kernel.org/git/20241121093731.GD602681@coredump.intra.peff.net/ ? -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-10 10:41 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King @ 2025-01-10 21:29 ` Taylor Blau 2025-01-11 2:42 ` Jeff King 0 siblings, 1 reply; 60+ messages in thread From: Taylor Blau @ 2025-01-10 21:29 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Fri, Jan 10, 2025 at 05:41:06AM -0500, Jeff King wrote: > On Wed, Jan 08, 2025 at 02:14:29PM -0500, Taylor Blau wrote: > > > (This series is rebased on 'master', which is 14650065b7 > > (RelNotes/2.48.0: fix typos etc., 2025-01-07) at the time of writing). > > > > The bulk of this series is unchanged since last time, but a new seventh > > patch that further hardens the hashfile_checkpoint callers on top of > > Patrick's recent series[1]. > > I think that new patch is a definite improvement, though I left some > comments there. > > The changes in patch 1 look fine to me (I still think a generic > "test-tool hash" would make more sense, but I'm OK punting on that for > now). > > I didn't see any response to the review in round 1 about the pointer > dangers in patch 3. What do you think of using a separate > git_hash_algo_fns struct, with the one-time conversion I showed in the > subthread of: > > https://lore.kernel.org/git/20241121093731.GD602681@coredump.intra.peff.net/ > > ? Oops. I must have missed those messages; and sure enough when focusing my inbox on this thread they are indeed unread :-). That said, I am not sure that that direction is one that I'd want to go in. Part of the goal of this series is to make it possible to mix safe and unsafe function calls on the same hash function. So doing something like: struct git_hash_algo *algop; algop->init_fn(&ctx); in one part of the code, and then (using the same algop) calling: algop->unsafe_final_fn(...); should be impossible to do to. The benefit of having only a single set of functions implemented on the git_hash_algo type is that it is impossible to mix the two: you'd have to use a different git_hash_algo altogether! So porting the above example to your and brian's git_hash_algo_fns struct, you'd still be able to do: algop->fn.init(&ctx); in one part of the code and algop->unsafe_fn.final(...) in another part, which doesn't appear to me to be safer than the current situation that this series aims to solve. But if I am misunderstanding something about your/brian's discussion earlier in this thread, please feel free to correct me. Thanks, Taylor ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-10 21:29 ` Taylor Blau @ 2025-01-11 2:42 ` Jeff King 0 siblings, 0 replies; 60+ messages in thread From: Jeff King @ 2025-01-11 2:42 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Fri, Jan 10, 2025 at 04:29:38PM -0500, Taylor Blau wrote: > > I didn't see any response to the review in round 1 about the pointer > > dangers in patch 3. What do you think of using a separate > > git_hash_algo_fns struct, with the one-time conversion I showed in the > > subthread of: > > > > https://lore.kernel.org/git/20241121093731.GD602681@coredump.intra.peff.net/ > > > > ? > > Oops. I must have missed those messages; and sure enough when focusing > my inbox on this thread they are indeed unread :-). > > That said, I am not sure that that direction is one that I'd want to go > in. Part of the goal of this series is to make it possible to mix safe > and unsafe function calls on the same hash function. So doing something > like: > > struct git_hash_algo *algop; > > algop->init_fn(&ctx); > > in one part of the code, and then (using the same algop) calling: > > algop->unsafe_final_fn(...); > > should be impossible to do to. The benefit of having only a single set > of functions implemented on the git_hash_algo type is that it is > impossible to mix the two: you'd have to use a different git_hash_algo > altogether! > > So porting the above example to your and brian's git_hash_algo_fns > struct, you'd still be able to do: > > algop->fn.init(&ctx); > > in one part of the code and algop->unsafe_fn.final(...) in another part, > which doesn't appear to me to be safer than the current situation that > this series aims to solve. I think what that proposal is doing is orthogonal to the goal of your series. You'd still have an unsafe_hash_algo() function, but it would return a git_hash_algo_fns struct, and that's what struct hashfile would store. So your patches would still remain. The advantage is mostly that you can't confuse it with a regular git_hash_algo struct, so it avoids the pointer and hash-id issues. I do think there is one gotcha, though, which is that the hashfile code probably still needs the outer algop pointer for things like algop->raw_size. So you'd have to store both. It's _possible_ to still confuse the two, but the idea is that you'd have to explicitly call algop->fn, to get the wrong one there. If we wanted to make that harder to get wrong, we could start making it a habit to never use algo->fn directly, but to ask for the safe/unsafe git_hash_algo_fns struct. But that would be even more churn in the surrounding code. I think just doing it consistently within hashfile (which is the only unsafe user) would be sufficient. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (8 preceding siblings ...) 2025-01-10 10:41 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King @ 2025-01-11 0:14 ` Junio C Hamano 2025-01-11 17:14 ` Taylor Blau 9 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-01-11 0:14 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt Taylor Blau <me@ttaylorr.com> writes: > (This series is rebased on 'master', which is 14650065b7 > (RelNotes/2.48.0: fix typos etc., 2025-01-07) at the time of writing). The previous round was based on <cover.1730833506.git.me@ttaylorr.com> which became 'tb/unsafe-hash-test', but this round is based on a recent 'master' that does not yet contain it? Does it mean that the 2-patch series the previous round of this series was based on is no longer needed? Thanks. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-11 0:14 ` Junio C Hamano @ 2025-01-11 17:14 ` Taylor Blau 0 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-11 17:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt On Fri, Jan 10, 2025 at 04:14:58PM -0800, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > (This series is rebased on 'master', which is 14650065b7 > > (RelNotes/2.48.0: fix typos etc., 2025-01-07) at the time of writing). > > The previous round was based on > <cover.1730833506.git.me@ttaylorr.com> which became > 'tb/unsafe-hash-test', but this round is based on a recent 'master' > that does not yet contain it? Does it mean that the 2-patch series > the previous round of this series was based on is no longer needed? Those two patches got squashed together and became the first patch of this series, so 'tb/unsafe-hash-test' is safe to discard. Thank you for shuffling the patches around as always :-). Thanks, Taylor ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (6 preceding siblings ...) 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau ` (8 more replies) 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau 8 siblings, 9 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt (This series is based on 14650065b7 (RelNotes/2.48.0: fix typos etc., 2025-01-07)). The bulk of this series is unchanged since last time, save for a couple of typofixes on spots noticed by Peff and Patrick Steinhardt. More importantly, it fixes hash_algo_by_ptr() when passing the unsafe SHA-1 variant. There were a couple of other ideas floated around, but I don't think they panned out as we had hoped in practice, so I think that this version should be good to go. The original cover letter is as follows: ------------ This series implements an idea discussed in [2] which suggests that we introduce a way to access a wrapped version of a 'struct git_hash_algo' which represents the unsafe variant of that algorithm, rather than having individual unsafe_ functions (like unsafe_init_fn() versus init_fn(), etc.). This approach is relatively straightforward to implement, and removes a significant deficiency in the original implementation of unsafe/non-cryptographic hash functions by making it impossible to switch between safe- and unsafe variants of hash functions. It also cleans up the sha1-unsafe test helper's implementation by removing a large number of "if (unsafe)"-style conditionals. The series is laid out as follows: * The first two patches prepare the hashfile API for the upcoming change: csum-file: store the hash algorithm as a struct field csum-file.c: extract algop from hashfile_checksum_valid() * The next patch implements the new 'unsafe_hash_algo()' function at the heart of this series' approach: hash.h: introduce `unsafe_hash_algo()` * The next two patches convert existing callers to use the new 'unsafe_hash_algo()' function, instead of switching between safe and unsafe_ variants of individual functions: csum-file.c: use unsafe_hash_algo() t/helper/test-hash.c: use unsafe_hash_algo() * The final patch drops the unsafe_ function variants following all callers being converted to use the new pattern: hash.h: drop unsafe_ function variants Thanks in advance for your review! [1]: https://lore.kernel.org/git/20241230-pks-meson-sha1-unsafe-v1-0-efb276e171f5@pks.im/ [2]: https://lore.kernel.org/git/20241107013915.GA961214@coredump.intra.peff.net/ Taylor Blau (8): t/helper/test-tool: implement sha1-unsafe helper csum-file: store the hash algorithm as a struct field csum-file.c: extract algop from hashfile_checksum_valid() hash.h: introduce `unsafe_hash_algo()` csum-file.c: use unsafe_hash_algo() t/helper/test-hash.c: use unsafe_hash_algo() csum-file: introduce hashfile_checkpoint_init() hash.h: drop unsafe_ function variants builtin/fast-import.c | 2 +- bulk-checkin.c | 9 ++++++--- csum-file.c | 40 +++++++++++++++++++++++++--------------- csum-file.h | 2 ++ hash.h | 28 ++++++++++++---------------- object-file.c | 41 ++++++++++++++++++++++++++--------------- t/helper/test-hash.c | 4 +++- t/helper/test-sha1.c | 7 ++++++- t/helper/test-sha1.sh | 38 ++++++++++++++++++++++---------------- t/helper/test-sha256.c | 2 +- t/helper/test-tool.c | 1 + t/helper/test-tool.h | 3 ++- 12 files changed, 107 insertions(+), 70 deletions(-) Range-diff against v2: 1: 4c1523a04f1 = 1: ae6b8c75294 t/helper/test-tool: implement sha1-unsafe helper 2: 99cc44895b5 ! 2: 2b79c76e471 csum-file: store the hash algorithm as a struct field @@ Commit message csum-file: store the hash algorithm as a struct field Throughout the hashfile API, we rely on a reference to 'the_hash_algo', - and call its _usnafe function variants directly. + and call its _unsafe function variants directly. Prepare for a future change where we may use a different 'git_hash_algo' pointer (instead of just relying on 'the_hash_algo' throughout) by 3: 1ffab2f8289 = 3: d7deb3f338e csum-file.c: extract algop from hashfile_checksum_valid() 4: 99dcbe2e716 ! 4: b6b2bb2714f hash.h: introduce `unsafe_hash_algo()` @@ Commit message if (unsafe) algop = unsafe_hash_algo(algop); - the_hash_algo->init_fn(...); - the_hash_algo->update_fn(...); - the_hash_algo->final_fn(...); + algop->init_fn(...); + algop->update_fn(...); + algop->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to @@ Commit message functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. + Note that hash_algo_by_ptr() needs an adjustment to allow passing in the + unsafe variant of a hash function. All other query functions on the + hash_algos array will continue to return the safe variants of any + function. + Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> @@ hash.h: struct git_hash_algo { }; extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; -@@ hash.h: static inline int hash_algo_by_ptr(const struct git_hash_algo *p) - return p - hash_algos; +@@ hash.h: int hash_algo_by_length(int len); + /* Identical, except for a pointer to struct git_hash_algo. */ + static inline int hash_algo_by_ptr(const struct git_hash_algo *p) + { +- return p - hash_algos; ++ size_t i; ++ for (i = 0; i < GIT_HASH_NALGOS; i++) { ++ const struct git_hash_algo *algop = &hash_algos[i]; ++ if (p == algop || (algop->unsafe && p == algop->unsafe)) ++ return i; ++ } ++ return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop); 5: 2dcc2aa6803 = 5: ca67de80971 csum-file.c: use unsafe_hash_algo() 6: a2b9ef03080 = 6: 21b175b07ff t/helper/test-hash.c: use unsafe_hash_algo() 7: 94c07fd8a55 = 7: 850d4f407db csum-file: introduce hashfile_checkpoint_init() 8: f5579883816 ! 8: 0c4d006e6e8 hash.h: drop unsafe_ function variants @@ Commit message to - unsafe_hash_algo(the_hash_algo)->unsafe_init_fn(); + unsafe_hash_algo(the_hash_algo)->init_fn(); and similar, we can remove the scaffolding for the unsafe_ function variants and force callers to use the new unsafe_hash_algo() mechanic base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 1/8] t/helper/test-tool: implement sha1-unsafe helper 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau ` (7 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt With the new "unsafe" SHA-1 build knob, it is convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Implement that helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function, and expose the new behavior via a new 'sha1-unsafe' test helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++++++++++---- t/helper/test-sha1.c | 7 ++++++- t/helper/test-sha1.sh | 38 ++++++++++++++++++++++---------------- t/helper/test-sha256.c | 2 +- t/helper/test-tool.c | 1 + t/helper/test-tool.h | 3 ++- 6 files changed, 45 insertions(+), 23 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index 45d829c908f..d0ee668df95 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -1,7 +1,7 @@ #include "test-tool.h" #include "hex.h" -int cmd_hash_impl(int ac, const char **av, int algo) +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) { git_hash_ctx ctx; unsigned char hash[GIT_MAX_HEXSZ]; @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) die("OOPS"); } - algop->init_fn(&ctx); + if (unsafe) + algop->unsafe_init_fn(&ctx); + else + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) } if (this_sz == 0) break; - algop->update_fn(&ctx, buffer, this_sz); + if (unsafe) + algop->unsafe_update_fn(&ctx, buffer, this_sz); + else + algop->update_fn(&ctx, buffer, this_sz); } - algop->final_fn(hash, &ctx); + if (unsafe) + algop->unsafe_final_fn(hash, &ctx); + else + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c index e60d000c039..349540c4df8 100644 --- a/t/helper/test-sha1.c +++ b/t/helper/test-sha1.c @@ -3,7 +3,7 @@ int cmd__sha1(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA1); + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0); } int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) @@ -13,3 +13,8 @@ int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) #endif return 1; } + +int cmd__sha1_unsafe(int ac, const char **av) +{ + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 1); +} diff --git a/t/helper/test-sha1.sh b/t/helper/test-sha1.sh index 84594885c70..bf387d3db14 100755 --- a/t/helper/test-sha1.sh +++ b/t/helper/test-sha1.sh @@ -3,25 +3,31 @@ dd if=/dev/zero bs=1048576 count=100 2>/dev/null | /usr/bin/time t/helper/test-tool sha1 >/dev/null +dd if=/dev/zero bs=1048576 count=100 2>/dev/null | +/usr/bin/time t/helper/test-tool sha1-unsafe >/dev/null + while read expect cnt pfx do case "$expect" in '#'*) continue ;; esac - actual=$( - { - test -z "$pfx" || echo "$pfx" - dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null | - perl -pe 'y/\000/g/' - } | ./t/helper/test-tool sha1 $cnt - ) - if test "$expect" = "$actual" - then - echo "OK: $expect $cnt $pfx" - else - echo >&2 "OOPS: $cnt" - echo >&2 "expect: $expect" - echo >&2 "actual: $actual" - exit 1 - fi + for sha1 in sha1 sha1-unsafe + do + actual=$( + { + test -z "$pfx" || echo "$pfx" + dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null | + perl -pe 'y/\000/g/' + } | ./t/helper/test-tool $sha1 $cnt + ) + if test "$expect" = "$actual" + then + echo "OK ($sha1): $expect $cnt $pfx" + else + echo >&2 "OOPS ($sha1): $cnt" + echo >&2 "expect ($sha1): $expect" + echo >&2 "actual ($sha1): $actual" + exit 1 + fi + done done <<EOF da39a3ee5e6b4b0d3255bfef95601890afd80709 0 3f786850e387550fdab836ed7e6dc881de23001b 0 a diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c index 2fb20438f3c..7fd0aa1fcd3 100644 --- a/t/helper/test-sha256.c +++ b/t/helper/test-sha256.c @@ -3,5 +3,5 @@ int cmd__sha256(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA256); + return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0); } diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 4a7aa993ba9..958452ef12e 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -70,6 +70,7 @@ static struct test_cmd cmds[] = { { "serve-v2", cmd__serve_v2 }, { "sha1", cmd__sha1 }, { "sha1-is-sha1dc", cmd__sha1_is_sha1dc }, + { "sha1-unsafe", cmd__sha1_unsafe }, { "sha256", cmd__sha256 }, { "sigchain", cmd__sigchain }, { "simple-ipc", cmd__simple_ipc }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 21802ac27da..24149edd414 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -63,6 +63,7 @@ int cmd__scrap_cache_tree(int argc, const char **argv); int cmd__serve_v2(int argc, const char **argv); int cmd__sha1(int argc, const char **argv); int cmd__sha1_is_sha1dc(int argc, const char **argv); +int cmd__sha1_unsafe(int argc, const char **argv); int cmd__sha256(int argc, const char **argv); int cmd__sigchain(int argc, const char **argv); int cmd__simple_ipc(int argc, const char **argv); @@ -81,6 +82,6 @@ int cmd__windows_named_pipe(int argc, const char **argv); #endif int cmd__write_cache(int argc, const char **argv); -int cmd_hash_impl(int ac, const char **av, int algo); +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); #endif -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 2/8] csum-file: store the hash algorithm as a struct field 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau 2025-01-17 22:03 ` [PATCH v3 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau ` (6 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Throughout the hashfile API, we rely on a reference to 'the_hash_algo', and call its _unsafe function variants directly. Prepare for a future change where we may use a different 'git_hash_algo' pointer (instead of just relying on 'the_hash_algo' throughout) by making the 'git_hash_algo' pointer a member of the 'hashfile' structure itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 20 +++++++++++--------- csum-file.h | 1 + 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/csum-file.c b/csum-file.c index 5716016e12e..b28cd047e3f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -71,14 +71,14 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, hashflush(f); if (f->skip_hash) - hashclr(f->buffer, the_repository->hash_algo); + hashclr(f->buffer, f->algop); else - the_hash_algo->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->unsafe_final_fn(f->buffer, &f->ctx); if (result) - hashcpy(result, f->buffer, the_repository->hash_algo); + hashcpy(result, f->buffer, f->algop); if (flags & CSUM_HASH_IN_STREAM) - flush(f, f->buffer, the_hash_algo->rawsz); + flush(f, f->buffer, f->algop->rawsz); if (flags & CSUM_FSYNC) fsync_component_or_die(component, f->fd, f->name); if (flags & CSUM_CLOSE) { @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->unsafe_update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -174,7 +174,9 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->name = name; f->do_crc = 0; f->skip_hash = 0; - the_hash_algo->unsafe_init_fn(&f->ctx); + + f->algop = the_hash_algo; + f->algop->unsafe_init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -208,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - the_hash_algo->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -219,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - the_hash_algo->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } diff --git a/csum-file.h b/csum-file.h index 7c73da0a40a..2b45f4673a2 100644 --- a/csum-file.h +++ b/csum-file.h @@ -20,6 +20,7 @@ struct hashfile { size_t buffer_len; unsigned char *buffer; unsigned char *check_buffer; + const struct git_hash_algo *algop; /** * If non-zero, skip_hash indicates that we should -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 3/8] csum-file.c: extract algop from hashfile_checksum_valid() 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau 2025-01-17 22:03 ` [PATCH v3 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau 2025-01-17 22:03 ` [PATCH v3 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau ` (5 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Perform a similar transformation as in the previous commit, but focused instead on hashfile_checksum_valid(). This function does not work with a hashfile structure itself, and instead validates the raw contents of a file written using the hashfile API. We'll want to be prepared for a similar change to this function in the future, so prepare ourselves for that by extracting 'the_hash_algo' into its own field for use within this function. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/csum-file.c b/csum-file.c index b28cd047e3f..7a71121e340 100644 --- a/csum-file.c +++ b/csum-file.c @@ -242,14 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - size_t data_len = total_len - the_hash_algo->rawsz; + const struct git_hash_algo *algop = the_hash_algo; + size_t data_len = total_len - algop->rawsz; - if (total_len < the_hash_algo->rawsz) + if (total_len < algop->rawsz) return 0; /* say "too short"? */ - the_hash_algo->unsafe_init_fn(&ctx); - the_hash_algo->unsafe_update_fn(&ctx, data, data_len); - the_hash_algo->unsafe_final_fn(got, &ctx); + algop->unsafe_init_fn(&ctx); + algop->unsafe_update_fn(&ctx, data, data_len); + algop->unsafe_final_fn(got, &ctx); - return hasheq(got, data + data_len, the_repository->hash_algo); + return hasheq(got, data + data_len, algop); } -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 4/8] hash.h: introduce `unsafe_hash_algo()` 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau ` (2 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau ` (4 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants, 2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing functions by introducing new functions like "unsafe_init_fn()" and so on. This approach has a major shortcoming that callers must remember to consistently use one variant or the other. Failing to consistently use (or not use) the unsafe variants can lead to crashes at best, or subtle memory corruption issues at worst. In the hashfile API, this isn't difficult to achieve, but verifying that all callers consistently use the unsafe variants is somewhat of a chore given how spread out all of the callers are. In the sha1 and sha1-unsafe test helpers, all of the calls to various hash functions are guarded by an "if (unsafe)" conditional, which is repetitive and cumbersome. Address these issues by introducing a new pattern whereby one 'git_hash_algo' can return a pointer to another 'git_hash_algo' that represents the unsafe version of itself. So instead of having something like: if (unsafe) the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); else the_hash_algo->unsafe_init_fn(...); the_hash_algo->unsafe_update_fn(...); the_hash_algo->unsafe_final_fn(...); we can instead write: struct git_hash_algo *algop = the_hash_algo; if (unsafe) algop = unsafe_hash_algo(algop); algop->init_fn(...); algop->update_fn(...); algop->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to hold onto a 'struct git_hash_algo' pointer that is initialized once. Similarly, while there currently is still a way to "mix" safe and unsafe functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. Note that hash_algo_by_ptr() needs an adjustment to allow passing in the unsafe variant of a hash function. All other query functions on the hash_algos array will continue to return the safe variants of any function. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 13 ++++++++++++- object-file.c | 26 ++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/hash.h b/hash.h index 756166ce5e8..a68a2b6a161 100644 --- a/hash.h +++ b/hash.h @@ -305,6 +305,9 @@ struct git_hash_algo { /* The all-zeros OID. */ const struct object_id *null_oid; + + /* The unsafe variant of this hash function, if one exists. */ + const struct git_hash_algo *unsafe; }; extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; @@ -320,9 +323,17 @@ int hash_algo_by_length(int len); /* Identical, except for a pointer to struct git_hash_algo. */ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) { - return p - hash_algos; + size_t i; + for (i = 0; i < GIT_HASH_NALGOS; i++) { + const struct git_hash_algo *algop = &hash_algos[i]; + if (p == algop || (algop->unsafe && p == algop->unsafe)) + return i; + } + return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop); + const struct object_id *null_oid(void); static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop) diff --git a/object-file.c b/object-file.c index 5b792b3dd42..43efa4ca361 100644 --- a/object-file.c +++ b/object-file.c @@ -202,6 +202,22 @@ static void git_hash_unknown_final_oid(struct object_id *oid UNUSED, BUG("trying to finalize unknown hash"); } +static const struct git_hash_algo sha1_unsafe_algo = { + .name = "sha1", + .format_id = GIT_SHA1_FORMAT_ID, + .rawsz = GIT_SHA1_RAWSZ, + .hexsz = GIT_SHA1_HEXSZ, + .blksz = GIT_SHA1_BLKSZ, + .init_fn = git_hash_sha1_init_unsafe, + .clone_fn = git_hash_sha1_clone_unsafe, + .update_fn = git_hash_sha1_update_unsafe, + .final_fn = git_hash_sha1_final_unsafe, + .final_oid_fn = git_hash_sha1_final_oid_unsafe, + .empty_tree = &empty_tree_oid, + .empty_blob = &empty_blob_oid, + .null_oid = &null_oid_sha1, +}; + const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { { .name = NULL, @@ -239,6 +255,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .unsafe_update_fn = git_hash_sha1_update_unsafe, .unsafe_final_fn = git_hash_sha1_final_unsafe, .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, + .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, .null_oid = &null_oid_sha1, @@ -305,6 +322,15 @@ int hash_algo_by_length(int len) return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop) +{ + /* If we have a faster "unsafe" implementation, use that. */ + if (algop->unsafe) + return algop->unsafe; + /* Otherwise use the default one. */ + return algop; +} + /* * This is meant to hold a *small* number of objects that you would * want repo_read_object_file() to be able to return, but yet you do not want -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 5/8] csum-file.c: use unsafe_hash_algo() 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau ` (3 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 6/8] t/helper/test-hash.c: " Taylor Blau ` (3 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Instead of calling the unsafe_ hash function variants directly, make use of the shared 'algop' pointer by initializing it to: f->algop = unsafe_hash_algo(the_hash_algo); , thus making all calls use the unsafe variants directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/csum-file.c b/csum-file.c index 7a71121e340..ebffc80ef71 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -73,7 +73,7 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, if (f->skip_hash) hashclr(f->buffer, f->algop); else - f->algop->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->final_fn(f->buffer, &f->ctx); if (result) hashcpy(result, f->buffer, f->algop); @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -175,8 +175,8 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->do_crc = 0; f->skip_hash = 0; - f->algop = the_hash_algo; - f->algop->unsafe_init_fn(&f->ctx); + f->algop = unsafe_hash_algo(the_hash_algo); + f->algop->init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -210,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -221,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } @@ -242,15 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - const struct git_hash_algo *algop = the_hash_algo; + const struct git_hash_algo *algop = unsafe_hash_algo(the_hash_algo); size_t data_len = total_len - algop->rawsz; if (total_len < algop->rawsz) return 0; /* say "too short"? */ - algop->unsafe_init_fn(&ctx); - algop->unsafe_update_fn(&ctx, data, data_len); - algop->unsafe_final_fn(got, &ctx); + algop->init_fn(&ctx); + algop->update_fn(&ctx, data, data_len); + algop->final_fn(got, &ctx); return hasheq(got, data + data_len, algop); } -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 6/8] t/helper/test-hash.c: use unsafe_hash_algo() 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau ` (4 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau ` (2 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index d0ee668df95..aa82638c621 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -9,6 +9,8 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) int binary = 0; char *buffer; const struct git_hash_algo *algop = &hash_algos[algo]; + if (unsafe) + algop = unsafe_hash_algo(algop); if (ac == 2) { if (!strcmp(av[1], "-b")) @@ -27,10 +29,7 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) die("OOPS"); } - if (unsafe) - algop->unsafe_init_fn(&ctx); - else - algop->init_fn(&ctx); + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -49,15 +48,9 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) } if (this_sz == 0) break; - if (unsafe) - algop->unsafe_update_fn(&ctx, buffer, this_sz); - else - algop->update_fn(&ctx, buffer, this_sz); + algop->update_fn(&ctx, buffer, this_sz); } - if (unsafe) - algop->unsafe_final_fn(hash, &ctx); - else - algop->final_fn(hash, &ctx); + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau ` (5 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 6/8] t/helper/test-hash.c: " Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 8/8] hash.h: drop unsafe_ function variants Taylor Blau 2025-01-18 12:28 ` [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1 backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to initialize a hashfile_checkpoint with the same hash function implementation as is used by the hashfile it is used to checkpoint. While both 106140a99f and 9218c0bfe1 work around the immediate crash, changing the hash function implementation within the hashfile API to, for example, the non-unsafe variant would re-introduce the crash. This is a result of the tight coupling between initializing hashfiles and hashfile_checkpoints. Introduce and use a new function which ensures that both parts of a hashfile and hashfile_checkpoint pair use the same hash function implementation to avoid such crashes. A few things worth noting: - In the change to builtin/fast-import.c::stream_blob(), we can see that by removing the explicit reference to 'the_hash_algo->unsafe_init_fn()', we are hardened against the hashfile API changing away from the_hash_algo (or its unsafe variant) in the future. - The bulk-checkin code no longer needs to explicitly zero-initialize the hashfile_checkpoint, since it is now done as a result of calling 'hashfile_checkpoint_init()'. - Also in the bulk-checkin code, we add an additional call to prepare_to_stream() outside of the main loop in order to initialize 'state->f' so we know which hash function implementation to use when calling 'hashfile_checkpoint_init()'. This is OK, since subsequent 'prepare_to_stream()' calls are noops. However, we only need to call 'prepare_to_stream()' when we have the HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling 'prepare_to_stream()' does not assign 'state->f', so we have nothing to initialize. - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are appropriately guarded. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/fast-import.c | 2 +- bulk-checkin.c | 9 ++++++--- csum-file.c | 7 +++++++ csum-file.h | 1 + 4 files changed, 15 insertions(+), 4 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 0f86392761a..4a6c7ab52ac 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -1106,7 +1106,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) cycle_packfile(); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); + hashfile_checkpoint_init(pack_file, &checkpoint); hashfile_checkpoint(pack_file, &checkpoint); offset = checkpoint.offset; diff --git a/bulk-checkin.c b/bulk-checkin.c index 433070a3bda..892176d23d2 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -261,7 +261,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; - struct hashfile_checkpoint checkpoint = {0}; + struct hashfile_checkpoint checkpoint; struct pack_idx_entry *idx = NULL; seekback = lseek(fd, 0, SEEK_CUR); @@ -272,12 +272,15 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, OBJ_BLOB, size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); /* Note: idx is non-NULL when we are writing */ - if ((flags & HASH_WRITE_OBJECT) != 0) + if ((flags & HASH_WRITE_OBJECT) != 0) { CALLOC_ARRAY(idx, 1); + prepare_to_stream(state, flags); + hashfile_checkpoint_init(state->f, &checkpoint); + } + already_hashed_to = 0; while (1) { diff --git a/csum-file.c b/csum-file.c index ebffc80ef71..232121f415f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -206,6 +206,13 @@ struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp return hashfd_internal(fd, name, tp, 8 * 1024); } +void hashfile_checkpoint_init(struct hashfile *f, + struct hashfile_checkpoint *checkpoint) +{ + memset(checkpoint, 0, sizeof(*checkpoint)); + f->algop->init_fn(&checkpoint->ctx); +} + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) { hashflush(f); diff --git a/csum-file.h b/csum-file.h index 2b45f4673a2..b7475f16c20 100644 --- a/csum-file.h +++ b/csum-file.h @@ -36,6 +36,7 @@ struct hashfile_checkpoint { git_hash_ctx ctx; }; +void hashfile_checkpoint_init(struct hashfile *, struct hashfile_checkpoint *); void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *); int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *); -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 8/8] hash.h: drop unsafe_ function variants 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau ` (6 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau @ 2025-01-17 22:03 ` Taylor Blau 2025-01-18 12:28 ` [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-17 22:03 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Now that all callers have been converted from: the_hash_algo->unsafe_init_fn(); to unsafe_hash_algo(the_hash_algo)->init_fn(); and similar, we can remove the scaffolding for the unsafe_ function variants and force callers to use the new unsafe_hash_algo() mechanic instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 15 --------------- object-file.c | 15 --------------- 2 files changed, 30 deletions(-) diff --git a/hash.h b/hash.h index a68a2b6a161..68d4292e6da 100644 --- a/hash.h +++ b/hash.h @@ -282,21 +282,6 @@ struct git_hash_algo { /* The hash finalization function for object IDs. */ git_hash_final_oid_fn final_oid_fn; - /* The non-cryptographic hash initialization function. */ - git_hash_init_fn unsafe_init_fn; - - /* The non-cryptographic hash context cloning function. */ - git_hash_clone_fn unsafe_clone_fn; - - /* The non-cryptographic hash update function. */ - git_hash_update_fn unsafe_update_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_fn unsafe_final_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_oid_fn unsafe_final_oid_fn; - /* The OID of the empty tree. */ const struct object_id *empty_tree; diff --git a/object-file.c b/object-file.c index 43efa4ca361..c4b42dd4be9 100644 --- a/object-file.c +++ b/object-file.c @@ -230,11 +230,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_unknown_update, .final_fn = git_hash_unknown_final, .final_oid_fn = git_hash_unknown_final_oid, - .unsafe_init_fn = git_hash_unknown_init, - .unsafe_clone_fn = git_hash_unknown_clone, - .unsafe_update_fn = git_hash_unknown_update, - .unsafe_final_fn = git_hash_unknown_final, - .unsafe_final_oid_fn = git_hash_unknown_final_oid, .empty_tree = NULL, .empty_blob = NULL, .null_oid = NULL, @@ -250,11 +245,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha1_update, .final_fn = git_hash_sha1_final, .final_oid_fn = git_hash_sha1_final_oid, - .unsafe_init_fn = git_hash_sha1_init_unsafe, - .unsafe_clone_fn = git_hash_sha1_clone_unsafe, - .unsafe_update_fn = git_hash_sha1_update_unsafe, - .unsafe_final_fn = git_hash_sha1_final_unsafe, - .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, @@ -271,11 +261,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha256_update, .final_fn = git_hash_sha256_final, .final_oid_fn = git_hash_sha256_final_oid, - .unsafe_init_fn = git_hash_sha256_init, - .unsafe_clone_fn = git_hash_sha256_clone, - .unsafe_update_fn = git_hash_sha256_update, - .unsafe_final_fn = git_hash_sha256_final, - .unsafe_final_oid_fn = git_hash_sha256_final_oid, .empty_tree = &empty_tree_oid_sha256, .empty_blob = &empty_blob_oid_sha256, .null_oid = &null_oid_sha256, -- 2.48.0.rc2.35.g0c4d006e6e8 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau ` (7 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 8/8] hash.h: drop unsafe_ function variants Taylor Blau @ 2025-01-18 12:28 ` Jeff King 2025-01-18 12:43 ` Jeff King 8 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-01-18 12:28 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Fri, Jan 17, 2025 at 05:03:07PM -0500, Taylor Blau wrote: > + Note that hash_algo_by_ptr() needs an adjustment to allow passing in the > + unsafe variant of a hash function. All other query functions on the > + hash_algos array will continue to return the safe variants of any > + function. > + > Suggested-by: Jeff King <peff@peff.net> > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > @@ hash.h: struct git_hash_algo { > }; > extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; > > -@@ hash.h: static inline int hash_algo_by_ptr(const struct git_hash_algo *p) > - return p - hash_algos; > +@@ hash.h: int hash_algo_by_length(int len); > + /* Identical, except for a pointer to struct git_hash_algo. */ > + static inline int hash_algo_by_ptr(const struct git_hash_algo *p) > + { > +- return p - hash_algos; > ++ size_t i; > ++ for (i = 0; i < GIT_HASH_NALGOS; i++) { > ++ const struct git_hash_algo *algop = &hash_algos[i]; > ++ if (p == algop || (algop->unsafe && p == algop->unsafe)) > ++ return i; > ++ } > ++ return GIT_HASH_UNKNOWN; > } OK, so this version introduces the loop we discussed earlier. I think we can probably dismiss any performance loss as theoretical unless somebody can think of a good way to measure. It seems like worrying about it is probably a premature micro-optimization. It is a little quirky that it loses the transitive nature of hash_algo_by_ptr() and hash_algo_by_id(). So this is unsafe: /* start with unsafe variant */ algo = unsafe_hash_algo(the_hash_algo); algo->init_fn(...); /* returns GIT_HASH_SHA1, even for the unsafe variant */ id = hash_algo_by_ptr(algo); /* returns the safe variant */ algo = hash_algo_by_id(id); /* oops, this is mix-and-matching */ alog->update_fn(...); Now obviously that sort of round-trip is a silly thing to do in a single function. Could it happen across a larger call-chain, where the id is stored in a struct and passed somewhere far away? I guess so, but it also seems kind of unlikely. It does make me wonder if hash_algo_by_ptr() should just return UNKNOWN for the unsafe variant. Then we'd at least get a loud error from the caller (as opposed to the code before your patch, which is undefined behavior). I dunno. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-18 12:28 ` [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King @ 2025-01-18 12:43 ` Jeff King 2025-01-22 21:31 ` Junio C Hamano 0 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-01-18 12:43 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt On Sat, Jan 18, 2025 at 07:28:31AM -0500, Jeff King wrote: > It does make me wonder if hash_algo_by_ptr() should just return UNKNOWN > for the unsafe variant. Then we'd at least get a loud error from the > caller (as opposed to the code before your patch, which is undefined > behavior). I dunno. Doing this: diff --git a/hash.h b/hash.h index 68d4292e6d..ad2c919991 100644 --- a/hash.h +++ b/hash.h @@ -311,7 +311,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) size_t i; for (i = 0; i < GIT_HASH_NALGOS; i++) { const struct git_hash_algo *algop = &hash_algos[i]; - if (p == algop || (algop->unsafe && p == algop->unsafe)) + if (p == algop) return i; } return GIT_HASH_UNKNOWN; on top of your series doesn't trigger any issues in the test suite. Which I think shows that we would never have triggered the UB from the original implementation, either[1]. Still, I think I prefer your loop to being one errant function call away from undefined behavior. But maybe returning UNKNOWN (as above) is a safer bet than losing the inverse nature of the by_ptr() and by_id() functions, which could otherwise cause hard-to-find interactions? -Peff [1] One thing that still puzzles me. If you further add a BUG() to that function when we'd return UNKNOWN, you can see that we trigger a few cases in the test suite where we pass in NULL (e.g., because we're not in a repo). I think these are already UB with the existing "p - hash_algos" implementation! We'll return what is effectively "p" cast to an int. E.g. if I do this: diff --git a/hash.h b/hash.h index 756166ce5e..ff31156416 100644 --- a/hash.h +++ b/hash.h @@ -320,6 +320,8 @@ int hash_algo_by_length(int len); /* Identical, except for a pointer to struct git_hash_algo. */ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) { + if (!p) + BUG("null hash, return is %d", (int)(p - hash_algos)); return p - hash_algos; } then t1517 shows grep dying with: BUG: hash.h:324: null hash, return is -1646754892 I think it "works" because the backtrace is: [...] #5 0x0000556d436f6b71 in BUG_fl (file=0x556d43790e8b "hash.h", line=324, fmt=0x556d43790e73 "null hash, return is %d") at usage.c:335 #6 0x0000556d4357c2f9 in hash_algo_by_ptr (p=0x0) at /home/peff/compile/git/hash.h:324 #7 0x0000556d4357c437 in oidclr (oid=0x7ffdf5810ea4, algop=0x0) at /home/peff/compile/git/hash.h:392 #8 0x0000556d435801c7 in prep_exclude (dir=0x7ffdf5811190, istate=0x556d608959c0, base=0x556d6089da50 "nums", baselen=0) at dir.c:1699 [...] #16 0x0000556d4344dd4a in cmd_grep (argc=0, argv=0x556d60895ee8, prefix=0x0, repo=0x0) at builtin/grep.c:1257 So we probably write a totally garbage algo field into the object_id, but nobody ever ends up looking at it. Probably something we should clean up, but way out of scope for your series. But I do think it reinforces that returning UNKNOWN is an improvement; it avoids undefined behavior and anybody who tried to use it should get a BUG() from calling git_hash_unknown_*() functions. ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-18 12:43 ` Jeff King @ 2025-01-22 21:31 ` Junio C Hamano 0 siblings, 0 replies; 60+ messages in thread From: Junio C Hamano @ 2025-01-22 21:31 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt Jeff King <peff@peff.net> writes: > .... But maybe > returning UNKNOWN (as above) is a safer bet than losing the inverse > nature of the by_ptr() and by_id() functions, which could otherwise > cause hard-to-find interactions? Yeah, that sounds sensible. > I think it "works" because the backtrace is: > > [...] > #5 0x0000556d436f6b71 in BUG_fl (file=0x556d43790e8b "hash.h", line=324, > fmt=0x556d43790e73 "null hash, return is %d") at usage.c:335 > #6 0x0000556d4357c2f9 in hash_algo_by_ptr (p=0x0) at /home/peff/compile/git/hash.h:324 > #7 0x0000556d4357c437 in oidclr (oid=0x7ffdf5810ea4, algop=0x0) at /home/peff/compile/git/hash.h:392 > #8 0x0000556d435801c7 in prep_exclude (dir=0x7ffdf5811190, istate=0x556d608959c0, base=0x556d6089da50 "nums", > baselen=0) at dir.c:1699 > [...] > #16 0x0000556d4344dd4a in cmd_grep (argc=0, argv=0x556d60895ee8, prefix=0x0, repo=0x0) at builtin/grep.c:1257 > > So we probably write a totally garbage algo field into the > object_id, but nobody ever ends up looking at it. Probably > something we should clean up, but way out of scope for your series. Yeah, that looks bad, but I think it is fine to leave it outside the series to fix that. > But I do think it reinforces that returning UNKNOWN is an > improvement; it avoids undefined behavior and anybody who tried to > use it should get a BUG() from calling git_hash_unknown_*() > functions. Sounds like a sensitive direction to go in. Thanks. ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau ` (7 preceding siblings ...) 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau ` (8 more replies) 8 siblings, 9 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt (This series is based on 14650065b7 (RelNotes/2.48.0: fix typos etc., 2025-01-07)). Here is a hopefully final version of my series to harden the unsafe hash algorithm changes added in v2.47.0. The only difference from last time is that hash_algo_by_ptr() now returns GIT_HASH_UNKNOWN for NULL and unsafe variants, which is a strict improvement from both v3 of this series and the status-quo on master. As usual, a (small) range-diff is included below for convenience, and the original cover letter is as follows: ------------ This series implements an idea discussed in [2] which suggests that we introduce a way to access a wrapped version of a 'struct git_hash_algo' which represents the unsafe variant of that algorithm, rather than having individual unsafe_ functions (like unsafe_init_fn() versus init_fn(), etc.). This approach is relatively straightforward to implement, and removes a significant deficiency in the original implementation of unsafe/non-cryptographic hash functions by making it impossible to switch between safe- and unsafe variants of hash functions. It also cleans up the sha1-unsafe test helper's implementation by removing a large number of "if (unsafe)"-style conditionals. The series is laid out as follows: * The first two patches prepare the hashfile API for the upcoming change: csum-file: store the hash algorithm as a struct field csum-file.c: extract algop from hashfile_checksum_valid() * The next patch implements the new 'unsafe_hash_algo()' function at the heart of this series' approach: hash.h: introduce `unsafe_hash_algo()` * The next two patches convert existing callers to use the new 'unsafe_hash_algo()' function, instead of switching between safe and unsafe_ variants of individual functions: csum-file.c: use unsafe_hash_algo() t/helper/test-hash.c: use unsafe_hash_algo() * The final patch drops the unsafe_ function variants following all callers being converted to use the new pattern: hash.h: drop unsafe_ function variants Thanks in advance for your review! [1]: https://lore.kernel.org/git/20241230-pks-meson-sha1-unsafe-v1-0-efb276e171f5@pks.im/ [2]: https://lore.kernel.org/git/20241107013915.GA961214@coredump.intra.peff.net/ Taylor Blau (8): t/helper/test-tool: implement sha1-unsafe helper csum-file: store the hash algorithm as a struct field csum-file.c: extract algop from hashfile_checksum_valid() hash.h: introduce `unsafe_hash_algo()` csum-file.c: use unsafe_hash_algo() t/helper/test-hash.c: use unsafe_hash_algo() csum-file: introduce hashfile_checkpoint_init() hash.h: drop unsafe_ function variants builtin/fast-import.c | 2 +- bulk-checkin.c | 9 ++++++--- csum-file.c | 40 +++++++++++++++++++++++++--------------- csum-file.h | 2 ++ hash.h | 28 ++++++++++++---------------- object-file.c | 41 ++++++++++++++++++++++++++--------------- t/helper/test-hash.c | 4 +++- t/helper/test-sha1.c | 7 ++++++- t/helper/test-sha1.sh | 38 ++++++++++++++++++++++---------------- t/helper/test-sha256.c | 2 +- t/helper/test-tool.c | 1 + t/helper/test-tool.h | 3 ++- 12 files changed, 107 insertions(+), 70 deletions(-) Range-diff against v3: 1: ae6b8c75294 = 1: b64ae238248 t/helper/test-tool: implement sha1-unsafe helper 2: 2b79c76e471 = 2: d03f503682f csum-file: store the hash algorithm as a struct field 3: d7deb3f338e = 3: 73554c3b881 csum-file.c: extract algop from hashfile_checksum_valid() 4: b6b2bb2714f ! 4: ae01f1f4274 hash.h: introduce `unsafe_hash_algo()` @@ hash.h: int hash_algo_by_length(int len); + size_t i; + for (i = 0; i < GIT_HASH_NALGOS; i++) { + const struct git_hash_algo *algop = &hash_algos[i]; -+ if (p == algop || (algop->unsafe && p == algop->unsafe)) ++ if (p == algop) + return i; + } + return GIT_HASH_UNKNOWN; 5: ca67de80971 = 5: 64a850c77ae csum-file.c: use unsafe_hash_algo() 6: 21b175b07ff = 6: 3dcccccf752 t/helper/test-hash.c: use unsafe_hash_algo() 7: 850d4f407db = 7: da97157c4a1 csum-file: introduce hashfile_checkpoint_init() 8: 0c4d006e6e8 = 8: 0ba27182b5e hash.h: drop unsafe_ function variants base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v4 1/8] t/helper/test-tool: implement sha1-unsafe helper 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau ` (7 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt With the new "unsafe" SHA-1 build knob, it is convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Implement that helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function, and expose the new behavior via a new 'sha1-unsafe' test helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++++++++++---- t/helper/test-sha1.c | 7 ++++++- t/helper/test-sha1.sh | 38 ++++++++++++++++++++++---------------- t/helper/test-sha256.c | 2 +- t/helper/test-tool.c | 1 + t/helper/test-tool.h | 3 ++- 6 files changed, 45 insertions(+), 23 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index 45d829c908f..d0ee668df95 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -1,7 +1,7 @@ #include "test-tool.h" #include "hex.h" -int cmd_hash_impl(int ac, const char **av, int algo) +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) { git_hash_ctx ctx; unsigned char hash[GIT_MAX_HEXSZ]; @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) die("OOPS"); } - algop->init_fn(&ctx); + if (unsafe) + algop->unsafe_init_fn(&ctx); + else + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) } if (this_sz == 0) break; - algop->update_fn(&ctx, buffer, this_sz); + if (unsafe) + algop->unsafe_update_fn(&ctx, buffer, this_sz); + else + algop->update_fn(&ctx, buffer, this_sz); } - algop->final_fn(hash, &ctx); + if (unsafe) + algop->unsafe_final_fn(hash, &ctx); + else + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c index e60d000c039..349540c4df8 100644 --- a/t/helper/test-sha1.c +++ b/t/helper/test-sha1.c @@ -3,7 +3,7 @@ int cmd__sha1(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA1); + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0); } int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) @@ -13,3 +13,8 @@ int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) #endif return 1; } + +int cmd__sha1_unsafe(int ac, const char **av) +{ + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 1); +} diff --git a/t/helper/test-sha1.sh b/t/helper/test-sha1.sh index 84594885c70..bf387d3db14 100755 --- a/t/helper/test-sha1.sh +++ b/t/helper/test-sha1.sh @@ -3,25 +3,31 @@ dd if=/dev/zero bs=1048576 count=100 2>/dev/null | /usr/bin/time t/helper/test-tool sha1 >/dev/null +dd if=/dev/zero bs=1048576 count=100 2>/dev/null | +/usr/bin/time t/helper/test-tool sha1-unsafe >/dev/null + while read expect cnt pfx do case "$expect" in '#'*) continue ;; esac - actual=$( - { - test -z "$pfx" || echo "$pfx" - dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null | - perl -pe 'y/\000/g/' - } | ./t/helper/test-tool sha1 $cnt - ) - if test "$expect" = "$actual" - then - echo "OK: $expect $cnt $pfx" - else - echo >&2 "OOPS: $cnt" - echo >&2 "expect: $expect" - echo >&2 "actual: $actual" - exit 1 - fi + for sha1 in sha1 sha1-unsafe + do + actual=$( + { + test -z "$pfx" || echo "$pfx" + dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null | + perl -pe 'y/\000/g/' + } | ./t/helper/test-tool $sha1 $cnt + ) + if test "$expect" = "$actual" + then + echo "OK ($sha1): $expect $cnt $pfx" + else + echo >&2 "OOPS ($sha1): $cnt" + echo >&2 "expect ($sha1): $expect" + echo >&2 "actual ($sha1): $actual" + exit 1 + fi + done done <<EOF da39a3ee5e6b4b0d3255bfef95601890afd80709 0 3f786850e387550fdab836ed7e6dc881de23001b 0 a diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c index 2fb20438f3c..7fd0aa1fcd3 100644 --- a/t/helper/test-sha256.c +++ b/t/helper/test-sha256.c @@ -3,5 +3,5 @@ int cmd__sha256(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA256); + return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0); } diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 4a7aa993ba9..958452ef12e 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -70,6 +70,7 @@ static struct test_cmd cmds[] = { { "serve-v2", cmd__serve_v2 }, { "sha1", cmd__sha1 }, { "sha1-is-sha1dc", cmd__sha1_is_sha1dc }, + { "sha1-unsafe", cmd__sha1_unsafe }, { "sha256", cmd__sha256 }, { "sigchain", cmd__sigchain }, { "simple-ipc", cmd__simple_ipc }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 21802ac27da..24149edd414 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -63,6 +63,7 @@ int cmd__scrap_cache_tree(int argc, const char **argv); int cmd__serve_v2(int argc, const char **argv); int cmd__sha1(int argc, const char **argv); int cmd__sha1_is_sha1dc(int argc, const char **argv); +int cmd__sha1_unsafe(int argc, const char **argv); int cmd__sha256(int argc, const char **argv); int cmd__sigchain(int argc, const char **argv); int cmd__simple_ipc(int argc, const char **argv); @@ -81,6 +82,6 @@ int cmd__windows_named_pipe(int argc, const char **argv); #endif int cmd__write_cache(int argc, const char **argv); -int cmd_hash_impl(int ac, const char **av, int algo); +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); #endif -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 2/8] csum-file: store the hash algorithm as a struct field 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau 2025-01-23 17:34 ` [PATCH v4 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau ` (6 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Throughout the hashfile API, we rely on a reference to 'the_hash_algo', and call its _unsafe function variants directly. Prepare for a future change where we may use a different 'git_hash_algo' pointer (instead of just relying on 'the_hash_algo' throughout) by making the 'git_hash_algo' pointer a member of the 'hashfile' structure itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 20 +++++++++++--------- csum-file.h | 1 + 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/csum-file.c b/csum-file.c index 5716016e12e..b28cd047e3f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -71,14 +71,14 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, hashflush(f); if (f->skip_hash) - hashclr(f->buffer, the_repository->hash_algo); + hashclr(f->buffer, f->algop); else - the_hash_algo->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->unsafe_final_fn(f->buffer, &f->ctx); if (result) - hashcpy(result, f->buffer, the_repository->hash_algo); + hashcpy(result, f->buffer, f->algop); if (flags & CSUM_HASH_IN_STREAM) - flush(f, f->buffer, the_hash_algo->rawsz); + flush(f, f->buffer, f->algop->rawsz); if (flags & CSUM_FSYNC) fsync_component_or_die(component, f->fd, f->name); if (flags & CSUM_CLOSE) { @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - the_hash_algo->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->unsafe_update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -174,7 +174,9 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->name = name; f->do_crc = 0; f->skip_hash = 0; - the_hash_algo->unsafe_init_fn(&f->ctx); + + f->algop = the_hash_algo; + f->algop->unsafe_init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -208,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - the_hash_algo->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -219,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - the_hash_algo->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } diff --git a/csum-file.h b/csum-file.h index 7c73da0a40a..2b45f4673a2 100644 --- a/csum-file.h +++ b/csum-file.h @@ -20,6 +20,7 @@ struct hashfile { size_t buffer_len; unsigned char *buffer; unsigned char *check_buffer; + const struct git_hash_algo *algop; /** * If non-zero, skip_hash indicates that we should -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 3/8] csum-file.c: extract algop from hashfile_checksum_valid() 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau 2025-01-23 17:34 ` [PATCH v4 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau 2025-01-23 17:34 ` [PATCH v4 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau ` (5 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Perform a similar transformation as in the previous commit, but focused instead on hashfile_checksum_valid(). This function does not work with a hashfile structure itself, and instead validates the raw contents of a file written using the hashfile API. We'll want to be prepared for a similar change to this function in the future, so prepare ourselves for that by extracting 'the_hash_algo' into its own field for use within this function. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/csum-file.c b/csum-file.c index b28cd047e3f..7a71121e340 100644 --- a/csum-file.c +++ b/csum-file.c @@ -242,14 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - size_t data_len = total_len - the_hash_algo->rawsz; + const struct git_hash_algo *algop = the_hash_algo; + size_t data_len = total_len - algop->rawsz; - if (total_len < the_hash_algo->rawsz) + if (total_len < algop->rawsz) return 0; /* say "too short"? */ - the_hash_algo->unsafe_init_fn(&ctx); - the_hash_algo->unsafe_update_fn(&ctx, data, data_len); - the_hash_algo->unsafe_final_fn(got, &ctx); + algop->unsafe_init_fn(&ctx); + algop->unsafe_update_fn(&ctx, data, data_len); + algop->unsafe_final_fn(got, &ctx); - return hasheq(got, data + data_len, the_repository->hash_algo); + return hasheq(got, data + data_len, algop); } -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 4/8] hash.h: introduce `unsafe_hash_algo()` 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau ` (2 preceding siblings ...) 2025-01-23 17:34 ` [PATCH v4 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau ` (4 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants, 2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing functions by introducing new functions like "unsafe_init_fn()" and so on. This approach has a major shortcoming that callers must remember to consistently use one variant or the other. Failing to consistently use (or not use) the unsafe variants can lead to crashes at best, or subtle memory corruption issues at worst. In the hashfile API, this isn't difficult to achieve, but verifying that all callers consistently use the unsafe variants is somewhat of a chore given how spread out all of the callers are. In the sha1 and sha1-unsafe test helpers, all of the calls to various hash functions are guarded by an "if (unsafe)" conditional, which is repetitive and cumbersome. Address these issues by introducing a new pattern whereby one 'git_hash_algo' can return a pointer to another 'git_hash_algo' that represents the unsafe version of itself. So instead of having something like: if (unsafe) the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); else the_hash_algo->unsafe_init_fn(...); the_hash_algo->unsafe_update_fn(...); the_hash_algo->unsafe_final_fn(...); we can instead write: struct git_hash_algo *algop = the_hash_algo; if (unsafe) algop = unsafe_hash_algo(algop); algop->init_fn(...); algop->update_fn(...); algop->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to hold onto a 'struct git_hash_algo' pointer that is initialized once. Similarly, while there currently is still a way to "mix" safe and unsafe functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. Note that hash_algo_by_ptr() needs an adjustment to allow passing in the unsafe variant of a hash function. All other query functions on the hash_algos array will continue to return the safe variants of any function. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 13 ++++++++++++- object-file.c | 26 ++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/hash.h b/hash.h index 756166ce5e8..0bf63cedfa4 100644 --- a/hash.h +++ b/hash.h @@ -305,6 +305,9 @@ struct git_hash_algo { /* The all-zeros OID. */ const struct object_id *null_oid; + + /* The unsafe variant of this hash function, if one exists. */ + const struct git_hash_algo *unsafe; }; extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; @@ -320,9 +323,17 @@ int hash_algo_by_length(int len); /* Identical, except for a pointer to struct git_hash_algo. */ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) { - return p - hash_algos; + size_t i; + for (i = 0; i < GIT_HASH_NALGOS; i++) { + const struct git_hash_algo *algop = &hash_algos[i]; + if (p == algop) + return i; + } + return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop); + const struct object_id *null_oid(void); static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop) diff --git a/object-file.c b/object-file.c index 5b792b3dd42..43efa4ca361 100644 --- a/object-file.c +++ b/object-file.c @@ -202,6 +202,22 @@ static void git_hash_unknown_final_oid(struct object_id *oid UNUSED, BUG("trying to finalize unknown hash"); } +static const struct git_hash_algo sha1_unsafe_algo = { + .name = "sha1", + .format_id = GIT_SHA1_FORMAT_ID, + .rawsz = GIT_SHA1_RAWSZ, + .hexsz = GIT_SHA1_HEXSZ, + .blksz = GIT_SHA1_BLKSZ, + .init_fn = git_hash_sha1_init_unsafe, + .clone_fn = git_hash_sha1_clone_unsafe, + .update_fn = git_hash_sha1_update_unsafe, + .final_fn = git_hash_sha1_final_unsafe, + .final_oid_fn = git_hash_sha1_final_oid_unsafe, + .empty_tree = &empty_tree_oid, + .empty_blob = &empty_blob_oid, + .null_oid = &null_oid_sha1, +}; + const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { { .name = NULL, @@ -239,6 +255,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .unsafe_update_fn = git_hash_sha1_update_unsafe, .unsafe_final_fn = git_hash_sha1_final_unsafe, .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, + .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, .null_oid = &null_oid_sha1, @@ -305,6 +322,15 @@ int hash_algo_by_length(int len) return GIT_HASH_UNKNOWN; } +const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop) +{ + /* If we have a faster "unsafe" implementation, use that. */ + if (algop->unsafe) + return algop->unsafe; + /* Otherwise use the default one. */ + return algop; +} + /* * This is meant to hold a *small* number of objects that you would * want repo_read_object_file() to be able to return, but yet you do not want -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 5/8] csum-file.c: use unsafe_hash_algo() 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau ` (3 preceding siblings ...) 2025-01-23 17:34 ` [PATCH v4 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 6/8] t/helper/test-hash.c: " Taylor Blau ` (3 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Instead of calling the unsafe_ hash function variants directly, make use of the shared 'algop' pointer by initializing it to: f->algop = unsafe_hash_algo(the_hash_algo); , thus making all calls use the unsafe variants directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- csum-file.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/csum-file.c b/csum-file.c index 7a71121e340..ebffc80ef71 100644 --- a/csum-file.c +++ b/csum-file.c @@ -50,7 +50,7 @@ void hashflush(struct hashfile *f) if (offset) { if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, f->buffer, offset); + f->algop->update_fn(&f->ctx, f->buffer, offset); flush(f, f->buffer, offset); f->offset = 0; } @@ -73,7 +73,7 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, if (f->skip_hash) hashclr(f->buffer, f->algop); else - f->algop->unsafe_final_fn(f->buffer, &f->ctx); + f->algop->final_fn(f->buffer, &f->ctx); if (result) hashcpy(result, f->buffer, f->algop); @@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count) * f->offset is necessarily zero. */ if (!f->skip_hash) - f->algop->unsafe_update_fn(&f->ctx, buf, nr); + f->algop->update_fn(&f->ctx, buf, nr); flush(f, buf, nr); } else { /* @@ -175,8 +175,8 @@ static struct hashfile *hashfd_internal(int fd, const char *name, f->do_crc = 0; f->skip_hash = 0; - f->algop = the_hash_algo; - f->algop->unsafe_init_fn(&f->ctx); + f->algop = unsafe_hash_algo(the_hash_algo); + f->algop->init_fn(&f->ctx); f->buffer_len = buffer_len; f->buffer = xmalloc(buffer_len); @@ -210,7 +210,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo { hashflush(f); checkpoint->offset = f->total; - f->algop->unsafe_clone_fn(&checkpoint->ctx, &f->ctx); + f->algop->clone_fn(&checkpoint->ctx, &f->ctx); } int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint) @@ -221,7 +221,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint lseek(f->fd, offset, SEEK_SET) != offset) return -1; f->total = offset; - f->algop->unsafe_clone_fn(&f->ctx, &checkpoint->ctx); + f->algop->clone_fn(&f->ctx, &checkpoint->ctx); f->offset = 0; /* hashflush() was called in checkpoint */ return 0; } @@ -242,15 +242,15 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len) { unsigned char got[GIT_MAX_RAWSZ]; git_hash_ctx ctx; - const struct git_hash_algo *algop = the_hash_algo; + const struct git_hash_algo *algop = unsafe_hash_algo(the_hash_algo); size_t data_len = total_len - algop->rawsz; if (total_len < algop->rawsz) return 0; /* say "too short"? */ - algop->unsafe_init_fn(&ctx); - algop->unsafe_update_fn(&ctx, data, data_len); - algop->unsafe_final_fn(got, &ctx); + algop->init_fn(&ctx); + algop->update_fn(&ctx, data, data_len); + algop->final_fn(got, &ctx); return hasheq(got, data + data_len, algop); } -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 6/8] t/helper/test-hash.c: use unsafe_hash_algo() 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau ` (4 preceding siblings ...) 2025-01-23 17:34 ` [PATCH v4 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau ` (2 subsequent siblings) 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index d0ee668df95..aa82638c621 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -9,6 +9,8 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) int binary = 0; char *buffer; const struct git_hash_algo *algop = &hash_algos[algo]; + if (unsafe) + algop = unsafe_hash_algo(algop); if (ac == 2) { if (!strcmp(av[1], "-b")) @@ -27,10 +29,7 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) die("OOPS"); } - if (unsafe) - algop->unsafe_init_fn(&ctx); - else - algop->init_fn(&ctx); + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -49,15 +48,9 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) } if (this_sz == 0) break; - if (unsafe) - algop->unsafe_update_fn(&ctx, buffer, this_sz); - else - algop->update_fn(&ctx, buffer, this_sz); + algop->update_fn(&ctx, buffer, this_sz); } - if (unsafe) - algop->unsafe_final_fn(hash, &ctx); - else - algop->final_fn(hash, &ctx); + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 7/8] csum-file: introduce hashfile_checkpoint_init() 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau ` (5 preceding siblings ...) 2025-01-23 17:34 ` [PATCH v4 6/8] t/helper/test-hash.c: " Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 8/8] hash.h: drop unsafe_ function variants Taylor Blau 2025-01-23 18:30 ` [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Junio C Hamano 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1 backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to initialize a hashfile_checkpoint with the same hash function implementation as is used by the hashfile it is used to checkpoint. While both 106140a99f and 9218c0bfe1 work around the immediate crash, changing the hash function implementation within the hashfile API to, for example, the non-unsafe variant would re-introduce the crash. This is a result of the tight coupling between initializing hashfiles and hashfile_checkpoints. Introduce and use a new function which ensures that both parts of a hashfile and hashfile_checkpoint pair use the same hash function implementation to avoid such crashes. A few things worth noting: - In the change to builtin/fast-import.c::stream_blob(), we can see that by removing the explicit reference to 'the_hash_algo->unsafe_init_fn()', we are hardened against the hashfile API changing away from the_hash_algo (or its unsafe variant) in the future. - The bulk-checkin code no longer needs to explicitly zero-initialize the hashfile_checkpoint, since it is now done as a result of calling 'hashfile_checkpoint_init()'. - Also in the bulk-checkin code, we add an additional call to prepare_to_stream() outside of the main loop in order to initialize 'state->f' so we know which hash function implementation to use when calling 'hashfile_checkpoint_init()'. This is OK, since subsequent 'prepare_to_stream()' calls are noops. However, we only need to call 'prepare_to_stream()' when we have the HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling 'prepare_to_stream()' does not assign 'state->f', so we have nothing to initialize. - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are appropriately guarded. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/fast-import.c | 2 +- bulk-checkin.c | 9 ++++++--- csum-file.c | 7 +++++++ csum-file.h | 1 + 4 files changed, 15 insertions(+), 4 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 0f86392761a..4a6c7ab52ac 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -1106,7 +1106,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) cycle_packfile(); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); + hashfile_checkpoint_init(pack_file, &checkpoint); hashfile_checkpoint(pack_file, &checkpoint); offset = checkpoint.offset; diff --git a/bulk-checkin.c b/bulk-checkin.c index 433070a3bda..892176d23d2 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -261,7 +261,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; - struct hashfile_checkpoint checkpoint = {0}; + struct hashfile_checkpoint checkpoint; struct pack_idx_entry *idx = NULL; seekback = lseek(fd, 0, SEEK_CUR); @@ -272,12 +272,15 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, OBJ_BLOB, size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); - the_hash_algo->unsafe_init_fn(&checkpoint.ctx); /* Note: idx is non-NULL when we are writing */ - if ((flags & HASH_WRITE_OBJECT) != 0) + if ((flags & HASH_WRITE_OBJECT) != 0) { CALLOC_ARRAY(idx, 1); + prepare_to_stream(state, flags); + hashfile_checkpoint_init(state->f, &checkpoint); + } + already_hashed_to = 0; while (1) { diff --git a/csum-file.c b/csum-file.c index ebffc80ef71..232121f415f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -206,6 +206,13 @@ struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp return hashfd_internal(fd, name, tp, 8 * 1024); } +void hashfile_checkpoint_init(struct hashfile *f, + struct hashfile_checkpoint *checkpoint) +{ + memset(checkpoint, 0, sizeof(*checkpoint)); + f->algop->init_fn(&checkpoint->ctx); +} + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) { hashflush(f); diff --git a/csum-file.h b/csum-file.h index 2b45f4673a2..b7475f16c20 100644 --- a/csum-file.h +++ b/csum-file.h @@ -36,6 +36,7 @@ struct hashfile_checkpoint { git_hash_ctx ctx; }; +void hashfile_checkpoint_init(struct hashfile *, struct hashfile_checkpoint *); void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *); int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *); -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 8/8] hash.h: drop unsafe_ function variants 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau ` (6 preceding siblings ...) 2025-01-23 17:34 ` [PATCH v4 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau @ 2025-01-23 17:34 ` Taylor Blau 2025-01-23 18:30 ` [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Junio C Hamano 8 siblings, 0 replies; 60+ messages in thread From: Taylor Blau @ 2025-01-23 17:34 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren, Patrick Steinhardt Now that all callers have been converted from: the_hash_algo->unsafe_init_fn(); to unsafe_hash_algo(the_hash_algo)->init_fn(); and similar, we can remove the scaffolding for the unsafe_ function variants and force callers to use the new unsafe_hash_algo() mechanic instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- hash.h | 15 --------------- object-file.c | 15 --------------- 2 files changed, 30 deletions(-) diff --git a/hash.h b/hash.h index 0bf63cedfa4..ad2c919991c 100644 --- a/hash.h +++ b/hash.h @@ -282,21 +282,6 @@ struct git_hash_algo { /* The hash finalization function for object IDs. */ git_hash_final_oid_fn final_oid_fn; - /* The non-cryptographic hash initialization function. */ - git_hash_init_fn unsafe_init_fn; - - /* The non-cryptographic hash context cloning function. */ - git_hash_clone_fn unsafe_clone_fn; - - /* The non-cryptographic hash update function. */ - git_hash_update_fn unsafe_update_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_fn unsafe_final_fn; - - /* The non-cryptographic hash finalization function. */ - git_hash_final_oid_fn unsafe_final_oid_fn; - /* The OID of the empty tree. */ const struct object_id *empty_tree; diff --git a/object-file.c b/object-file.c index 43efa4ca361..c4b42dd4be9 100644 --- a/object-file.c +++ b/object-file.c @@ -230,11 +230,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_unknown_update, .final_fn = git_hash_unknown_final, .final_oid_fn = git_hash_unknown_final_oid, - .unsafe_init_fn = git_hash_unknown_init, - .unsafe_clone_fn = git_hash_unknown_clone, - .unsafe_update_fn = git_hash_unknown_update, - .unsafe_final_fn = git_hash_unknown_final, - .unsafe_final_oid_fn = git_hash_unknown_final_oid, .empty_tree = NULL, .empty_blob = NULL, .null_oid = NULL, @@ -250,11 +245,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha1_update, .final_fn = git_hash_sha1_final, .final_oid_fn = git_hash_sha1_final_oid, - .unsafe_init_fn = git_hash_sha1_init_unsafe, - .unsafe_clone_fn = git_hash_sha1_clone_unsafe, - .unsafe_update_fn = git_hash_sha1_update_unsafe, - .unsafe_final_fn = git_hash_sha1_final_unsafe, - .unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe, .unsafe = &sha1_unsafe_algo, .empty_tree = &empty_tree_oid, .empty_blob = &empty_blob_oid, @@ -271,11 +261,6 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { .update_fn = git_hash_sha256_update, .final_fn = git_hash_sha256_final, .final_oid_fn = git_hash_sha256_final_oid, - .unsafe_init_fn = git_hash_sha256_init, - .unsafe_clone_fn = git_hash_sha256_clone, - .unsafe_update_fn = git_hash_sha256_update, - .unsafe_final_fn = git_hash_sha256_final, - .unsafe_final_oid_fn = git_hash_sha256_final_oid, .empty_tree = &empty_tree_oid_sha256, .empty_blob = &empty_blob_oid_sha256, .null_oid = &null_oid_sha256, -- 2.48.0.rc2.35.gd215225db14 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau ` (7 preceding siblings ...) 2025-01-23 17:34 ` [PATCH v4 8/8] hash.h: drop unsafe_ function variants Taylor Blau @ 2025-01-23 18:30 ` Junio C Hamano 2025-01-23 18:50 ` Jeff King 8 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-01-23 18:30 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt Taylor Blau <me@ttaylorr.com> writes: > (This series is based on 14650065b7 (RelNotes/2.48.0: fix typos etc., > 2025-01-07)). > > Here is a hopefully final version of my series to harden the unsafe hash > algorithm changes added in v2.47.0. The only difference from last time > is that hash_algo_by_ptr() now returns GIT_HASH_UNKNOWN for NULL and > unsafe variants, which is a strict improvement from both v3 of this > series and the status-quo on master. > > As usual, a (small) range-diff is included below for convenience, and > the original cover letter is as follows: Thanks. I'll mark it for 'next', unless there are any further comments. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants 2025-01-23 18:30 ` [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Junio C Hamano @ 2025-01-23 18:50 ` Jeff King 0 siblings, 0 replies; 60+ messages in thread From: Jeff King @ 2025-01-23 18:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt On Thu, Jan 23, 2025 at 10:30:06AM -0800, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > (This series is based on 14650065b7 (RelNotes/2.48.0: fix typos etc., > > 2025-01-07)). > > > > Here is a hopefully final version of my series to harden the unsafe hash > > algorithm changes added in v2.47.0. The only difference from last time > > is that hash_algo_by_ptr() now returns GIT_HASH_UNKNOWN for NULL and > > unsafe variants, which is a strict improvement from both v3 of this > > series and the status-quo on master. > > > > As usual, a (small) range-diff is included below for convenience, and > > the original cover letter is as follows: > > Thanks. I'll mark it for 'next', unless there are any further > comments. Nope, it looks good to me. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
end of thread, other threads:[~2025-01-23 18:50 UTC | newest] Thread overview: 60+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau 2024-11-20 19:13 ` [PATCH 1/6] csum-file: store the hash algorithm as a struct field Taylor Blau 2024-11-21 9:18 ` Jeff King 2024-11-20 19:13 ` [PATCH 2/6] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau 2024-11-20 19:13 ` [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` Taylor Blau 2024-11-21 9:37 ` Jeff King 2024-11-22 0:39 ` brian m. carlson 2024-11-22 8:25 ` Jeff King 2024-11-22 20:37 ` brian m. carlson 2025-01-10 21:38 ` Taylor Blau 2025-01-11 2:45 ` Jeff King 2024-11-20 19:13 ` [PATCH 4/6] csum-file.c: use unsafe_hash_algo() Taylor Blau 2024-11-20 19:13 ` [PATCH 5/6] t/helper/test-hash.c: " Taylor Blau 2024-11-20 19:13 ` [PATCH 6/6] hash.h: drop unsafe_ function variants Taylor Blau 2024-11-21 9:41 ` Jeff King 2025-01-08 19:14 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau 2025-01-08 19:14 ` [PATCH v2 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau 2025-01-08 19:14 ` [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau 2025-01-16 11:48 ` Patrick Steinhardt 2025-01-17 21:17 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau 2025-01-08 19:14 ` [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau 2025-01-16 11:49 ` Patrick Steinhardt 2025-01-17 21:18 ` Taylor Blau 2025-01-08 19:14 ` [PATCH v2 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau 2025-01-08 19:14 ` [PATCH v2 6/8] t/helper/test-hash.c: " Taylor Blau 2025-01-08 19:14 ` [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau 2025-01-10 10:37 ` Jeff King 2025-01-10 21:50 ` Taylor Blau 2025-01-17 21:30 ` Taylor Blau 2025-01-18 12:15 ` Jeff King 2025-01-08 19:14 ` [PATCH v2 8/8] hash.h: drop unsafe_ function variants Taylor Blau 2025-01-10 10:41 ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King 2025-01-10 21:29 ` Taylor Blau 2025-01-11 2:42 ` Jeff King 2025-01-11 0:14 ` Junio C Hamano 2025-01-11 17:14 ` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 " Taylor Blau 2025-01-17 22:03 ` [PATCH v3 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau 2025-01-17 22:03 ` [PATCH v3 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau 2025-01-17 22:03 ` [PATCH v3 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau 2025-01-17 22:03 ` [PATCH v3 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau 2025-01-17 22:03 ` [PATCH v3 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau 2025-01-17 22:03 ` [PATCH v3 6/8] t/helper/test-hash.c: " Taylor Blau 2025-01-17 22:03 ` [PATCH v3 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau 2025-01-17 22:03 ` [PATCH v3 8/8] hash.h: drop unsafe_ function variants Taylor Blau 2025-01-18 12:28 ` [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King 2025-01-18 12:43 ` Jeff King 2025-01-22 21:31 ` Junio C Hamano 2025-01-23 17:34 ` [PATCH v4 " Taylor Blau 2025-01-23 17:34 ` [PATCH v4 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau 2025-01-23 17:34 ` [PATCH v4 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau 2025-01-23 17:34 ` [PATCH v4 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau 2025-01-23 17:34 ` [PATCH v4 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau 2025-01-23 17:34 ` [PATCH v4 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau 2025-01-23 17:34 ` [PATCH v4 6/8] t/helper/test-hash.c: " Taylor Blau 2025-01-23 17:34 ` [PATCH v4 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau 2025-01-23 17:34 ` [PATCH v4 8/8] hash.h: drop unsafe_ function variants Taylor Blau 2025-01-23 18:30 ` [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Junio C Hamano 2025-01-23 18:50 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).