From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants
Date: Wed, 8 Jan 2025 14:14:29 -0500
Message-ID: <cover.1736363652.git.me@ttaylorr.com>
In-Reply-To: <cover.1732130001.git.me@ttaylorr.com>

(This series is rebased on 'master', which is 14650065b7
(RelNotes/2.48.0: fix typos etc., 2025-01-07) at the time of writing).

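The bulk of this series is unchanged since last time. The notable
addition is a new seventh patch that further hardens the
hashfile_checkpoint callers on top of Patrick's recent series[1]. The
sha1-unsafe test helper is also now introduced in a single patch (the
first) rather than being split across two.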
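To show why a checkpoint must be initialized with the same hash implementation as the hashfile it checkpoints, here is a minimal toy model. Everything in it is hypothetical and simplified: `struct toy_algo`, the `owner` tag on each context, and the integer return value of this `hashfile_checkpoint()` (Git's returns void) are assumptions made so that a mismatched backend shows up as an error code rather than as the segfaults fixed in 106140a99f and 9218c0bfe1. Only the body of `hashfile_checkpoint_init()` is meant to mirror the new function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy model, not Git's csum-file.h: each context records which
 * backend initialized it, mimicking backends that set up private state in
 * init_fn() and expect clone_fn() to get a destination prepared the same way.
 */
struct toy_ctx {
	const char *owner; /* backend that ran init_fn(), or NULL */
	uint32_t state;
};

struct toy_algo {
	void (*init_fn)(struct toy_ctx *ctx);
	int (*clone_fn)(struct toy_ctx *dst, const struct toy_ctx *src);
};

static int clone_checked(const char *self, struct toy_ctx *dst,
			 const struct toy_ctx *src)
{
	if (!dst->owner || strcmp(dst->owner, self))
		return -1; /* a real backend might crash here instead */
	*dst = *src;
	return 0;
}

static void safe_init(struct toy_ctx *c) { c->owner = "safe"; c->state = 0; }
static void unsafe_init(struct toy_ctx *c) { c->owner = "unsafe"; c->state = 0; }

static int safe_clone(struct toy_ctx *d, const struct toy_ctx *s)
{
	return clone_checked("safe", d, s);
}

static int unsafe_clone(struct toy_ctx *d, const struct toy_ctx *s)
{
	return clone_checked("unsafe", d, s);
}

const struct toy_algo toy_safe = { safe_init, safe_clone };
const struct toy_algo toy_unsafe = { unsafe_init, unsafe_clone };

struct hashfile {
	const struct toy_algo *algop; /* chosen once, when the file is set up */
	struct toy_ctx ctx;
	size_t total;
};

struct hashfile_checkpoint {
	size_t offset;
	struct toy_ctx ctx;
};

/* Initialize the checkpoint with whatever backend the hashfile uses. */
void hashfile_checkpoint_init(struct hashfile *f,
			      struct hashfile_checkpoint *checkpoint)
{
	assert(f && f->algop);
	memset(checkpoint, 0, sizeof(*checkpoint));
	f->algop->init_fn(&checkpoint->ctx);
}

/* Record position and hash state; returns -1 on a backend mismatch. */
int hashfile_checkpoint(struct hashfile *f,
			struct hashfile_checkpoint *checkpoint)
{
	checkpoint->offset = f->total;
	return f->algop->clone_fn(&checkpoint->ctx, &f->ctx);
}
```

With this shape, a caller such as fast-import's stream_blob() no longer names a particular backend: if the hashfile later switches implementations, the checkpoint follows it automatically.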

The original cover letter is as follows:

------------

This series implements an idea discussed in [2]: rather than having
individual unsafe_ functions (like unsafe_init_fn() alongside
init_fn(), etc.), introduce a way to access a wrapped version of a
'struct git_hash_algo' that represents the unsafe variant of that
algorithm.

This approach is relatively straightforward to implement, and removes a
significant deficiency in the original implementation of
unsafe/non-cryptographic hash functions: because callers obtain a
single 'struct git_hash_algo' for either the safe or the unsafe
variant, it becomes impossible to mix the two (for example, by
initializing a context with unsafe_init_fn() but updating it with
update_fn()). It also cleans up the sha1-unsafe test helper's
implementation by removing a large number of "if (unsafe)"-style
conditionals.
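A minimal sketch of the shape this approach suggests, assuming a simplified stand-in for 'struct git_hash_algo': instead of parallel unsafe_*_fn members, each algorithm carries an optional pointer to a complete unsafe twin, and a single accessor picks it. The names `toy_hash_algo` and `toy_unsafe_hash_algo()`, and the use of 32-bit FNV-1a as a stand-in hash for both variants, are hypothetical and are not Git's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy hash state; a real backend would hold SHA-1 state here. */
struct toy_ctx {
	uint32_t state;
};

/*
 * Hypothetical stand-in for 'struct git_hash_algo': one set of function
 * pointers plus an optional pointer to a complete "unsafe" twin, instead
 * of parallel unsafe_init_fn()/unsafe_update_fn()/... members.
 */
struct toy_hash_algo {
	const char *name;
	void (*init_fn)(struct toy_ctx *ctx);
	void (*update_fn)(struct toy_ctx *ctx, const void *data, size_t len);
	uint32_t (*final_fn)(struct toy_ctx *ctx);
	const struct toy_hash_algo *unsafe; /* NULL: no separate variant */
};

/* 32-bit FNV-1a stands in for both backends, which must agree on output. */
static void fnv_init(struct toy_ctx *ctx)
{
	ctx->state = 2166136261u;
}

static void fnv_update(struct toy_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		ctx->state ^= *p++;
		ctx->state *= 16777619u;
	}
}

static uint32_t fnv_final(struct toy_ctx *ctx)
{
	return ctx->state;
}

const struct toy_hash_algo toy_unsafe = {
	"toy-unsafe", fnv_init, fnv_update, fnv_final, NULL
};

const struct toy_hash_algo toy_safe = {
	"toy", fnv_init, fnv_update, fnv_final, &toy_unsafe
};

/* Return the unsafe twin when one exists, otherwise the algorithm itself. */
const struct toy_hash_algo *toy_unsafe_hash_algo(const struct toy_hash_algo *algop)
{
	assert(algop);
	return algop->unsafe ? algop->unsafe : algop;
}
```

Because a caller obtains one `algop` and then uses its init, update, and final functions together, initializing with one variant and updating with the other is no longer expressible.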

The series is laid out as follows:

  * The first two patches prepare the hashfile API for the upcoming
    change:

      csum-file: store the hash algorithm as a struct field
      csum-file.c: extract algop from hashfile_checksum_valid()

  * The next patch implements the new 'unsafe_hash_algo()' function at
    the heart of this series' approach:

      hash.h: introduce `unsafe_hash_algo()`

  * The next two patches convert existing callers to use the new
    'unsafe_hash_algo()' function, instead of switching between safe and
    unsafe_ variants of individual functions:

      csum-file.c: use unsafe_hash_algo()
      t/helper/test-hash.c: use unsafe_hash_algo()

  * The final patch drops the unsafe_ function variants now that all
    callers have been converted to the new pattern:

      hash.h: drop unsafe_ function variants
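The caller-side conversion can be sketched as follows, again with hypothetical toy types: `toy_hash_algo`, `hash_chunks()`, and the `unsafe_update_calls` counter are illustrative only, and FNV-1a stands in for SHA-1. In the spirit of the converted cmd_hash_impl(), the variant is resolved once up front rather than by branching on `unsafe` before every init, update, and final call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_ctx {
	uint32_t state;
};

/* Hypothetical, simplified stand-in for 'struct git_hash_algo'. */
struct toy_hash_algo {
	void (*init_fn)(struct toy_ctx *ctx);
	void (*update_fn)(struct toy_ctx *ctx, const void *data, size_t len);
	uint32_t (*final_fn)(struct toy_ctx *ctx);
	const struct toy_hash_algo *unsafe;
};

static unsigned unsafe_update_calls; /* shows which backend actually ran */

static void fnv_init(struct toy_ctx *ctx) { ctx->state = 2166136261u; }

static void fnv_update(struct toy_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		ctx->state ^= *p++;
		ctx->state *= 16777619u;
	}
}

/* The "unsafe" backend computes the same digest but is counted. */
static void counting_update(struct toy_ctx *ctx, const void *data, size_t len)
{
	unsafe_update_calls++;
	fnv_update(ctx, data, len);
}

static uint32_t fnv_final(struct toy_ctx *ctx) { return ctx->state; }

const struct toy_hash_algo toy_unsafe = { fnv_init, counting_update, fnv_final, NULL };
const struct toy_hash_algo toy_safe = { fnv_init, fnv_update, fnv_final, &toy_unsafe };

const struct toy_hash_algo *toy_unsafe_hash_algo(const struct toy_hash_algo *algop)
{
	return algop->unsafe ? algop->unsafe : algop;
}

/*
 * Converted caller: resolve the variant once, then use only 'algop', so
 * there are no per-call "if (unsafe)" branches and no way to mix variants.
 */
uint32_t hash_chunks(const struct toy_hash_algo *algop, int unsafe,
		     const char *const *chunks, size_t nr)
{
	struct toy_ctx ctx;
	size_t i;

	assert(algop);
	if (unsafe)
		algop = toy_unsafe_hash_algo(algop);

	algop->init_fn(&ctx);
	for (i = 0; i < nr; i++)
		algop->update_fn(&ctx, chunks[i], strlen(chunks[i]));
	return algop->final_fn(&ctx);
}
```

Hashing "a" then "b" with the safe variant and "ab" in one go with the unsafe variant yields the same digest, while only the second call goes through the unsafe backend.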

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/20241230-pks-meson-sha1-unsafe-v1-0-efb276e171f5@pks.im/
[2]: https://lore.kernel.org/git/20241107013915.GA961214@coredump.intra.peff.net/

Taylor Blau (8):
  t/helper/test-tool: implement sha1-unsafe helper
  csum-file: store the hash algorithm as a struct field
  csum-file.c: extract algop from hashfile_checksum_valid()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: use unsafe_hash_algo()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file: introduce hashfile_checkpoint_init()
  hash.h: drop unsafe_ function variants

 builtin/fast-import.c  |  2 +-
 bulk-checkin.c         |  9 ++++++---
 csum-file.c            | 40 +++++++++++++++++++++++++---------------
 csum-file.h            |  2 ++
 hash.h                 | 20 +++++---------------
 object-file.c          | 41 ++++++++++++++++++++++++++---------------
 t/helper/test-hash.c   |  4 +++-
 t/helper/test-sha1.c   |  7 ++++++-
 t/helper/test-sha1.sh  | 38 ++++++++++++++++++++++----------------
 t/helper/test-sha256.c |  2 +-
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  3 ++-
 12 files changed, 100 insertions(+), 69 deletions(-)

Range-diff against v1:
2:  d8c1fc78b57 ! 1:  4c1523a04f1 t/helper/test-tool: implement sha1-unsafe helper
    @@ Metadata
      ## Commit message ##
         t/helper/test-tool: implement sha1-unsafe helper
     
    -    Add a new helper similar to 't/helper/test-tool sha1' called instead
    -    "sha1-unsafe" which uses the unsafe variant of Git's SHA-1 wrappers.
    +    With the new "unsafe" SHA-1 build knob, it is convenient to have a
    +    test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
    +    similar to 't/helper/test-tool sha1'.
     
    -    While we're at it, modify the test-sha1.sh script to exercise both
    -    the sha1 and sha1-unsafe test tools to ensure that both produce the
    -    expected hash values.
    +    Implement that helper by altering the implementation of that test-tool
    +    (in cmd_hash_impl(), which is generic and parameterized over different
    +    hash functions) to conditionally run the unsafe variants of the chosen
    +    hash function, and expose the new behavior via a new 'sha1-unsafe' test
    +    helper.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## t/helper/test-hash.c ##
    +@@
    + #include "test-tool.h"
    + #include "hex.h"
    + 
    +-int cmd_hash_impl(int ac, const char **av, int algo)
    ++int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
    + {
    + 	git_hash_ctx ctx;
    + 	unsigned char hash[GIT_MAX_HEXSZ];
    +@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo)
    + 			die("OOPS");
    + 	}
    + 
    +-	algop->init_fn(&ctx);
    ++	if (unsafe)
    ++		algop->unsafe_init_fn(&ctx);
    ++	else
    ++		algop->init_fn(&ctx);
    + 
    + 	while (1) {
    + 		ssize_t sz, this_sz;
    +@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo)
    + 		}
    + 		if (this_sz == 0)
    + 			break;
    +-		algop->update_fn(&ctx, buffer, this_sz);
    ++		if (unsafe)
    ++			algop->unsafe_update_fn(&ctx, buffer, this_sz);
    ++		else
    ++			algop->update_fn(&ctx, buffer, this_sz);
    + 	}
    +-	algop->final_fn(hash, &ctx);
    ++	if (unsafe)
    ++		algop->unsafe_final_fn(hash, &ctx);
    ++	else
    ++		algop->final_fn(hash, &ctx);
    + 
    + 	if (binary)
    + 		fwrite(hash, 1, algop->rawsz, stdout);
    +
      ## t/helper/test-sha1.c ##
    +@@
    + 
    + int cmd__sha1(int ac, const char **av)
    + {
    +-	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
    ++	return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0);
    + }
    + 
    + int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED)
     @@ t/helper/test-sha1.c: int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED)
      #endif
      	return 1;
    @@ t/helper/test-sha1.sh
      da39a3ee5e6b4b0d3255bfef95601890afd80709 0
      3f786850e387550fdab836ed7e6dc881de23001b 0 a
     
    + ## t/helper/test-sha256.c ##
    +@@
    + 
    + int cmd__sha256(int ac, const char **av)
    + {
    +-	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
    ++	return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0);
    + }
    +
      ## t/helper/test-tool.c ##
     @@ t/helper/test-tool.c: static struct test_cmd cmds[] = {
      	{ "serve-v2", cmd__serve_v2 },
    @@ t/helper/test-tool.h: int cmd__scrap_cache_tree(int argc, const char **argv);
      int cmd__sha256(int argc, const char **argv);
      int cmd__sigchain(int argc, const char **argv);
      int cmd__simple_ipc(int argc, const char **argv);
    +@@ t/helper/test-tool.h: int cmd__windows_named_pipe(int argc, const char **argv);
    + #endif
    + int cmd__write_cache(int argc, const char **argv);
    + 
    +-int cmd_hash_impl(int ac, const char **av, int algo);
    ++int cmd_hash_impl(int ac, const char **av, int algo, int unsafe);
    + 
    + #endif
3:  380133a1142 = 2:  99cc44895b5 csum-file: store the hash algorithm as a struct field
4:  e5076f003bf = 3:  1ffab2f8289 csum-file.c: extract algop from hashfile_checksum_valid()
5:  17f92dba34b = 4:  99dcbe2e716 hash.h: introduce `unsafe_hash_algo()`
6:  0d8e12599e2 = 5:  2dcc2aa6803 csum-file.c: use unsafe_hash_algo()
7:  a49a41703e2 = 6:  a2b9ef03080 t/helper/test-hash.c: use unsafe_hash_algo()
1:  0e2fcee6894 ! 7:  94c07fd8a55 t/helper/test-sha1: prepare for an unsafe mode
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    t/helper/test-sha1: prepare for an unsafe mode
    +    csum-file: introduce hashfile_checkpoint_init()
     
    -    With the new "unsafe" SHA-1 build knob, it would be convenient to have
    -    a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
    -    similar to 't/helper/test-tool sha1'.
    +    In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
    +    backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
    +    unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
    +    initialize a hashfile_checkpoint with the same hash function
    +    implementation as is used by the hashfile it is used to checkpoint.
     
    -    Prepare for such a helper by altering the implementation of that
    -    test-tool (in cmd_hash_impl(), which is generic and parameterized over
    -    different hash functions) to conditionally run the unsafe variants of
    -    the chosen hash function.
    +    While both 106140a99f and 9218c0bfe1 work around the immediate crash,
    +    changing the hash function implementation within the hashfile API to,
    +    for example, the non-unsafe variant would re-introduce the crash. This
    +    is a result of the tight coupling between initializing hashfiles and
    +    hashfile_checkpoints.
     
    -    The following commit will add a new test-tool which makes use of this
    -    new parameter.
    +    Introduce and use a new function which ensures that both parts of a
    +    hashfile and hashfile_checkpoint pair use the same hash function
    +    implementation to avoid such crashes.
     
    +    A few things worth noting:
    +
    +      - In the change to builtin/fast-import.c::stream_blob(), we can see
    +        that by removing the explicit reference to
    +        'the_hash_algo->unsafe_init_fn()', we are hardened against the
    +        hashfile API changing away from the_hash_algo (or its unsafe
    +        variant) in the future.
    +
    +      - The bulk-checkin code no longer needs to explicitly zero-initialize
    +        the hashfile_checkpoint, since it is now done as a result of calling
    +        'hashfile_checkpoint_init()'.
    +
    +      - Also in the bulk-checkin code, we add an additional call to
    +        prepare_to_stream() outside of the main loop in order to initialize
    +        'state->f' so we know which hash function implementation to use when
    +        calling 'hashfile_checkpoint_init()'.
    +
    +        This is OK, since subsequent 'prepare_to_stream()' calls are noops.
    +        However, we only need to call 'prepare_to_stream()' when we have the
    +        HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
    +        'prepare_to_stream()' does not assign 'state->f', so we have nothing
    +        to initialize.
    +
    +      - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
    +        appropriately guarded.
    +
    +    Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    - ## t/helper/test-hash.c ##
    -@@
    - #include "test-tool.h"
    - #include "hex.h"
    + ## builtin/fast-import.c ##
    +@@ builtin/fast-import.c: static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
    + 		|| (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size)
    + 		cycle_packfile();
      
    --int cmd_hash_impl(int ac, const char **av, int algo)
    -+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
    - {
    +-	the_hash_algo->unsafe_init_fn(&checkpoint.ctx);
    ++	hashfile_checkpoint_init(pack_file, &checkpoint);
    + 	hashfile_checkpoint(pack_file, &checkpoint);
    + 	offset = checkpoint.offset;
    + 
    +
    + ## bulk-checkin.c ##
    +@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
      	git_hash_ctx ctx;
    - 	unsigned char hash[GIT_MAX_HEXSZ];
    -@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo)
    - 			die("OOPS");
    - 	}
    + 	unsigned char obuf[16384];
    + 	unsigned header_len;
    +-	struct hashfile_checkpoint checkpoint = {0};
    ++	struct hashfile_checkpoint checkpoint;
    + 	struct pack_idx_entry *idx = NULL;
      
    --	algop->init_fn(&ctx);
    -+	if (unsafe)
    -+		algop->unsafe_init_fn(&ctx);
    -+	else
    -+		algop->init_fn(&ctx);
    + 	seekback = lseek(fd, 0, SEEK_CUR);
    +@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
    + 					  OBJ_BLOB, size);
    + 	the_hash_algo->init_fn(&ctx);
    + 	the_hash_algo->update_fn(&ctx, obuf, header_len);
    +-	the_hash_algo->unsafe_init_fn(&checkpoint.ctx);
    + 
    + 	/* Note: idx is non-NULL when we are writing */
    +-	if ((flags & HASH_WRITE_OBJECT) != 0)
    ++	if ((flags & HASH_WRITE_OBJECT) != 0) {
    + 		CALLOC_ARRAY(idx, 1);
    + 
    ++		prepare_to_stream(state, flags);
    ++		hashfile_checkpoint_init(state->f, &checkpoint);
    ++	}
    ++
    + 	already_hashed_to = 0;
      
      	while (1) {
    - 		ssize_t sz, this_sz;
    -@@ t/helper/test-hash.c: int cmd_hash_impl(int ac, const char **av, int algo)
    - 		}
    - 		if (this_sz == 0)
    - 			break;
    --		algop->update_fn(&ctx, buffer, this_sz);
    -+		if (unsafe)
    -+			algop->unsafe_update_fn(&ctx, buffer, this_sz);
    -+		else
    -+			algop->update_fn(&ctx, buffer, this_sz);
    - 	}
    --	algop->final_fn(hash, &ctx);
    -+	if (unsafe)
    -+		algop->unsafe_final_fn(hash, &ctx);
    -+	else
    -+		algop->final_fn(hash, &ctx);
    - 
    - 	if (binary)
    - 		fwrite(hash, 1, algop->rawsz, stdout);
     
    - ## t/helper/test-sha1.c ##
    -@@
    - 
    - int cmd__sha1(int ac, const char **av)
    - {
    --	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
    -+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0);
    + ## csum-file.c ##
    +@@ csum-file.c: struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp
    + 	return hashfd_internal(fd, name, tp, 8 * 1024);
      }
      
    - int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED)
    -
    - ## t/helper/test-sha256.c ##
    -@@
    - 
    - int cmd__sha256(int ac, const char **av)
    ++void hashfile_checkpoint_init(struct hashfile *f,
    ++			      struct hashfile_checkpoint *checkpoint)
    ++{
    ++	memset(checkpoint, 0, sizeof(*checkpoint));
    ++	f->algop->init_fn(&checkpoint->ctx);
    ++}
    ++
    + void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
      {
    --	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
    -+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0);
    - }
    + 	hashflush(f);
     
    - ## t/helper/test-tool.h ##
    -@@ t/helper/test-tool.h: int cmd__windows_named_pipe(int argc, const char **argv);
    - #endif
    - int cmd__write_cache(int argc, const char **argv);
    + ## csum-file.h ##
    +@@ csum-file.h: struct hashfile_checkpoint {
    + 	git_hash_ctx ctx;
    + };
      
    --int cmd_hash_impl(int ac, const char **av, int algo);
    -+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe);
    ++void hashfile_checkpoint_init(struct hashfile *, struct hashfile_checkpoint *);
    + void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
    + int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
      
    - #endif
8:  4081ad08549 = 8:  f5579883816 hash.h: drop unsafe_ function variants

base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e
-- 
2.48.0.rc2.33.gaab3d23ed4c


Thread overview: 60+ messages
2024-11-20 19:13 [PATCH 0/6] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Taylor Blau
2024-11-20 19:13 ` [PATCH 1/6] csum-file: store the hash algorithm as a struct field Taylor Blau
2024-11-21  9:18   ` Jeff King
2024-11-20 19:13 ` [PATCH 2/6] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau
2024-11-20 19:13 ` [PATCH 3/6] hash.h: introduce `unsafe_hash_algo()` Taylor Blau
2024-11-21  9:37   ` Jeff King
2024-11-22  0:39     ` brian m. carlson
2024-11-22  8:25       ` Jeff King
2024-11-22 20:37         ` brian m. carlson
2025-01-10 21:38     ` Taylor Blau
2025-01-11  2:45       ` Jeff King
2024-11-20 19:13 ` [PATCH 4/6] csum-file.c: use unsafe_hash_algo() Taylor Blau
2024-11-20 19:13 ` [PATCH 5/6] t/helper/test-hash.c: " Taylor Blau
2024-11-20 19:13 ` [PATCH 6/6] hash.h: drop unsafe_ function variants Taylor Blau
2024-11-21  9:41   ` Jeff King
2025-01-08 19:14 ` Taylor Blau [this message]
2025-01-08 19:14   ` [PATCH v2 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
2025-01-08 19:14   ` [PATCH v2 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau
2025-01-16 11:48     ` Patrick Steinhardt
2025-01-17 21:17       ` Taylor Blau
2025-01-08 19:14   ` [PATCH v2 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau
2025-01-08 19:14   ` [PATCH v2 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau
2025-01-16 11:49     ` Patrick Steinhardt
2025-01-17 21:18       ` Taylor Blau
2025-01-08 19:14   ` [PATCH v2 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau
2025-01-08 19:14   ` [PATCH v2 6/8] t/helper/test-hash.c: " Taylor Blau
2025-01-08 19:14   ` [PATCH v2 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau
2025-01-10 10:37     ` Jeff King
2025-01-10 21:50       ` Taylor Blau
2025-01-17 21:30         ` Taylor Blau
2025-01-18 12:15           ` Jeff King
2025-01-08 19:14   ` [PATCH v2 8/8] hash.h: drop unsafe_ function variants Taylor Blau
2025-01-10 10:41   ` [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King
2025-01-10 21:29     ` Taylor Blau
2025-01-11  2:42       ` Jeff King
2025-01-11  0:14   ` Junio C Hamano
2025-01-11 17:14     ` Taylor Blau
2025-01-17 22:03 ` [PATCH v3 " Taylor Blau
2025-01-17 22:03   ` [PATCH v3 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
2025-01-17 22:03   ` [PATCH v3 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau
2025-01-17 22:03   ` [PATCH v3 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau
2025-01-17 22:03   ` [PATCH v3 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau
2025-01-17 22:03   ` [PATCH v3 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau
2025-01-17 22:03   ` [PATCH v3 6/8] t/helper/test-hash.c: " Taylor Blau
2025-01-17 22:03   ` [PATCH v3 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau
2025-01-17 22:03   ` [PATCH v3 8/8] hash.h: drop unsafe_ function variants Taylor Blau
2025-01-18 12:28   ` [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Jeff King
2025-01-18 12:43     ` Jeff King
2025-01-22 21:31       ` Junio C Hamano
2025-01-23 17:34 ` [PATCH v4 " Taylor Blau
2025-01-23 17:34   ` [PATCH v4 1/8] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
2025-01-23 17:34   ` [PATCH v4 2/8] csum-file: store the hash algorithm as a struct field Taylor Blau
2025-01-23 17:34   ` [PATCH v4 3/8] csum-file.c: extract algop from hashfile_checksum_valid() Taylor Blau
2025-01-23 17:34   ` [PATCH v4 4/8] hash.h: introduce `unsafe_hash_algo()` Taylor Blau
2025-01-23 17:34   ` [PATCH v4 5/8] csum-file.c: use unsafe_hash_algo() Taylor Blau
2025-01-23 17:34   ` [PATCH v4 6/8] t/helper/test-hash.c: " Taylor Blau
2025-01-23 17:34   ` [PATCH v4 7/8] csum-file: introduce hashfile_checkpoint_init() Taylor Blau
2025-01-23 17:34   ` [PATCH v4 8/8] hash.h: drop unsafe_ function variants Taylor Blau
2025-01-23 18:30   ` [PATCH v4 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants Junio C Hamano
2025-01-23 18:50     ` Jeff King
