All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v4 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses
Date: Tue, 24 Sep 2024 13:32:01 -0400	[thread overview]
Message-ID: <cover.1727199118.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1725206584.git.me@ttaylorr.com>

Here is a reroll of mine and Peff's series to add a build-time knob to
allow selecting an alternative SHA-1 implementation for
non-cryptographic hashing within Git, starting with the `hashwrite()`
family of functions.

This version is another fairly substantial reroll, with the main
differences being renaming the "fast" SHA-1 options to "unsafe", as well
as not running the collision checks via finalize_object_file() when
handling loose objects (see the relevant patches for details on why).

Note also there is an important bug fix in finalize_object_file() to
unlink() the temporary file when we do run the collision check, but no
collisions were found. This bug was causing a pile-up of tmp_obj_XXXXXX
files in GitHub's infrastructure.

Thanks in advance for your review!

Taylor Blau (8):
  finalize_object_file(): check for name collision before renaming
  finalize_object_file(): refactor unlink_or_warn() placement
  finalize_object_file(): implement collision check
  pack-objects: use finalize_object_file() to rename pack/idx/etc
  sha1: do not redefine `platform_SHA_CTX` and friends
  hash.h: scaffolding for _unsafe hashing variants
  Makefile: allow specifying a SHA-1 for non-cryptographic uses
  csum-file.c: use unsafe SHA-1 implementation when available

 Makefile                              |  25 ++++++
 block-sha1/sha1.h                     |   2 +
 csum-file.c                           |  18 ++--
 hash.h                                |  72 +++++++++++++++
 object-file.c                         | 124 ++++++++++++++++++++++++--
 object-file.h                         |   6 ++
 pack-write.c                          |   7 +-
 sha1/openssl.h                        |   2 +
 sha1dc_git.h                          |   3 +
 t/t5303-pack-corruption-resilience.sh |   7 +-
 tmp-objdir.c                          |  26 ++++--
 11 files changed, 266 insertions(+), 26 deletions(-)

Range-diff against v3:
 1:  738b1eb17b4 =  1:  6f1ee91fff3 finalize_object_file(): check for name collision before renaming
 2:  e1c2c39711f =  2:  133047ca8c9 finalize_object_file(): refactor unlink_or_warn() placement
 3:  0feee5d1d4f !  3:  ed9eeef8513 finalize_object_file(): implement collision check
    @@ Commit message
     
         The new check will cause the write of new differing content to be a
         failure, rather than a silent noop, and we'll retain the temporary file
    -    on disk.
    +    on disk. If there's no collision present, we'll clean up the temporary
    +    file as usual after either rename()-ing or link()-ing it into place.
     
         Note that this may cause some extra computation when the files are in
    -    fact identical, but this should happen rarely. For example, when writing
    -    a loose object, we compute the object id first, then check manually for
    -    an existing object (so that we can freshen its timestamp) before
    -    actually trying to write and link the new data.
    +    fact identical, but this should happen rarely.
    +
    +    Loose objects are exempt from this check, and the collision check may be
    +    skipped by calling the _flags variant of this function with the
    +    FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:
    +
    +      - We don't treat the hash of the loose object file's contents as a
    +        checksum, since the same loose object can be stored using different
    +        bytes on disk (e.g., when adjusting core.compression, using a
    +        different version of zlib, etc.).
    +
    +        This is fundamentally different from cases where
    +        finalize_object_file() is operating over a file which uses the hash
    +        value as a checksum of the contents. In other words, a pair of
    +        identical loose objects can be stored using different bytes on disk,
    +        and that should not be treated as a collision.
    +
    +      - We already use the path of the loose object as its hash value /
    +        object name, so checking for collisions at the content level doesn't
    +        add anything.
    +
    +        This is why we do not bother to check the inflated object contents
    +        for collisions either, since either (a) the object contents have the
    +        fingerprint of a SHA-1 collision, in which case the collision
    +        detecting SHA-1 implementation used to hash the contents to give us
    +        a path would have already rejected it, or (b) the contents are part
    +        of a colliding pair which does not bear the same fingerprints of
    +        known collision attacks, in which case we would not have caught it
    +        anyway.
    +
    +        So skipping the collision check here does not change for better or
    +        worse the hardness of loose object writes.
    +
    +    As a small note related to the latter bullet point above, we must teach
    +    the tmp-objdir routines to similarly skip the content-level collision
    +    checks when calling migrate_one() on a loose object file, which we do by
    +    setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
    +    object shard.
     
         Co-authored-by: Jeff King <peff@peff.net>
         Signed-off-by: Jeff King <peff@peff.net>
    @@ object-file.c: static void write_object_file_prepare_literally(const struct git_
      /*
       * Move the just written object into its final resting place.
       */
    + int finalize_object_file(const char *tmpfile, const char *filename)
    ++{
    ++	return finalize_object_file_flags(tmpfile, filename, 0);
    ++}
    ++
    ++int finalize_object_file_flags(const char *tmpfile, const char *filename,
    ++			       enum finalize_object_file_flags flags)
    + {
    + 	struct stat st;
    + 	int ret = 0;
     @@ object-file.c: int finalize_object_file(const char *tmpfile, const char *filename)
      			errno = saved_errno;
      			return error_errno(_("unable to write file %s"), filename);
      		}
     -		/* FIXME!!! Collision check here ? */
    --		unlink_or_warn(tmpfile);
    -+		if (check_collision(tmpfile, filename))
    -+			return -1;
    ++		if (!(flags & FOF_SKIP_COLLISION_CHECK) &&
    ++		    check_collision(tmpfile, filename))
    ++				return -1;
    + 		unlink_or_warn(tmpfile);
      	}
      
    - out:
    +@@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
    + 			warning_errno(_("failed utime() on %s"), tmp_file.buf);
    + 	}
    + 
    +-	return finalize_object_file(tmp_file.buf, filename.buf);
    ++	return finalize_object_file_flags(tmp_file.buf, filename.buf,
    ++					  FOF_SKIP_COLLISION_CHECK);
    + }
    + 
    + static int freshen_loose_object(const struct object_id *oid)
    +@@ object-file.c: int stream_loose_object(struct input_stream *in_stream, size_t len,
    + 		strbuf_release(&dir);
    + 	}
    + 
    +-	err = finalize_object_file(tmp_file.buf, filename.buf);
    ++	err = finalize_object_file_flags(tmp_file.buf, filename.buf,
    ++					 FOF_SKIP_COLLISION_CHECK);
    + 	if (!err && compat)
    + 		err = repo_add_loose_object_map(the_repository, oid, &compat_oid);
    + cleanup:
    +
    + ## object-file.h ##
    +@@ object-file.h: int check_object_signature(struct repository *r, const struct object_id *oid,
    +  */
    + int stream_object_signature(struct repository *r, const struct object_id *oid);
    + 
    ++enum finalize_object_file_flags {
    ++	FOF_SKIP_COLLISION_CHECK = 1,
    ++};
    ++
    + int finalize_object_file(const char *tmpfile, const char *filename);
    ++int finalize_object_file_flags(const char *tmpfile, const char *filename,
    ++			       enum finalize_object_file_flags flags);
    + 
    + /* Helper to check and "touch" a file */
    + int check_and_freshen_file(const char *fn, int freshen);
    +
    + ## tmp-objdir.c ##
    +@@ tmp-objdir.c: static int read_dir_paths(struct string_list *out, const char *path)
    + 	return 0;
    + }
    + 
    +-static int migrate_paths(struct strbuf *src, struct strbuf *dst);
    ++static int migrate_paths(struct strbuf *src, struct strbuf *dst,
    ++			 enum finalize_object_file_flags flags);
    + 
    +-static int migrate_one(struct strbuf *src, struct strbuf *dst)
    ++static int migrate_one(struct strbuf *src, struct strbuf *dst,
    ++		       enum finalize_object_file_flags flags)
    + {
    + 	struct stat st;
    + 
    +@@ tmp-objdir.c: static int migrate_one(struct strbuf *src, struct strbuf *dst)
    + 				return -1;
    + 		} else if (errno != EEXIST)
    + 			return -1;
    +-		return migrate_paths(src, dst);
    ++		return migrate_paths(src, dst, flags);
    + 	}
    +-	return finalize_object_file(src->buf, dst->buf);
    ++	return finalize_object_file_flags(src->buf, dst->buf, flags);
    + }
    + 
    +-static int migrate_paths(struct strbuf *src, struct strbuf *dst)
    ++static int is_loose_object_shard(const char *name)
    ++{
    ++	return strlen(name) == 2 && isxdigit(name[0]) && isxdigit(name[1]);
    ++}
    ++
    ++static int migrate_paths(struct strbuf *src, struct strbuf *dst,
    ++			 enum finalize_object_file_flags flags)
    + {
    + 	size_t src_len = src->len, dst_len = dst->len;
    + 	struct string_list paths = STRING_LIST_INIT_DUP;
    +@@ tmp-objdir.c: static int migrate_paths(struct strbuf *src, struct strbuf *dst)
    + 
    + 	for (i = 0; i < paths.nr; i++) {
    + 		const char *name = paths.items[i].string;
    ++		enum finalize_object_file_flags flags_copy = flags;
    + 
    + 		strbuf_addf(src, "/%s", name);
    + 		strbuf_addf(dst, "/%s", name);
    + 
    +-		ret |= migrate_one(src, dst);
    ++		if (is_loose_object_shard(name))
    ++			flags_copy |= FOF_SKIP_COLLISION_CHECK;
    ++
    ++		ret |= migrate_one(src, dst, flags_copy);
    + 
    + 		strbuf_setlen(src, src_len);
    + 		strbuf_setlen(dst, dst_len);
    +@@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
    + 	strbuf_addbuf(&src, &t->path);
    + 	strbuf_addstr(&dst, repo_get_object_directory(the_repository));
    + 
    +-	ret = migrate_paths(&src, &dst);
    ++	ret = migrate_paths(&src, &dst, 0);
    + 
    + 	strbuf_release(&src);
    + 	strbuf_release(&dst);
 4:  620dde48a9d =  4:  3cc7f7b1f67 pack-objects: use finalize_object_file() to rename pack/idx/etc
 5:  bfe992765cd <  -:  ----------- i5500-git-daemon.sh: use compile-able version of Git without OpenSSL
 6:  22863d9f6df =  5:  8f8ac0f5b0e sha1: do not redefine `platform_SHA_CTX` and friends
 7:  119c318d812 !  6:  d300e9c6887 hash.h: scaffolding for _fast hashing variants
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    hash.h: scaffolding for _fast hashing variants
    +    hash.h: scaffolding for _unsafe hashing variants
     
         Git's default SHA-1 implementation is collision-detecting, which hardens
         us against known SHA-1 attacks against Git objects. This makes Git
    @@ Commit message
         the collision-detecting implementation, which is slower than
         non-collision detecting alternatives.
     
    -    Prepare for loading a separate "fast" SHA-1 implementation that can be
    +    Prepare for loading a separate "unsafe" SHA-1 implementation that can be
         used for non-cryptographic purposes, like computing the checksum of
         files that use the hashwrite() API.
     
         This commit does not actually introduce any new compile-time knobs to
    -    control which implementation is used as the fast SHA-1 variant, but does
    -    add scaffolding so that the "git_hash_algo" structure has five new
    -    function pointers which are "fast" variants of the five existing
    +    control which implementation is used as the unsafe SHA-1 variant, but
    +    does add scaffolding so that the "git_hash_algo" structure has five new
    +    function pointers which are "unsafe" variants of the five existing
         hashing-related function pointers:
     
    -      - git_hash_init_fn fast_init_fn
    -      - git_hash_clone_fn fast_clone_fn
    -      - git_hash_update_fn fast_update_fn
    -      - git_hash_final_fn fast_final_fn
    -      - git_hash_final_oid_fn fast_final_oid_fn
    +      - git_hash_init_fn unsafe_init_fn
    +      - git_hash_clone_fn unsafe_clone_fn
    +      - git_hash_update_fn unsafe_update_fn
    +      - git_hash_final_fn unsafe_final_fn
    +      - git_hash_final_oid_fn unsafe_final_oid_fn
     
         The following commit will introduce compile-time knobs to specify which
         SHA-1 implementation is used for non-cryptographic uses.
    @@ hash.h
      #define platform_SHA1_Final    	SHA1_Final
      #endif
      
    -+#ifndef platform_SHA_CTX_fast
    -+#  define platform_SHA_CTX_fast     platform_SHA_CTX
    -+#  define platform_SHA1_Init_fast   platform_SHA1_Init
    -+#  define platform_SHA1_Update_fast platform_SHA1_Update
    -+#  define platform_SHA1_Final_fast  platform_SHA1_Final
    ++#ifndef platform_SHA_CTX_unsafe
    ++#  define platform_SHA_CTX_unsafe      platform_SHA_CTX
    ++#  define platform_SHA1_Init_unsafe    platform_SHA1_Init
    ++#  define platform_SHA1_Update_unsafe  platform_SHA1_Update
    ++#  define platform_SHA1_Final_unsafe   platform_SHA1_Final
     +#  ifdef platform_SHA1_Clone
    -+#    define platform_SHA1_Clone_fast platform_SHA1_Clone
    ++#    define platform_SHA1_Clone_unsafe platform_SHA1_Clone
     +#  endif
     +#endif
     +
    @@ hash.h
      #define git_SHA1_Update		platform_SHA1_Update
      #define git_SHA1_Final		platform_SHA1_Final
      
    -+#define git_SHA_CTX_fast	platform_SHA_CTX_fast
    -+#define git_SHA1_Init_fast	platform_SHA1_Init_fast
    -+#define git_SHA1_Update_fast	platform_SHA1_Update_fast
    -+#define git_SHA1_Final_fast	platform_SHA1_Final_fast
    ++#define git_SHA_CTX_unsafe	platform_SHA_CTX_unsafe
    ++#define git_SHA1_Init_unsafe	platform_SHA1_Init_unsafe
    ++#define git_SHA1_Update_unsafe	platform_SHA1_Update_unsafe
    ++#define git_SHA1_Final_unsafe	platform_SHA1_Final_unsafe
     +
      #ifdef platform_SHA1_Clone
      #define git_SHA1_Clone	platform_SHA1_Clone
      #endif
    -+#ifdef platform_SHA1_Clone_fast
    -+#  define git_SHA1_Clone_fast platform_SHA1_Clone_fast
    ++#ifdef platform_SHA1_Clone_unsafe
    ++#  define git_SHA1_Clone_unsafe platform_SHA1_Clone_unsafe
     +#endif
      
      #ifndef platform_SHA256_CTX
    @@ hash.h: static inline void git_SHA1_Clone(git_SHA_CTX *dst, const git_SHA_CTX *s
      	memcpy(dst, src, sizeof(*dst));
      }
      #endif
    -+#ifndef SHA1_NEEDS_CLONE_HELPER_FAST
    -+static inline void git_SHA1_Clone_fast(git_SHA_CTX_fast *dst,
    -+				       const git_SHA_CTX_fast *src)
    ++#ifndef SHA1_NEEDS_CLONE_HELPER_UNSAFE
    ++static inline void git_SHA1_Clone_unsafe(git_SHA_CTX_unsafe *dst,
    ++				       const git_SHA_CTX_unsafe *src)
     +{
     +	memcpy(dst, src, sizeof(*dst));
     +}
    @@ hash.h: enum get_oid_result {
      /* A suitably aligned type for stack allocations of hash contexts. */
      union git_hash_ctx {
      	git_SHA_CTX sha1;
    -+	git_SHA_CTX_fast sha1_fast;
    ++	git_SHA_CTX_unsafe sha1_unsafe;
     +
      	git_SHA256_CTX sha256;
      };
    @@ hash.h: struct git_hash_algo {
      	/* The hash finalization function for object IDs. */
      	git_hash_final_oid_fn final_oid_fn;
      
    -+	/* The fast / non-cryptographic hash initialization function. */
    -+	git_hash_init_fn fast_init_fn;
    ++	/* The non-cryptographic hash initialization function. */
    ++	git_hash_init_fn unsafe_init_fn;
     +
    -+	/* The fast / non-cryptographic hash context cloning function. */
    -+	git_hash_clone_fn fast_clone_fn;
    ++	/* The non-cryptographic hash context cloning function. */
    ++	git_hash_clone_fn unsafe_clone_fn;
     +
    -+	/* The fast / non-cryptographic hash update function. */
    -+	git_hash_update_fn fast_update_fn;
    ++	/* The non-cryptographic hash update function. */
    ++	git_hash_update_fn unsafe_update_fn;
     +
    -+	/* The fast / non-cryptographic hash finalization function. */
    -+	git_hash_final_fn fast_final_fn;
    ++	/* The non-cryptographic hash finalization function. */
    ++	git_hash_final_fn unsafe_final_fn;
     +
    -+	/* The fast / non-cryptographic hash finalization function. */
    -+	git_hash_final_oid_fn fast_final_oid_fn;
    ++	/* The non-cryptographic hash finalization function. */
    ++	git_hash_final_oid_fn unsafe_final_oid_fn;
     +
      	/* The OID of the empty tree. */
      	const struct object_id *empty_tree;
    @@ object-file.c: static void git_hash_sha1_final_oid(struct object_id *oid, git_ha
      	oid->algo = GIT_HASH_SHA1;
      }
      
    -+static void git_hash_sha1_init_fast(git_hash_ctx *ctx)
    ++static void git_hash_sha1_init_unsafe(git_hash_ctx *ctx)
     +{
    -+	git_SHA1_Init_fast(&ctx->sha1_fast);
    ++	git_SHA1_Init_unsafe(&ctx->sha1_unsafe);
     +}
     +
    -+static void git_hash_sha1_clone_fast(git_hash_ctx *dst, const git_hash_ctx *src)
    ++static void git_hash_sha1_clone_unsafe(git_hash_ctx *dst, const git_hash_ctx *src)
     +{
    -+	git_SHA1_Clone_fast(&dst->sha1_fast, &src->sha1_fast);
    ++	git_SHA1_Clone_unsafe(&dst->sha1_unsafe, &src->sha1_unsafe);
     +}
     +
    -+static void git_hash_sha1_update_fast(git_hash_ctx *ctx, const void *data,
    ++static void git_hash_sha1_update_unsafe(git_hash_ctx *ctx, const void *data,
     +				      size_t len)
     +{
    -+	git_SHA1_Update_fast(&ctx->sha1_fast, data, len);
    ++	git_SHA1_Update_unsafe(&ctx->sha1_unsafe, data, len);
     +}
     +
    -+static void git_hash_sha1_final_fast(unsigned char *hash, git_hash_ctx *ctx)
    ++static void git_hash_sha1_final_unsafe(unsigned char *hash, git_hash_ctx *ctx)
     +{
    -+	git_SHA1_Final_fast(hash, &ctx->sha1_fast);
    ++	git_SHA1_Final_unsafe(hash, &ctx->sha1_unsafe);
     +}
     +
    -+static void git_hash_sha1_final_oid_fast(struct object_id *oid, git_hash_ctx *ctx)
    ++static void git_hash_sha1_final_oid_unsafe(struct object_id *oid, git_hash_ctx *ctx)
     +{
    -+	git_SHA1_Final_fast(oid->hash, &ctx->sha1_fast);
    ++	git_SHA1_Final_unsafe(oid->hash, &ctx->sha1_unsafe);
     +	memset(oid->hash + GIT_SHA1_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA1_RAWSZ);
     +	oid->algo = GIT_HASH_SHA1;
     +}
    @@ object-file.c: const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
      		.update_fn = git_hash_unknown_update,
      		.final_fn = git_hash_unknown_final,
      		.final_oid_fn = git_hash_unknown_final_oid,
    -+		.fast_init_fn = git_hash_unknown_init,
    -+		.fast_clone_fn = git_hash_unknown_clone,
    -+		.fast_update_fn = git_hash_unknown_update,
    -+		.fast_final_fn = git_hash_unknown_final,
    -+		.fast_final_oid_fn = git_hash_unknown_final_oid,
    ++		.unsafe_init_fn = git_hash_unknown_init,
    ++		.unsafe_clone_fn = git_hash_unknown_clone,
    ++		.unsafe_update_fn = git_hash_unknown_update,
    ++		.unsafe_final_fn = git_hash_unknown_final,
    ++		.unsafe_final_oid_fn = git_hash_unknown_final_oid,
      		.empty_tree = NULL,
      		.empty_blob = NULL,
      		.null_oid = NULL,
    @@ object-file.c: const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
      		.update_fn = git_hash_sha1_update,
      		.final_fn = git_hash_sha1_final,
      		.final_oid_fn = git_hash_sha1_final_oid,
    -+		.fast_init_fn = git_hash_sha1_init_fast,
    -+		.fast_clone_fn = git_hash_sha1_clone_fast,
    -+		.fast_update_fn = git_hash_sha1_update_fast,
    -+		.fast_final_fn = git_hash_sha1_final_fast,
    -+		.fast_final_oid_fn = git_hash_sha1_final_oid_fast,
    ++		.unsafe_init_fn = git_hash_sha1_init_unsafe,
    ++		.unsafe_clone_fn = git_hash_sha1_clone_unsafe,
    ++		.unsafe_update_fn = git_hash_sha1_update_unsafe,
    ++		.unsafe_final_fn = git_hash_sha1_final_unsafe,
    ++		.unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe,
      		.empty_tree = &empty_tree_oid,
      		.empty_blob = &empty_blob_oid,
      		.null_oid = &null_oid_sha1,
    @@ object-file.c: const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
      		.update_fn = git_hash_sha256_update,
      		.final_fn = git_hash_sha256_final,
      		.final_oid_fn = git_hash_sha256_final_oid,
    -+		.fast_init_fn = git_hash_sha256_init,
    -+		.fast_clone_fn = git_hash_sha256_clone,
    -+		.fast_update_fn = git_hash_sha256_update,
    -+		.fast_final_fn = git_hash_sha256_final,
    -+		.fast_final_oid_fn = git_hash_sha256_final_oid,
    ++		.unsafe_init_fn = git_hash_sha256_init,
    ++		.unsafe_clone_fn = git_hash_sha256_clone,
    ++		.unsafe_update_fn = git_hash_sha256_update,
    ++		.unsafe_final_fn = git_hash_sha256_final,
    ++		.unsafe_final_oid_fn = git_hash_sha256_final_oid,
      		.empty_tree = &empty_tree_oid_sha256,
      		.empty_blob = &empty_blob_oid_sha256,
      		.null_oid = &null_oid_sha256,
 8:  137ec30d68a !  7:  af8fd9aa4ed Makefile: allow specifying a SHA-1 for non-cryptographic uses
    @@ Metadata
      ## Commit message ##
         Makefile: allow specifying a SHA-1 for non-cryptographic uses
     
    -    Introduce _FAST variants of the OPENSSL_SHA1, BLK_SHA1, and
    +    Introduce _UNSAFE variants of the OPENSSL_SHA1, BLK_SHA1, and
         APPLE_COMMON_CRYPTO_SHA1 compile-time knobs which indicate which SHA-1
         implementation is to be used for non-cryptographic uses.
     
    @@ Makefile: include shared.mak
      # Define APPLE_COMMON_CRYPTO_SHA1 to use Apple's CommonCrypto for
      # SHA-1.
      #
    -+# Define the same Makefile knobs as above, but suffixed with _FAST to
    -+# use the corresponding implementations for "fast" SHA-1 hashing for
    ++# Define the same Makefile knobs as above, but suffixed with _UNSAFE to
    ++# use the corresponding implementations for unsafe SHA-1 hashing for
     +# non-cryptographic purposes.
     +#
      # If don't enable any of the *_SHA1 settings in this section, Git will
    @@ Makefile: endif
      endif
      endif
      
    -+ifdef OPENSSL_SHA1_FAST
    ++ifdef OPENSSL_SHA1_UNSAFE
     +ifndef OPENSSL_SHA1
     +	EXTLIBS += $(LIB_4_CRYPTO)
    -+	BASIC_CFLAGS += -DSHA1_OPENSSL_FAST
    ++	BASIC_CFLAGS += -DSHA1_OPENSSL_UNSAFE
     +endif
     +else
    -+ifdef BLK_SHA1_FAST
    ++ifdef BLK_SHA1_UNSAFE
     +ifndef BLK_SHA1
     +	LIB_OBJS += block-sha1/sha1.o
    -+	BASIC_CFLAGS += -DSHA1_BLK_FAST
    ++	BASIC_CFLAGS += -DSHA1_BLK_UNSAFE
     +endif
     +else
    -+ifdef APPLE_COMMON_CRYPTO_SHA1_FAST
    ++ifdef APPLE_COMMON_CRYPTO_SHA1_UNSAFE
     +ifndef APPLE_COMMON_CRYPTO_SHA1
     +	COMPAT_CFLAGS += -DCOMMON_DIGEST_FOR_OPENSSL
    -+	BASIC_CFLAGS += -DSHA1_APPLE_FAST
    ++	BASIC_CFLAGS += -DSHA1_APPLE_UNSAFE
     +endif
     +endif
     +endif
    @@ hash.h
      #include "block-sha1/sha1.h"
      #endif
      
    -+#if defined(SHA1_APPLE_FAST)
    ++#if defined(SHA1_APPLE_UNSAFE)
     +#  include <CommonCrypto/CommonDigest.h>
    -+#  define platform_SHA_CTX_fast CC_SHA1_CTX
    -+#  define platform_SHA1_Init_fast CC_SHA1_Init
    -+#  define platform_SHA1_Update_fast CC_SHA1_Update
    -+#  define platform_SHA1_Final_fast CC_SHA1_Final
    -+#elif defined(SHA1_OPENSSL_FAST)
    ++#  define platform_SHA_CTX_unsafe CC_SHA1_CTX
    ++#  define platform_SHA1_Init_unsafe CC_SHA1_Init
    ++#  define platform_SHA1_Update_unsafe CC_SHA1_Update
    ++#  define platform_SHA1_Final_unsafe CC_SHA1_Final
    ++#elif defined(SHA1_OPENSSL_UNSAFE)
     +#  include <openssl/sha.h>
     +#  if defined(OPENSSL_API_LEVEL) && OPENSSL_API_LEVEL >= 3
    -+#    define SHA1_NEEDS_CLONE_HELPER_FAST
    ++#    define SHA1_NEEDS_CLONE_HELPER_UNSAFE
     +#    include "sha1/openssl.h"
    -+#    define platform_SHA_CTX_fast openssl_SHA1_CTX
    -+#    define platform_SHA1_Init_fast openssl_SHA1_Init
    -+#    define platform_SHA1_Clone_fast openssl_SHA1_Clone
    -+#    define platform_SHA1_Update_fast openssl_SHA1_Update
    -+#    define platform_SHA1_Final_fast openssl_SHA1_Final
    ++#    define platform_SHA_CTX_unsafe openssl_SHA1_CTX
    ++#    define platform_SHA1_Init_unsafe openssl_SHA1_Init
    ++#    define platform_SHA1_Clone_unsafe openssl_SHA1_Clone
    ++#    define platform_SHA1_Update_unsafe openssl_SHA1_Update
    ++#    define platform_SHA1_Final_unsafe openssl_SHA1_Final
     +#  else
    -+#    define platform_SHA_CTX_fast SHA_CTX
    -+#    define platform_SHA1_Init_fast SHA1_Init
    -+#    define platform_SHA1_Update_fast SHA1_Update
    -+#    define platform_SHA1_Final_fast SHA1_Final
    ++#    define platform_SHA_CTX_unsafe SHA_CTX
    ++#    define platform_SHA1_Init_unsafe SHA1_Init
    ++#    define platform_SHA1_Update_unsafe SHA1_Update
    ++#    define platform_SHA1_Final_unsafe SHA1_Final
     +#  endif
    -+#elif defined(SHA1_BLK_FAST)
    ++#elif defined(SHA1_BLK_UNSAFE)
     +#  include "block-sha1/sha1.h"
    -+#  define platform_SHA_CTX_fast blk_SHA_CTX
    -+#  define platform_SHA1_Init_fast blk_SHA1_Init
    -+#  define platform_SHA1_Update_fast blk_SHA1_Update
    -+#  define platform_SHA1_Final_fast blk_SHA1_Final
    ++#  define platform_SHA_CTX_unsafe blk_SHA_CTX
    ++#  define platform_SHA1_Init_unsafe blk_SHA1_Init
    ++#  define platform_SHA1_Update_unsafe blk_SHA1_Update
    ++#  define platform_SHA1_Final_unsafe blk_SHA1_Final
     +#endif
     +
      #if defined(SHA256_NETTLE)
 9:  4018261366f !  8:  4b83dd05e9f csum-file.c: use fast SHA-1 implementation when available
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    csum-file.c: use fast SHA-1 implementation when available
    +    csum-file.c: use unsafe SHA-1 implementation when available
     
    -    Update hashwrite() and friends to use the fast_-variants of hashing
    -    functions, calling for e.g., "the_hash_algo->fast_update_fn()" instead
    +    Update hashwrite() and friends to use the unsafe_-variants of hashing
    +    functions, calling for e.g., "the_hash_algo->unsafe_update_fn()" instead
         of "the_hash_algo->update_fn()".
     
         These callers only use the_hash_algo to produce a checksum, which we
         depend on for data integrity, but not for cryptographic purposes, so
    -    these callers are safe to use the fast (and potentially non-collision
    +    these callers are safe to use the unsafe (and potentially non-collision
         detecting) SHA-1 implementation.
     
         To time this, I took a freshly packed copy of linux.git, and ran the
    -    following with and without the OPENSSL_SHA1_FAST=1 build-knob. Both
    +    following with and without the OPENSSL_SHA1_UNSAFE=1 build-knob. Both
         versions were compiled with -O3:
     
             $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
             $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
                 --revs --stdout --all-progress --use-bitmap-index <in >/dev/null
     
    -    Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
    +    Without OPENSSL_SHA1_UNSAFE=1 (that is, using the collision-detecting
         SHA-1 implementation for both cryptographic and non-cryptographic
         purposes), we spend a significant amount of our instruction count in
         hashwrite():
    @@ Commit message
         , and the resulting "clone" takes 19.219 seconds of wall clock time,
         18.94 seconds of user time and 0.28 seconds of system time.
     
    -    Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
    -    hashwrite():
    +    Compiling with OPENSSL_SHA1_UNSAFE=1, we spend ~60% fewer instructions
    +    in hashwrite():
     
             $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
              59,164,001,176 (58.79%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
     
    -    , and generate the resulting "clone" much faster, in only 11.597 seconds
    +    , and generate the resulting "clone" much unsafeer, in only 11.597 seconds
         of wall time, 11.37 seconds of user time, and 0.23 seconds of system
         time, for a ~40% speed-up.
     
    @@ csum-file.c: void hashflush(struct hashfile *f)
      	if (offset) {
      		if (!f->skip_hash)
     -			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
    -+			the_hash_algo->fast_update_fn(&f->ctx, f->buffer, offset);
    ++			the_hash_algo->unsafe_update_fn(&f->ctx, f->buffer, offset);
      		flush(f, f->buffer, offset);
      		f->offset = 0;
      	}
    @@ csum-file.c: int finalize_hashfile(struct hashfile *f, unsigned char *result,
      		hashclr(f->buffer, the_repository->hash_algo);
      	else
     -		the_hash_algo->final_fn(f->buffer, &f->ctx);
    -+		the_hash_algo->fast_final_fn(f->buffer, &f->ctx);
    ++		the_hash_algo->unsafe_final_fn(f->buffer, &f->ctx);
      
      	if (result)
      		hashcpy(result, f->buffer, the_repository->hash_algo);
    @@ csum-file.c: void hashwrite(struct hashfile *f, const void *buf, unsigned int co
      			 */
      			if (!f->skip_hash)
     -				the_hash_algo->update_fn(&f->ctx, buf, nr);
    -+				the_hash_algo->fast_update_fn(&f->ctx, buf, nr);
    ++				the_hash_algo->unsafe_update_fn(&f->ctx, buf, nr);
      			flush(f, buf, nr);
      		} else {
      			/*
    @@ csum-file.c: static struct hashfile *hashfd_internal(int fd, const char *name,
      	f->do_crc = 0;
      	f->skip_hash = 0;
     -	the_hash_algo->init_fn(&f->ctx);
    -+	the_hash_algo->fast_init_fn(&f->ctx);
    ++	the_hash_algo->unsafe_init_fn(&f->ctx);
      
      	f->buffer_len = buffer_len;
      	f->buffer = xmalloc(buffer_len);
    @@ csum-file.c: void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkp
      	hashflush(f);
      	checkpoint->offset = f->total;
     -	the_hash_algo->clone_fn(&checkpoint->ctx, &f->ctx);
    -+	the_hash_algo->fast_clone_fn(&checkpoint->ctx, &f->ctx);
    ++	the_hash_algo->unsafe_clone_fn(&checkpoint->ctx, &f->ctx);
      }
      
      int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
    @@ csum-file.c: int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoin
      		return -1;
      	f->total = offset;
     -	the_hash_algo->clone_fn(&f->ctx, &checkpoint->ctx);
    -+	the_hash_algo->fast_clone_fn(&f->ctx, &checkpoint->ctx);
    ++	the_hash_algo->unsafe_clone_fn(&f->ctx, &checkpoint->ctx);
      	f->offset = 0; /* hashflush() was called in checkpoint */
      	return 0;
      }
    @@ csum-file.c: int hashfile_checksum_valid(const unsigned char *data, size_t total
     -	the_hash_algo->init_fn(&ctx);
     -	the_hash_algo->update_fn(&ctx, data, data_len);
     -	the_hash_algo->final_fn(got, &ctx);
    -+	the_hash_algo->fast_init_fn(&ctx);
    -+	the_hash_algo->fast_update_fn(&ctx, data, data_len);
    -+	the_hash_algo->fast_final_fn(got, &ctx);
    ++	the_hash_algo->unsafe_init_fn(&ctx);
    ++	the_hash_algo->unsafe_update_fn(&ctx, data, data_len);
    ++	the_hash_algo->unsafe_final_fn(got, &ctx);
      
      	return hasheq(got, data + data_len, the_repository->hash_algo);
      }

base-commit: 6258f68c3c1092c901337895c864073dcdea9213
-- 
2.46.2.636.g4b83dd05e9f

  parent reply	other threads:[~2024-09-24 17:32 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-01 16:03 [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-01 16:03 ` [PATCH 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:34     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 17:27     ` Junio C Hamano
2024-09-03 19:52       ` Taylor Blau
2024-09-03 20:47         ` Junio C Hamano
2024-09-03 21:24           ` Taylor Blau
2024-09-04  7:05           ` Patrick Steinhardt
2024-09-04 14:53             ` Junio C Hamano
2024-09-03 19:40     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:43     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03  1:22     ` brian m. carlson
2024-09-03 19:50     ` Taylor Blau
2024-09-02  3:41 ` [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-03 19:48   ` Taylor Blau
2024-09-03 20:44     ` Junio C Hamano
2024-09-02 14:08 ` brian m. carlson
2024-09-03 19:47   ` Taylor Blau
2024-09-03 22:41     ` Junio C Hamano
2024-09-04 14:01     ` brian m. carlson
2024-09-05 10:37     ` Jeff King
2024-09-05 15:41       ` Junio C Hamano
2024-09-05 16:23         ` Taylor Blau
2024-09-05 16:51           ` Junio C Hamano
2024-09-05 17:04             ` Taylor Blau
2024-09-05 17:51               ` Taylor Blau
2024-09-05 20:21                 ` Taylor Blau
2024-09-05 20:27               ` Jeff King
2024-09-05 21:27                 ` Junio C Hamano
2024-09-05 15:11 ` [PATCH v2 " Taylor Blau
2024-09-05 15:12   ` [PATCH v2 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-05 15:12   ` [PATCH v2 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-05 15:12   ` [PATCH v2 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-05 15:12   ` [PATCH v2 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 19:46 ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 1/9] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-06 19:46   ` [PATCH v3 2/9] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-06 19:46   ` [PATCH v3 3/9] finalize_object_file(): implement collision check Taylor Blau
2024-09-06 21:44     ` Junio C Hamano
2024-09-06 21:51       ` Chris Torek
2024-09-10  6:53       ` Jeff King
2024-09-10 15:14         ` Junio C Hamano
2024-09-16 10:45     ` Patrick Steinhardt
2024-09-16 15:54       ` Taylor Blau
2024-09-16 16:03         ` Taylor Blau
2024-09-17 20:40       ` Junio C Hamano
2024-09-06 19:46   ` [PATCH v3 4/9] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-06 19:46   ` [PATCH v3 5/9] i5500-git-daemon.sh: use compile-able version of Git without OpenSSL Taylor Blau
2024-09-11  6:10     ` Jeff King
2024-09-11  6:12       ` Jeff King
2024-09-12 20:28         ` Junio C Hamano
2024-09-11 15:28       ` Junio C Hamano
2024-09-11 21:23         ` Jeff King
2024-09-06 19:46   ` [PATCH v3 6/9] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-06 19:46   ` [PATCH v3 7/9] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-06 19:46   ` [PATCH v3 8/9] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 9/9] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 21:50   ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-24 17:32 ` Taylor Blau [this message]
2024-09-24 17:32   ` [PATCH v4 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-25 17:02     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-24 17:32   ` [PATCH v4 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-24 20:37     ` Jeff King
2024-09-24 21:59       ` Taylor Blau
2024-09-24 22:20         ` Jeff King
2024-09-25 18:06           ` Taylor Blau
2024-09-24 21:32     ` Junio C Hamano
2024-09-24 22:02       ` Taylor Blau
2024-09-24 17:32   ` [PATCH v4 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-24 21:34     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-24 17:32   ` [PATCH v4 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-24 17:32   ` [PATCH v4 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-24 17:32   ` [PATCH v4 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-24 20:52   ` [PATCH v4 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Jeff King
2024-09-25 16:58   ` Elijah Newren
2024-09-25 17:11     ` Junio C Hamano
2024-09-25 17:22       ` Taylor Blau
2024-09-25 17:22     ` Taylor Blau
2024-09-26 15:22 ` [PATCH v5 " Taylor Blau
2024-09-26 15:22   ` [PATCH v5 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-26 15:22   ` [PATCH v5 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-26 15:22   ` [PATCH v5 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-26 15:22   ` [PATCH v5 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-26 15:22   ` [PATCH v5 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-26 15:22   ` [PATCH v5 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-26 15:22   ` [PATCH v5 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-26 15:22   ` [PATCH v5 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-26 22:47   ` [PATCH v5 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Elijah Newren
2024-09-27  0:44     ` Junio C Hamano
2024-09-27  3:57   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1727199118.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.