git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v5 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses
Date: Thu, 26 Sep 2024 11:22:27 -0400	[thread overview]
Message-ID: <cover.1727364141.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1725206584.git.me@ttaylorr.com>

Here is a minor reroll of mine and Peff's series to add a build-time
knob to allow selecting an alternative SHA-1 implementation for
non-cryptographic hashing within Git, starting with the `hashwrite()`
family of functions.

This version has changes which are limited to the commit messages only,
and amount to:

  - Updated rationale for skipping the collision check from within
    finalize_object_file() when handling loose objects.

  - Updated commit message with some over-eager s/fast/unsafe/
    conversions in the final patch.

I think most of the review dust has settled up to this point, so I'm
imagining that this is the final version of this series for now, or at
least very close to it. But if something new comes up, please let me
know!

Thanks in advance for your review!

Taylor Blau (8):
  finalize_object_file(): check for name collision before renaming
  finalize_object_file(): refactor unlink_or_warn() placement
  finalize_object_file(): implement collision check
  pack-objects: use finalize_object_file() to rename pack/idx/etc
  sha1: do not redefine `platform_SHA_CTX` and friends
  hash.h: scaffolding for _unsafe hashing variants
  Makefile: allow specifying a SHA-1 for non-cryptographic uses
  csum-file.c: use unsafe SHA-1 implementation when available

 Makefile                              |  25 ++++++
 block-sha1/sha1.h                     |   2 +
 csum-file.c                           |  18 ++--
 hash.h                                |  72 +++++++++++++++
 object-file.c                         | 124 ++++++++++++++++++++++++--
 object-file.h                         |   6 ++
 pack-write.c                          |   7 +-
 sha1/openssl.h                        |   2 +
 sha1dc_git.h                          |   3 +
 t/t5303-pack-corruption-resilience.sh |   7 +-
 tmp-objdir.c                          |  26 ++++--
 11 files changed, 266 insertions(+), 26 deletions(-)

Range-diff against v4:
-:  ---------- > 1:  6f1ee91fff finalize_object_file(): check for name collision before renaming
-:  ---------- > 2:  133047ca8c finalize_object_file(): refactor unlink_or_warn() placement
1:  ed9eeef851 ! 3:  41d38352a4 finalize_object_file(): implement collision check
    @@ Commit message
             object name, so checking for collisions at the content level doesn't
             add anything.
     
    -        This is why we do not bother to check the inflated object contents
    -        for collisions either, since either (a) the object contents have the
    -        fingerprint of a SHA-1 collision, in which case the collision
    -        detecting SHA-1 implementation used to hash the contents to give us
    -        a path would have already rejected it, or (b) the contents are part
    -        of a colliding pair which does not bear the same fingerprints of
    -        known collision attacks, in which case we would not have caught it
    -        anyway.
    +        Adding a content-level collision check would have to happen at a
    +        higher level than in finalize_object_file(), since (avoiding race
    +        conditions) writing an object loose which already exists in the
    +        repository will prevent us from even reaching finalize_object_file()
    +        via the object freshening code.
    +
    +        There is a collision check in index-pack via its `check_collision()`
    +        function, but there isn't an analogous function in unpack-objects,
    +        which just feeds the result to write_object_file().
     
             So skipping the collision check here does not change for better or
             worse the hardness of loose object writes.
    @@ Commit message
     
         Co-authored-by: Jeff King <peff@peff.net>
         Signed-off-by: Jeff King <peff@peff.net>
    +    Helped-by: Elijah Newren <newren@gmail.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## object-file.c ##
2:  3cc7f7b1f6 = 4:  611475d83e pack-objects: use finalize_object_file() to rename pack/idx/etc
3:  8f8ac0f5b0 = 5:  9913a5d971 sha1: do not redefine `platform_SHA_CTX` and friends
4:  d300e9c688 = 6:  65de6d724d hash.h: scaffolding for _unsafe hashing variants
5:  af8fd9aa4e = 7:  3884cd0e3a Makefile: allow specifying a SHA-1 for non-cryptographic uses
6:  4b83dd05e9 ! 8:  62abddf73d csum-file.c: use unsafe SHA-1 implementation when available
    @@ Commit message
     
         These callers only use the_hash_algo to produce a checksum, which we
         depend on for data integrity, but not for cryptographic purposes, so
    -    these callers are safe to use the unsafe (and potentially non-collision
    -    detecting) SHA-1 implementation.
    +    these callers are safe to use the unsafe (non-collision detecting) SHA-1
    +    implementation.
     
         To time this, I took a freshly packed copy of linux.git, and ran the
         following with and without the OPENSSL_SHA1_UNSAFE=1 build-knob. Both
    @@ Commit message
             $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
              59,164,001,176 (58.79%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
     
    -    , and generate the resulting "clone" much unsafeer, in only 11.597 seconds
    +    , and generate the resulting "clone" much faster, in only 11.597 seconds
         of wall time, 11.37 seconds of user time, and 0.23 seconds of system
         time, for a ~40% speed-up.
     

base-commit: 6258f68c3c1092c901337895c864073dcdea9213
-- 
2.46.1.507.gffd0c9a15b2.dirty

  parent reply	other threads:[~2024-09-26 15:22 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-01 16:03 [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-01 16:03 ` [PATCH 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:34     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 17:27     ` Junio C Hamano
2024-09-03 19:52       ` Taylor Blau
2024-09-03 20:47         ` Junio C Hamano
2024-09-03 21:24           ` Taylor Blau
2024-09-04  7:05           ` Patrick Steinhardt
2024-09-04 14:53             ` Junio C Hamano
2024-09-03 19:40     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03 19:43     ` Taylor Blau
2024-09-01 16:03 ` [PATCH 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-02 13:41   ` Patrick Steinhardt
2024-09-03  1:22     ` brian m. carlson
2024-09-03 19:50     ` Taylor Blau
2024-09-02  3:41 ` [PATCH 0/4] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-03 19:48   ` Taylor Blau
2024-09-03 20:44     ` Junio C Hamano
2024-09-02 14:08 ` brian m. carlson
2024-09-03 19:47   ` Taylor Blau
2024-09-03 22:41     ` Junio C Hamano
2024-09-04 14:01     ` brian m. carlson
2024-09-05 10:37     ` Jeff King
2024-09-05 15:41       ` Junio C Hamano
2024-09-05 16:23         ` Taylor Blau
2024-09-05 16:51           ` Junio C Hamano
2024-09-05 17:04             ` Taylor Blau
2024-09-05 17:51               ` Taylor Blau
2024-09-05 20:21                 ` Taylor Blau
2024-09-05 20:27               ` Jeff King
2024-09-05 21:27                 ` Junio C Hamano
2024-09-05 15:11 ` [PATCH v2 " Taylor Blau
2024-09-05 15:12   ` [PATCH v2 1/4] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-05 15:12   ` [PATCH v2 2/4] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-05 15:12   ` [PATCH v2 3/4] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-05 15:12   ` [PATCH v2 4/4] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 19:46 ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 1/9] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-06 19:46   ` [PATCH v3 2/9] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-06 19:46   ` [PATCH v3 3/9] finalize_object_file(): implement collision check Taylor Blau
2024-09-06 21:44     ` Junio C Hamano
2024-09-06 21:51       ` Chris Torek
2024-09-10  6:53       ` Jeff King
2024-09-10 15:14         ` Junio C Hamano
2024-09-16 10:45     ` Patrick Steinhardt
2024-09-16 15:54       ` Taylor Blau
2024-09-16 16:03         ` Taylor Blau
2024-09-17 20:40       ` Junio C Hamano
2024-09-06 19:46   ` [PATCH v3 4/9] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-06 19:46   ` [PATCH v3 5/9] i5500-git-daemon.sh: use compile-able version of Git without OpenSSL Taylor Blau
2024-09-11  6:10     ` Jeff King
2024-09-11  6:12       ` Jeff King
2024-09-12 20:28         ` Junio C Hamano
2024-09-11 15:28       ` Junio C Hamano
2024-09-11 21:23         ` Jeff King
2024-09-06 19:46   ` [PATCH v3 6/9] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-06 19:46   ` [PATCH v3 7/9] hash.h: scaffolding for _fast hashing variants Taylor Blau
2024-09-06 19:46   ` [PATCH v3 8/9] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-06 19:46   ` [PATCH v3 9/9] csum-file.c: use fast SHA-1 implementation when available Taylor Blau
2024-09-06 21:50   ` [PATCH v3 0/9] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Junio C Hamano
2024-09-24 17:32 ` [PATCH v4 0/8] " Taylor Blau
2024-09-24 17:32   ` [PATCH v4 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-25 17:02     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-24 17:32   ` [PATCH v4 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-24 20:37     ` Jeff King
2024-09-24 21:59       ` Taylor Blau
2024-09-24 22:20         ` Jeff King
2024-09-25 18:06           ` Taylor Blau
2024-09-24 21:32     ` Junio C Hamano
2024-09-24 22:02       ` Taylor Blau
2024-09-24 17:32   ` [PATCH v4 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-24 21:34     ` Junio C Hamano
2024-09-24 17:32   ` [PATCH v4 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-24 17:32   ` [PATCH v4 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-24 17:32   ` [PATCH v4 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-24 17:32   ` [PATCH v4 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-24 20:52   ` [PATCH v4 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Jeff King
2024-09-25 16:58   ` Elijah Newren
2024-09-25 17:11     ` Junio C Hamano
2024-09-25 17:22       ` Taylor Blau
2024-09-25 17:22     ` Taylor Blau
2024-09-26 15:22 ` Taylor Blau [this message]
2024-09-26 15:22   ` [PATCH v5 1/8] finalize_object_file(): check for name collision before renaming Taylor Blau
2024-09-26 15:22   ` [PATCH v5 2/8] finalize_object_file(): refactor unlink_or_warn() placement Taylor Blau
2024-09-26 15:22   ` [PATCH v5 3/8] finalize_object_file(): implement collision check Taylor Blau
2024-09-26 15:22   ` [PATCH v5 4/8] pack-objects: use finalize_object_file() to rename pack/idx/etc Taylor Blau
2024-09-26 15:22   ` [PATCH v5 5/8] sha1: do not redefine `platform_SHA_CTX` and friends Taylor Blau
2024-09-26 15:22   ` [PATCH v5 6/8] hash.h: scaffolding for _unsafe hashing variants Taylor Blau
2024-09-26 15:22   ` [PATCH v5 7/8] Makefile: allow specifying a SHA-1 for non-cryptographic uses Taylor Blau
2024-09-26 15:22   ` [PATCH v5 8/8] csum-file.c: use unsafe SHA-1 implementation when available Taylor Blau
2024-09-26 22:47   ` [PATCH v5 0/8] hash.h: support choosing a separate SHA-1 for non-cryptographic uses Elijah Newren
2024-09-27  0:44     ` Junio C Hamano
2024-09-27  3:57   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1727364141.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).