git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>,  Patrick Steinhardt <ps@pks.im>,
	 Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 1/9] docs: update pack index v3 format
Date: Fri, 19 Sep 2025 15:08:03 -0700	[thread overview]
Message-ID: <xmqq7bxu14fw.fsf@gitster.g> (raw)
In-Reply-To: <20250919010911.649831-2-sandals@crustytoothpaste.net> (brian m. carlson's message of "Fri, 19 Sep 2025 01:09:03 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Our current pack index v3 format uses 4-byte integers to find the
> trailer of the file.  This effectively means that the file cannot be
> much larger than 2^32.  While this might at first seem to be okay, we
> expect that each object will have at least 64 bytes worth of data, which
> means that no more than about 67 million objects can be stored.
>
> Again, this might seem fine, but unfortunately, we know of many users
> who attempt to create repos with extremely large numbers of commits to
> get a "high score," and we've already seen repositories with at least 55
> million commits.  In the interests of gracefully handling repositories
> even for these well-intentioned but ultimately misguided users, let's
> change these lengths to 8 bytes.

Very sensible.

I do also agree that 32-byte is the natural size for the trailing
hash, but I found that the two paragraphs below was far more than
necessary.  As they argue, we use a truncated hash anywhere in our
file formats, so I would have understood if the explanation were

    "20" in "A copy of the 20-byte SHA-256 checksum" is an obvious
    typo, as SHA-256 is longer than that.  Fix it to "32".

instead of these two paragraphs.

Or did we mean to use a truncated hash back when this transition
design was proposed originally?

> For the checksums at the end of the file, we're producing 32-byte
> SHA-256 checksums because that's what we already do with pack index v2
> and SHA-256.  Truncating SHA-256 doesn't pose any actual security
> problems other than those related to the reduced size, but our pack
> checksum must already be 32 bytes (since SHA-256 packs have 32-byte
> checksums) and it simplifies the code to use the existing hashfile logic
> for these cases for the index checksum as well.
>
> In addition, even though we may not need cryptographic security for the
> index checksum, we'd like to avoid arguments from auditors and such for
> organizations that may have compliance or security requirements.  Using
> the simple, boring choice of the full SHA-256 hash avoids all possible
> discussion related to hash truncation and removes impediments for these
> organizations.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/technical/hash-function-transition.adoc | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
> index f047fd80ca..f2df1d618d 100644
> --- a/Documentation/technical/hash-function-transition.adoc
> +++ b/Documentation/technical/hash-function-transition.adoc
> @@ -227,9 +227,9 @@ network byte order):
>      ** 4-byte length in bytes of shortened object names. This is the
>        shortest possible length needed to make names in the shortened
>        object name table unambiguous.
> -    ** 4-byte integer, recording where tables relating to this format
> +    ** 8-byte integer, recording where tables relating to this format
>        are stored in this index file, as an offset from the beginning.
> -  * 4-byte offset to the trailer from the beginning of this file.
> +  * 8-byte offset to the trailer from the beginning of this file.
>    * Zero or more additional key/value pairs (4-byte key, 4-byte
>      value). Only one key is supported: 'PSRC'. See the "Loose objects
>      and unreachable objects" section for supported values and how this
> @@ -276,10 +276,10 @@ network byte order):
>    up to and not including the table of CRC32 values.
>  - Zero or more NUL bytes.
>  - The trailer consists of the following:
> -  * A copy of the 20-byte SHA-256 checksum at the end of the
> +  * A copy of the 32-byte SHA-256 checksum at the end of the
>      corresponding packfile.
>  
> -  * 20-byte SHA-256 checksum of all of the above.
> +  * 32-byte SHA-256 checksum of all of the above.
>  
>  Loose object index
>  ~~~~~~~~~~~~~~~~~~

  reply	other threads:[~2025-09-19 22:08 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-19  1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-09-19  1:09 ` [PATCH 1/9] docs: update pack index v3 format brian m. carlson
2025-09-19 22:08   ` Junio C Hamano [this message]
2025-09-20 15:23     ` brian m. carlson
2025-09-20 17:01       ` Junio C Hamano
2025-09-24  7:55   ` Patrick Steinhardt
2025-09-25 21:39     ` brian m. carlson
2025-09-19  1:09 ` [PATCH 2/9] docs: update offset order for pack index v3 brian m. carlson
2025-09-19  1:09 ` [PATCH 3/9] docs: reflect actual double signature for tags brian m. carlson
2025-09-19 22:34   ` Junio C Hamano
2025-09-20 15:29     ` brian m. carlson
2025-09-20 17:04       ` Junio C Hamano
2025-09-24  7:55       ` Patrick Steinhardt
2025-09-25 21:46         ` brian m. carlson
2025-09-19  1:09 ` [PATCH 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
2025-09-19 23:04   ` Junio C Hamano
2025-09-19  1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
2025-09-19 19:10   ` Junio C Hamano
2025-09-19 19:13     ` Junio C Hamano
2025-09-19 19:15       ` brian m. carlson
2025-09-19 20:18       ` Junio C Hamano
2025-09-24  7:55       ` Patrick Steinhardt
2025-09-25 21:40         ` brian m. carlson
2025-09-19 23:16   ` Junio C Hamano
2025-09-24  7:55   ` Patrick Steinhardt
2025-09-30 16:39     ` brian m. carlson
2025-09-19  1:09 ` [PATCH 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-09-19 23:24   ` Junio C Hamano
2025-09-24  7:55   ` Patrick Steinhardt
2025-09-25 21:48     ` brian m. carlson
2025-09-19  1:09 ` [PATCH 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
2025-09-19 23:31   ` Junio C Hamano
2025-09-22 21:38     ` brian m. carlson
2025-09-19  1:09 ` [PATCH 8/9] Allow specifying compatibility hash brian m. carlson
2025-09-24  7:56   ` Patrick Steinhardt
2025-09-30 16:44     ` brian m. carlson
2025-09-19  1:09 ` [PATCH 9/9] t: add a prerequisite for a " brian m. carlson
2025-09-24  7:56   ` Patrick Steinhardt
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-02 22:38   ` [PATCH v2 1/9] docs: update pack index v3 format brian m. carlson
2025-10-03 17:00     ` Junio C Hamano
2025-10-02 22:38   ` [PATCH v2 2/9] docs: update offset order for pack index v3 brian m. carlson
2025-10-02 22:38   ` [PATCH v2 3/9] docs: reflect actual double signature for tags brian m. carlson
2025-10-02 22:38   ` [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
2025-10-03 17:07     ` Junio C Hamano
2025-10-03 21:06       ` brian m. carlson
2025-10-02 22:38   ` [PATCH v2 5/9] docs: add documentation for loose objects brian m. carlson
2025-10-03 17:05     ` Junio C Hamano
2025-10-02 22:38   ` [PATCH v2 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-10-02 22:38   ` [PATCH v2 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
2025-10-02 22:38   ` [PATCH v2 8/9] t: allow specifying compatibility hash brian m. carlson
2025-10-03 17:14     ` Junio C Hamano
2025-10-03 20:45       ` brian m. carlson
2025-10-02 22:38   ` [PATCH v2 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-09 21:56   ` [PATCH v3 1/9] docs: update pack index v3 format brian m. carlson
2025-10-09 21:56   ` [PATCH v3 2/9] docs: update offset order for pack index v3 brian m. carlson
2025-10-09 21:56   ` [PATCH v3 3/9] docs: reflect actual double signature for tags brian m. carlson
2025-10-09 21:56   ` [PATCH v3 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
2025-10-09 21:56   ` [PATCH v3 5/9] docs: add documentation for loose objects brian m. carlson
2025-10-09 21:56   ` [PATCH v3 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-10-09 21:56   ` [PATCH v3 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
2025-10-09 21:56   ` [PATCH v3 8/9] t: allow specifying compatibility hash brian m. carlson
2025-10-09 21:56   ` [PATCH v3 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
2025-10-13 15:24   ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 Junio C Hamano
2025-10-13 16:34     ` brian m. carlson
2025-10-14  5:53       ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7bxu14fw.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).