From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>, Patrick Steinhardt <ps@pks.im>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 1/9] docs: update pack index v3 format
Date: Fri, 19 Sep 2025 15:08:03 -0700 [thread overview]
Message-ID: <xmqq7bxu14fw.fsf@gitster.g> (raw)
In-Reply-To: <20250919010911.649831-2-sandals@crustytoothpaste.net> (brian m. carlson's message of "Fri, 19 Sep 2025 01:09:03 +0000")
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Our current pack index v3 format uses 4-byte integers to find the
> trailer of the file. This effectively means that the file cannot be
> much larger than 2^32. While this might at first seem to be okay, we
> expect that each object will have at least 64 bytes worth of data, which
> means that no more than about 67 million objects can be stored.
>
> Again, this might seem fine, but unfortunately, we know of many users
> who attempt to create repos with extremely large numbers of commits to
> get a "high score," and we've already seen repositories with at least 55
> million commits. In the interests of gracefully handling repositories
> even for these well-intentioned but ultimately misguided users, let's
> change these lengths to 8 bytes.
Very sensible.
I do also agree that 32-byte is the natural size for the trailing
hash, but I found that the two paragraphs below was far more than
necessary. As they argue, we use a truncated hash anywhere in our
file formats, so I would have understood if the explanation were
"20" in "A copy of the 20-byte SHA-256 checksum" is an obvious
typo, as SHA-256 is longer than that. Fix it to "32".
instead of these two paragraphs.
Or did we mean to use a truncated hash back when this transition
design was proposed originally?
> For the checksums at the end of the file, we're producing 32-byte
> SHA-256 checksums because that's what we already do with pack index v2
> and SHA-256. Truncating SHA-256 doesn't pose any actual security
> problems other than those related to the reduced size, but our pack
> checksum must already be 32 bytes (since SHA-256 packs have 32-byte
> checksums) and it simplifies the code to use the existing hashfile logic
> for these cases for the index checksum as well.
>
> In addition, even though we may not need cryptographic security for the
> index checksum, we'd like to avoid arguments from auditors and such for
> organizations that may have compliance or security requirements. Using
> the simple, boring choice of the full SHA-256 hash avoids all possible
> discussion related to hash truncation and removes impediments for these
> organizations.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> Documentation/technical/hash-function-transition.adoc | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
> index f047fd80ca..f2df1d618d 100644
> --- a/Documentation/technical/hash-function-transition.adoc
> +++ b/Documentation/technical/hash-function-transition.adoc
> @@ -227,9 +227,9 @@ network byte order):
> ** 4-byte length in bytes of shortened object names. This is the
> shortest possible length needed to make names in the shortened
> object name table unambiguous.
> - ** 4-byte integer, recording where tables relating to this format
> + ** 8-byte integer, recording where tables relating to this format
> are stored in this index file, as an offset from the beginning.
> - * 4-byte offset to the trailer from the beginning of this file.
> + * 8-byte offset to the trailer from the beginning of this file.
> * Zero or more additional key/value pairs (4-byte key, 4-byte
> value). Only one key is supported: 'PSRC'. See the "Loose objects
> and unreachable objects" section for supported values and how this
> @@ -276,10 +276,10 @@ network byte order):
> up to and not including the table of CRC32 values.
> - Zero or more NUL bytes.
> - The trailer consists of the following:
> - * A copy of the 20-byte SHA-256 checksum at the end of the
> + * A copy of the 32-byte SHA-256 checksum at the end of the
> corresponding packfile.
>
> - * 20-byte SHA-256 checksum of all of the above.
> + * 32-byte SHA-256 checksum of all of the above.
>
> Loose object index
> ~~~~~~~~~~~~~~~~~~
next prev parent reply other threads:[~2025-09-19 22:08 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-09-19 1:09 ` [PATCH 1/9] docs: update pack index v3 format brian m. carlson
2025-09-19 22:08 ` Junio C Hamano [this message]
2025-09-20 15:23 ` brian m. carlson
2025-09-20 17:01 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:39 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 2/9] docs: update offset order for pack index v3 brian m. carlson
2025-09-19 1:09 ` [PATCH 3/9] docs: reflect actual double signature for tags brian m. carlson
2025-09-19 22:34 ` Junio C Hamano
2025-09-20 15:29 ` brian m. carlson
2025-09-20 17:04 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:46 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
2025-09-19 23:04 ` Junio C Hamano
2025-09-19 1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
2025-09-19 19:10 ` Junio C Hamano
2025-09-19 19:13 ` Junio C Hamano
2025-09-19 19:15 ` brian m. carlson
2025-09-19 20:18 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:40 ` brian m. carlson
2025-09-19 23:16 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-30 16:39 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-09-19 23:24 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:48 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
2025-09-19 23:31 ` Junio C Hamano
2025-09-22 21:38 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 8/9] Allow specifying compatibility hash brian m. carlson
2025-09-24 7:56 ` Patrick Steinhardt
2025-09-30 16:44 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 9/9] t: add a prerequisite for a " brian m. carlson
2025-09-24 7:56 ` Patrick Steinhardt
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-02 22:38 ` [PATCH v2 1/9] docs: update pack index v3 format brian m. carlson
2025-10-03 17:00 ` Junio C Hamano
2025-10-02 22:38 ` [PATCH v2 2/9] docs: update offset order for pack index v3 brian m. carlson
2025-10-02 22:38 ` [PATCH v2 3/9] docs: reflect actual double signature for tags brian m. carlson
2025-10-02 22:38 ` [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
2025-10-03 17:07 ` Junio C Hamano
2025-10-03 21:06 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 5/9] docs: add documentation for loose objects brian m. carlson
2025-10-03 17:05 ` Junio C Hamano
2025-10-02 22:38 ` [PATCH v2 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-10-02 22:38 ` [PATCH v2 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
2025-10-02 22:38 ` [PATCH v2 8/9] t: allow specifying compatibility hash brian m. carlson
2025-10-03 17:14 ` Junio C Hamano
2025-10-03 20:45 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-09 21:56 ` [PATCH v3 1/9] docs: update pack index v3 format brian m. carlson
2025-10-09 21:56 ` [PATCH v3 2/9] docs: update offset order for pack index v3 brian m. carlson
2025-10-09 21:56 ` [PATCH v3 3/9] docs: reflect actual double signature for tags brian m. carlson
2025-10-09 21:56 ` [PATCH v3 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
2025-10-09 21:56 ` [PATCH v3 5/9] docs: add documentation for loose objects brian m. carlson
2025-10-09 21:56 ` [PATCH v3 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-10-09 21:56 ` [PATCH v3 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
2025-10-09 21:56 ` [PATCH v3 8/9] t: allow specifying compatibility hash brian m. carlson
2025-10-09 21:56 ` [PATCH v3 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
2025-10-13 15:24 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 Junio C Hamano
2025-10-13 16:34 ` brian m. carlson
2025-10-14 5:53 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq7bxu14fw.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
--cc=sandals@crustytoothpaste.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).