From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: <git@vger.kernel.org>, Patrick Steinhardt <ps@pks.im>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 1/9] docs: update pack index v3 format
Date: Fri, 19 Sep 2025 15:08:03 -0700
Message-ID: <xmqq7bxu14fw.fsf@gitster.g>
In-Reply-To: <20250919010911.649831-2-sandals@crustytoothpaste.net> (brian m. carlson's message of "Fri, 19 Sep 2025 01:09:03 +0000")
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Our current pack index v3 format uses 4-byte integers to find the
> trailer of the file. This effectively means that the file cannot be
> much larger than 2^32. While this might at first seem to be okay, we
> expect that each object will have at least 64 bytes worth of data, which
> means that no more than about 67 million objects can be stored.
>
> Again, this might seem fine, but unfortunately, we know of many users
> who attempt to create repos with extremely large numbers of commits to
> get a "high score," and we've already seen repositories with at least 55
> million commits. In the interests of gracefully handling repositories
> even for these well-intentioned but ultimately misguided users, let's
> change these lengths to 8 bytes.
Very sensible.
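
For reference, the arithmetic behind the quoted "about 67 million"
figure can be checked with a short Python sketch. The 64-byte floor
per object is the commit message's own estimate; the helper name
max_objects is made up here for illustration and is not anything in
git.

```python
# Assumption (taken from the quoted commit message): every object
# accounts for at least 64 bytes of data in the index file.
MIN_BYTES_PER_OBJECT = 64


def max_objects(offset_width_bytes: int,
                bytes_per_object: int = MIN_BYTES_PER_OBJECT) -> int:
    """Rough upper bound on objects when file offsets have the given width."""
    addressable_bytes = 2 ** (8 * offset_width_bytes)
    return addressable_bytes // bytes_per_object


print(max_objects(4))  # 67108864, i.e. about 67 million
print(max_objects(8))  # far beyond any realistic repository
```

With 4-byte offsets the bound sits uncomfortably close to the 55
million commits already seen in the wild; with 8-byte offsets it is
no longer a practical concern.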

I also agree that 32 bytes is the natural size for the trailing
hash, but I found the two paragraphs below to be far more than
necessary. As they argue, we do not use a truncated hash anywhere in
our file formats, so I would have understood if the explanation were

    "20" in "A copy of the 20-byte SHA-256 checksum" is an obvious
    typo, as SHA-256 is longer than that. Fix it to "32".

instead of these two paragraphs.

Or did we mean to use a truncated hash back when this transition
design was proposed originally?

> For the checksums at the end of the file, we're producing 32-byte
> SHA-256 checksums because that's what we already do with pack index v2
> and SHA-256. Truncating SHA-256 doesn't pose any actual security
> problems other than those related to the reduced size, but our pack
> checksum must already be 32 bytes (since SHA-256 packs have 32-byte
> checksums) and it simplifies the code to use the existing hashfile logic
> for these cases for the index checksum as well.
>
> In addition, even though we may not need cryptographic security for the
> index checksum, we'd like to avoid arguments from auditors and such for
> organizations that may have compliance or security requirements. Using
> the simple, boring choice of the full SHA-256 hash avoids all possible
> discussion related to hash truncation and removes impediments for these
> organizations.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> Documentation/technical/hash-function-transition.adoc | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
> index f047fd80ca..f2df1d618d 100644
> --- a/Documentation/technical/hash-function-transition.adoc
> +++ b/Documentation/technical/hash-function-transition.adoc
> @@ -227,9 +227,9 @@ network byte order):
> ** 4-byte length in bytes of shortened object names. This is the
> shortest possible length needed to make names in the shortened
> object name table unambiguous.
> - ** 4-byte integer, recording where tables relating to this format
> + ** 8-byte integer, recording where tables relating to this format
> are stored in this index file, as an offset from the beginning.
> - * 4-byte offset to the trailer from the beginning of this file.
> + * 8-byte offset to the trailer from the beginning of this file.
> * Zero or more additional key/value pairs (4-byte key, 4-byte
> value). Only one key is supported: 'PSRC'. See the "Loose objects
> and unreachable objects" section for supported values and how this
> @@ -276,10 +276,10 @@ network byte order):
> up to and not including the table of CRC32 values.
> - Zero or more NUL bytes.
> - The trailer consists of the following:
> - * A copy of the 20-byte SHA-256 checksum at the end of the
> + * A copy of the 32-byte SHA-256 checksum at the end of the
> corresponding packfile.
>
> - * 20-byte SHA-256 checksum of all of the above.
> + * 32-byte SHA-256 checksum of all of the above.
>
> Loose object index
> ~~~~~~~~~~~~~~~~~~
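
To make the trailer change concrete, here is a minimal Python sketch
of writing and checking a trailer made of two full 32-byte SHA-256
checksums, located through an 8-byte network-byte-order offset. The
surrounding layout is a toy one (a lone offset field followed by an
opaque body), not the real pack index v3 header, and the names
build_index and verify_index are hypothetical.

```python
import hashlib
import struct

HASH_LEN = 32  # full SHA-256 digest, as in the patched documentation


def build_index(body: bytes, pack_checksum: bytes) -> bytes:
    """Toy file: 8-byte trailer offset, body, then the two-hash trailer."""
    if len(pack_checksum) != HASH_LEN:
        raise ValueError("pack checksum must be 32 bytes")
    # Offset to the trailer from the beginning of the file, big-endian.
    trailer_offset = 8 + len(body)
    content = struct.pack(">Q", trailer_offset) + body + pack_checksum
    # Second trailer field: SHA-256 over everything written so far.
    return content + hashlib.sha256(content).digest()


def verify_index(data: bytes) -> bytes:
    """Validate the trailer and return the stored pack checksum."""
    (trailer_offset,) = struct.unpack_from(">Q", data, 0)
    if trailer_offset + 2 * HASH_LEN != len(data):
        raise ValueError("trailer offset does not match file size")
    content, index_checksum = data[:-HASH_LEN], data[-HASH_LEN:]
    if hashlib.sha256(content).digest() != index_checksum:
        raise ValueError("index checksum mismatch")
    return data[trailer_offset:trailer_offset + HASH_LEN]


# Usage: round-trip a toy index and recover the pack checksum.
pack_sum = hashlib.sha256(b"toy pack contents").digest()
index = build_index(b"\x00" * 16, pack_sum)
assert verify_index(index) == pack_sum
```

Using the untruncated digest for both fields means a reader can
apply one hash length throughout, which is the simplification the
commit message argues for.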