From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 1/9] docs: update pack index v3 format
Date: Thu, 25 Sep 2025 21:39:58 +0000
Message-ID: <aNW2riVWtLQbacSR@fruit.crustytoothpaste.net>
In-Reply-To: <aNOj8fFTvkQ6jsaT@pks.im>

On 2025-09-24 at 07:55:29, Patrick Steinhardt wrote:
> On Fri, Sep 19, 2025 at 01:09:03AM +0000, brian m. carlson wrote:
> > Our current pack index v3 format uses 4-byte integers to find the
> > trailer of the file.  This effectively means that the file cannot be
> > much larger than 2^32.  While this might at first seem to be okay, we
> > expect that each object will have at least 64 bytes worth of data, which
> > means that no more than about 67 million objects can be stored.
> > 
> > Again, this might seem fine, but unfortunately, we know of many users
> > who attempt to create repos with extremely large numbers of commits to
> > get a "high score," and we've already seen repositories with at least 55
> > million commits.  In the interests of gracefully handling repositories
> > even for these well-intentioned but ultimately misguided users, let's
> > change these lengths to 8 bytes.
> 
> Yeah, this makes sense. We can only assume that repositories will
> continue to grow, so it makes sense to future proof.
> 
> We also have the 4-byte number of objects contained in the pack. But as
> you explain, it's nothing we should need to worry about given that this
> is a mere counter, and not an offset into the file. I doubt that there's
> repositories out there that'll have more than 4 billion objects anytime
> soon.

There are certainly some users who try to do that at $DAYJOB, but they
come to our attention (because our maintenance job fails due to taking
too long) before they get there.  I am not, however, aware of any
actually legitimate and productive uses of repositories that threaten to
break that limit, which is what I think we should really care about.

In the event we start seeing those kinds of problems, it should be easy
to implement pack v5 with a corresponding index, just with a larger
number of objects.
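
As a rough check of the numbers quoted above, here is a small sketch of
the arithmetic.  It assumes only what the commit message states: that a
4-byte length caps the file at about 2^32 bytes and that each object
accounts for at least 64 bytes.  The names are illustrative and do not
come from Git's code.

```python
# Back-of-the-envelope limits discussed in this thread (illustrative only).
BYTES_PER_OBJECT = 64  # "at least 64 bytes worth of data" per object


def max_objects(length_bytes, bytes_per_object=BYTES_PER_OBJECT):
    """Upper bound on objects if the file must fit in 2^(8 * length_bytes) bytes."""
    return 2 ** (8 * length_bytes) // bytes_per_object


old_limit = max_objects(4)   # 4-byte lengths, as in the current v3 text
new_limit = max_objects(8)   # 8-byte lengths, as proposed in this patch
count_limit = 2 ** 32 - 1    # largest value a 4-byte object count can hold

print(old_limit)    # 67108864 (about 67 million)
print(new_limit)    # 288230376151711744
print(count_limit)  # 4294967295 (about 4 billion)
```

With 8-byte lengths, the 4-byte object count becomes the tighter of the
two bounds, which is the limit Patrick refers to.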

> For now we only have SHA256 and SHA1. But thinking about the future,
> there will be a time when SHA256 will be considered broken. I wonder
> whether we should safeguard against that and also specify the trailer
> hash to be agile? That is, instead of hardcoding the hash function, we
> add something like a "primary" hash to the packfile and then use the
> full output of that hash as checksum.
> 
> In any case, please feel free to say "no" to the above thought. It's
> just something that popped into my mind upon reading this.

The trailer checksum actually is computed with whatever the main hash
algorithm in use is.  So if we add a third algorithm, say SHA-3-512,
then the trailer checksum will be SHA-3-512 when that's the main
algorithm.

Technically, it's also SHA-1 if we're in a SHA-1 repository with SHA-256
compatibility.  That's not a use case I really encourage, but it is a
use case I'm testing because it exposes bugs in our codebase and I
expect people will want to do in-place conversion from SHA-1 only to
SHA-1 with SHA-256 at some point.

I'll fix that for v2.
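
To illustrate the rule described above, here is a minimal sketch in
Python.  It is not Git's implementation: the function names, the
algorithm name strings, and the sample bytes are all made up for the
example, and SHA-3-512 is only the hypothetical third algorithm from the
paragraph above, not something Git supports.

```python
import hashlib

# Hypothetical mapping from a repository's main hash name to a hashlib
# constructor.  "sha3-512" stands in for a possible future algorithm.
HASHES = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha3-512": hashlib.sha3_512,
}


def trailer_algo(main_algo, compat_algo=None):
    """The trailer uses the main algorithm; the compatibility one is ignored."""
    return main_algo


def trailer_checksum(body, main_algo, compat_algo=None):
    """Full-length digest of the body under the main hash algorithm."""
    return HASHES[trailer_algo(main_algo, compat_algo)](body).digest()


body = b"example index contents"  # placeholder, not a real index
sizes = {name: len(trailer_checksum(body, name)) for name in HASHES}
print(sizes)  # {'sha1': 20, 'sha256': 32, 'sha3-512': 64}

# A SHA-1 repository with SHA-256 compatibility still gets a SHA-1 trailer.
print(trailer_algo("sha1", compat_algo="sha256"))  # sha1
```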

> I guess one thing that should be explicitly pointed out in the commit
> message is that there are no implementations of the v3 format yet, so
> this is basically updating our envisioned design, only. Otherwise one
> might wonder why we can update the spec just so.

That isn't completely true.  There is an implementation, but it is not
yet on the list, and it follows the spec written here.  I will provide
documentation with the rest of the pack index code when index v3 comes
in, but I wanted to update this in case people are trying to add it in
other implementations as well.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
