From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Aditya Garg <gargaditya08@live.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Question: how will sha256sum be implemented in git
Date: Fri, 4 Jul 2025 19:27:29 +0000
Message-ID: <aGgrIWSitF1NsN2L@fruit.crustytoothpaste.net>
In-Reply-To: <PN3PR01MB9597524FAEAA3B26B15804B9B842A@PN3PR01MB9597.INDPRD01.PROD.OUTLOOK.COM>
On 2025-07-04 at 11:18:12, Aditya Garg wrote:
> Hi all
>
> I just read that git aims to transition to SHA256 by default, and conversion from SHA1 to SHA256 is needed for old
> repos. I was just curious how will that be achieved.
>
> Dumb idea, but maybe we can just encode the existing SHA1 sums' string to SHA256?
>
> Eg:
>
> $ echo -n 8994f255af5451b6cd1db01ee16d8cf15b9df81e | sha256sum
> bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f *-
>
> so bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f will be our new commit hash.
This would unfortunately still be vulnerable to collisions in SHA-1,
which is the problem we're trying to avoid. For instance, if I can
create two blobs with that SHA-1 hash, then I can also create two blobs
with the corresponding SHA-256 value, since the input in this case is
just the SHA-1 value.
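
To illustrate why re-hashing the old ID inherits its collisions, here is a minimal sketch. It assumes a deliberately weak stand-in (SHA-1 truncated to 16 bits, named `weak_hash` here purely for illustration) so that a collision can be found in well under a second; the same argument applies to a full SHA-1 collision, which is just far more expensive to produce.

```python
import hashlib
from itertools import count


def weak_hash(data: bytes) -> str:
    # Stand-in for a broken hash: SHA-1 truncated to 16 bits (4 hex digits),
    # so that a collision is cheap to find. Illustration only.
    return hashlib.sha1(data).hexdigest()[:4]


def rehash(hex_id: str) -> str:
    # The scheme proposed above: SHA-256 over the hex string of the old ID.
    return hashlib.sha256(hex_id.encode("ascii")).hexdigest()


def find_collision():
    # Brute force: remember every ID seen so far until two inputs share one.
    seen = {}
    for i in count():
        data = b"blob-%d" % i
        h = weak_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data


a, b = find_collision()

# Two different inputs with the same weak ID...
assert a != b and weak_hash(a) == weak_hash(b)
# ...necessarily get the same "new" ID, because only the old ID is hashed.
assert rehash(weak_hash(a)) == rehash(weak_hash(b))
# Hashing the content itself with SHA-256 tells them apart.
assert hashlib.sha256(a).hexdigest() != hashlib.sha256(b).hexdigest()
```
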
The way we do the transition is pretty simple. Blobs don't change; we
just hash them with either SHA-1 or SHA-256. For trees, we re-write all
of the entries to use the SHA-256 object IDs instead of the SHA-1 object
IDs and then we hash the result with SHA-256. And for commits and tags,
the headers that represent objects (tree, parent, and object) are
converted in a similar manner and then, again, hashed with SHA-256.
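
The blob and tree steps above can be sketched roughly as follows. This is a simplified illustration, not the code in Git: `object_id` and `convert_tree` are hypothetical helper names, and the SHA-1 to SHA-256 mapping is passed in as a plain dictionary rather than looked up in the loose object map.

```python
import hashlib


def object_id(algo: str, obj_type: str, body: bytes) -> str:
    # Git hashes the header "<type> <size>\0" followed by the object body.
    header = b"%s %d\x00" % (obj_type.encode("ascii"), len(body))
    return hashlib.new(algo, header + body).hexdigest()


def convert_tree(body: bytes, sha1_to_sha256: dict) -> bytes:
    # A SHA-1 tree entry is "<mode> <name>\0" plus a 20-byte raw object ID.
    # Keep mode and name, and swap in the 32-byte raw SHA-256 object ID.
    out = bytearray()
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        old_id = body[nul + 1 : nul + 21]
        out += body[pos : nul + 1]
        out += sha1_to_sha256[old_id]
        pos = nul + 21
    return bytes(out)


# Blobs don't change: the same bytes are hashed with either algorithm.
blob = b"hello\n"
blob_sha1 = object_id("sha1", "blob", blob)
blob_sha256 = object_id("sha256", "blob", blob)

# A one-entry tree pointing at that blob, in SHA-1 form.
tree_sha1_body = b"100644 hello.txt\x00" + bytes.fromhex(blob_sha1)

# Rewrite the entry to the SHA-256 ID, then hash the result with SHA-256.
mapping = {bytes.fromhex(blob_sha1): bytes.fromhex(blob_sha256)}
tree_sha256_body = convert_tree(tree_sha1_body, mapping)
tree_sha256 = object_id("sha256", "tree", tree_sha256_body)
```

Note that the SHA-1 ID never feeds into the SHA-256 ID here; the mapping is only used to find which object each entry refers to.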
You can actually see how the conversion operates in
`object-file-convert.c`. `repo_oid_to_algop` converts an object ID from
one format to another based on the loose object map outlined in
`Documentation/technical/hash-function-transition.adoc`, or the v3 pack
index functionality, which is not yet upstream but is available in my
`sha256-interop` branch. In general, the hash function transition
document explains a lot of the reasoning behind what we're doing and
how it works. I have to give credit to Jonathan Nieder
for writing the document and to many people on the list for helping to
contribute to it, and I encourage you to read it: it's not too complex.
So with this approach, the SHA-256 object ID is computed totally
independently of the SHA-1 object ID but in the exact same way, just
with SHA-256 object IDs inside. We already have support for
SHA-256-only repositories right now: you can do `git init
--object-format=sha256` and create one, although not all forges and
tools currently support them.
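
For commits, "the exact same way, just with SHA-256 object IDs inside" can be sketched like this. It is a simplified illustration with hypothetical names (`convert_headers`, placeholder IDs made of repeated characters), and it deliberately ignores signatures and other multi-line headers, which need more care in a real conversion.

```python
import hashlib


def object_id(algo: str, obj_type: str, body: bytes) -> str:
    # Git hashes the header "<type> <size>\0" followed by the object body.
    header = b"%s %d\x00" % (obj_type.encode("ascii"), len(body))
    return hashlib.new(algo, header + body).hexdigest()


def convert_headers(body: bytes, sha1_to_sha256: dict) -> bytes:
    # Headers end at the first blank line; the message is left untouched.
    header, sep, message = body.partition(b"\n\n")
    lines = []
    for line in header.split(b"\n"):
        key, _, value = line.partition(b" ")
        # Only headers that name other objects are rewritten.
        if key in (b"tree", b"parent", b"object"):
            new_id = sha1_to_sha256[value.decode("ascii")]
            line = key + b" " + new_id.encode("ascii")
        lines.append(line)
    return b"\n".join(lines) + sep + message


# Placeholder IDs for illustration only; they do not name real objects.
old_tree = "a" * 40
new_tree = "b" * 64

commit_sha1_body = (
    b"tree %s\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"\n"
    b"Initial commit\n"
) % old_tree.encode("ascii")

commit_sha256_body = convert_headers(commit_sha1_body, {old_tree: new_tree})

# Each form is hashed with its own algorithm, independently of the other.
commit_sha1 = object_id("sha1", "commit", commit_sha1_body)
commit_sha256 = object_id("sha256", "commit", commit_sha256_body)
```
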
In interoperability mode, the conversion process means
that we can take a repository that's in SHA-1, convert it to SHA-256,
continue to interoperate with the old SHA-1 version if we like, and
then, when we no longer want to use SHA-1, simply stick with the SHA-256
version and avoid using SHA-1 at all. That's part of what I'm working
on right now, and I'm pleased to report that I'm making a good amount of
progress. If you're able to attend Git Merge this year, either in
person or remotely, I'll be giving a talk on this topic.
I'm also planning to open a discussion on the list within the next
few days or weeks about some protocol extensions that will be
necessary to let us fetch, clone, and push all repositories in
interoperability mode, so please feel free to follow along for that.
--
brian m. carlson (they/them)
Toronto, Ontario, CA