* Question: how will sha256sum be implemented in git
@ 2025-07-04 11:18 Aditya Garg
  2025-07-04 19:27 ` brian m. carlson
  0 siblings, 1 reply; 3+ messages in thread
From: Aditya Garg @ 2025-07-04 11:18 UTC
To: git@vger.kernel.org

Hi all

I just read that git aims to transition to SHA256 by default, and conversion from SHA1 to SHA256 is needed for old repos. I was just curious how that will be achieved.

Dumb idea, but maybe we can just encode the existing SHA1 sums' string to SHA256?

Eg:

$ echo -n 8994f255af5451b6cd1db01ee16d8cf15b9df81e | sha256sum
bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f *-

so bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f will be our new commit hash.

I think we can do that since sha256sum is chosen due to negligible collisions, right?
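As a minimal sketch, the mapping proposed in the message above can be written in Python. The function name `proposed_new_id` is hypothetical and used only for illustration; it is not part of Git. The point it makes visible is that the new ID is a function of the 40-character SHA-1 string alone, so the object's actual content never enters the SHA-256 computation.

```python
import hashlib


def proposed_new_id(sha1_hex: str) -> str:
    """Hypothetical helper: SHA-256 over the ASCII hex form of a SHA-1 ID.

    Mirrors `echo -n <sha1> | sha256sum`: the input is the hex string,
    not the object the SHA-1 ID was computed from.
    """
    return hashlib.sha256(sha1_hex.encode("ascii")).hexdigest()


old_id = "8994f255af5451b6cd1db01ee16d8cf15b9df81e"
new_id = proposed_new_id(old_id)
print(new_id)
```

Because only `old_id` is hashed, any two objects that share a SHA-1 ID would necessarily share the derived ID as well.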
* Re: Question: how will sha256sum be implemented in git
  2025-07-04 11:18 Question: how will sha256sum be implemented in git Aditya Garg
@ 2025-07-04 19:27 ` brian m. carlson
  2025-07-05  7:09   ` Aditya Garg
  0 siblings, 1 reply; 3+ messages in thread
From: brian m. carlson @ 2025-07-04 19:27 UTC
To: Aditya Garg; +Cc: git@vger.kernel.org

On 2025-07-04 at 11:18:12, Aditya Garg wrote:
> Hi all
>
> I just read that git aims to transition to SHA256 by default, and conversion from SHA1 to SHA256 is needed for old
> repos. I was just curious how will that be achieved.
>
> Dumb idea, but maybe we can just encode the existing SHA1 sums' string to SHA256?
>
> Eg:
>
> $ echo -n 8994f255af5451b6cd1db01ee16d8cf15b9df81e | sha256sum
> bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f *-
>
> so bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f will be our new commit hash.

This would unfortunately still be vulnerable to collisions in SHA-1, which is the problem we're trying to avoid. For instance, if I can create two blobs with that SHA-1 hash, then I can also create two blobs with the corresponding SHA-256 value, since the input in this case is just the SHA-1 value.

The way we do the transition is pretty simple. Blobs don't change; we just hash them with either SHA-1 or SHA-256. For trees, we re-write all of the entries to use the SHA-256 object IDs instead of the SHA-1 object IDs and then we hash the result with SHA-256. And for commits and tags, the headers that represent objects (tree, parent, and object) are converted in a similar manner and then, again, hashed with SHA-256.

You can actually see how the conversion operates in `object-file-convert.c`. `repo_oid_to_algop` converts an object from one format to another based on the loose object map outlined in `Documentation/technical/hash-function-transition.adoc`, or the v3 pack index functionality which is not yet upstream but is available in my `sha256-interop` branch. In general, the hash function transition document explains a lot of the decisions behind why we're doing what we're doing and how it works. I have to give credit to Jonathan Nieder for writing the document and to many people on the list for helping to contribute to it, and I encourage you to read it: it's not too complex.

So with this approach, the SHA-256 object ID is computed totally independently of the SHA-1 object ID but in the exact same way, just with SHA-256 object IDs inside. We already have support for SHA-256-only repositories right now: you can do `git init --object-format=sha256` and create one, although not all forges and tools currently support them.

The process of the conversion when we're in interoperability mode means that we can take a repository that's in SHA-1, convert it to SHA-256, continue to interoperate with the old SHA-1 version if we like, and then, when we no longer want to use SHA-1, simply stick with the SHA-256 version and avoid using SHA-1 at all. That's part of what I'm working on right now, and I'm pleased to report that I'm making a good amount of progress. If you're able to attend Git Merge this year, either in person or remotely, I'll be giving a talk on this topic.

I'm also planning to open a discussion on the list within the next couple of days or weeks about some protocol extensions that will be necessary to let us fetch, clone, and push all repositories in interoperability mode, so please feel free to follow along for that.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
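The blob and tree steps described in the message above can be sketched in Python. This is an illustration under stated assumptions, not Git's implementation: `object_id`, `build_tree`, `convert_tree`, and `oid_map` are hypothetical names, the dictionary merely stands in for the loose object map, and the tree handling is simplified to plain files sorted by name (no subdirectories). It relies only on Git's object framing, where an ID is the hash of `<type> <size>`, a NUL byte, and the body, and where each tree entry stores `<mode> <name>`, a NUL byte, and the raw object ID.

```python
import hashlib


def object_id(obj_type: bytes, body: bytes, algo: str) -> bytes:
    """Hash '<type> <size>' + NUL + body with 'sha1' or 'sha256'."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.new(algo, header + body).digest()


def build_tree(entries, algo: str) -> bytes:
    """Serialize (mode, name, blob_bytes) entries as a tree body.

    Simplification: entries are plain files, sorted by name bytes.
    """
    body = b""
    for mode, name, blob in sorted(entries, key=lambda e: e[1]):
        body += mode + b" " + name + b"\0" + object_id(b"blob", blob, algo)
    return body


def convert_tree(sha1_body: bytes, oid_map: dict) -> bytes:
    """Rewrite each 20-byte SHA-1 ID in a tree body to its SHA-256 ID.

    oid_map is a hypothetical stand-in for the loose object map.
    """
    out = b""
    pos = 0
    while pos < len(sha1_body):
        nul = sha1_body.index(b"\0", pos)          # end of '<mode> <name>'
        old_oid = sha1_body[nul + 1:nul + 21]      # raw 20-byte SHA-1 ID
        out += sha1_body[pos:nul + 1] + oid_map[old_oid]
        pos = nul + 21
    return out


# Blobs are unchanged; they are simply hashed under both algorithms.
entries = [
    (b"100644", b"README", b"hello\n"),
    (b"100644", b"main.c", b"int main(void) { return 0; }\n"),
]
oid_map = {
    object_id(b"blob", blob, "sha1"): object_id(b"blob", blob, "sha256")
    for _, _, blob in entries
}

tree_sha1 = build_tree(entries, "sha1")
tree_sha256 = convert_tree(tree_sha1, oid_map)

print(object_id(b"tree", tree_sha1, "sha1").hex())
print(object_id(b"tree", tree_sha256, "sha256").hex())
```

In this sketch the converted tree body is byte-for-byte what one would get by building the tree directly with SHA-256, which matches the statement that the SHA-256 ID is computed in the same way as the SHA-1 ID, just with SHA-256 object IDs inside. Commit and tag headers would be rewritten analogously before hashing.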
* Re: Question: how will sha256sum be implemented in git
  2025-07-04 19:27 ` brian m. carlson
@ 2025-07-05  7:09   ` Aditya Garg
  0 siblings, 0 replies; 3+ messages in thread
From: Aditya Garg @ 2025-07-05 7:09 UTC
To: brian m. carlson; +Cc: git@vger.kernel.org

On 5 July 2025 12:57:29 am IST, "brian m. carlson" <sandals@crustytoothpaste.net> wrote:
>On 2025-07-04 at 11:18:12, Aditya Garg wrote:
>> Hi all
>>
>> I just read that git aims to transition to SHA256 by default, and conversion from SHA1 to SHA256 is needed for old
>> repos. I was just curious how will that be achieved.
>>
>> Dumb idea, but maybe we can just encode the existing SHA1 sums' string to SHA256?
>>
>> Eg:
>>
>> $ echo -n 8994f255af5451b6cd1db01ee16d8cf15b9df81e | sha256sum
>> bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f *-
>>
>> so bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f will be our new commit hash.
>
>This would unfortunately still be vulnerable to collisions in SHA-1,
>which is the problem we're trying to avoid. For instance, if I can
>create two blobs with that SHA-1 hash, then I can also create two blobs
>with the corresponding SHA-256 value, since the input in this case is
>just the SHA-1 value.
>
>The way we do the transition is pretty simple. Blobs don't change; we
>just hash them with either SHA-1 or SHA-256. For trees, we re-write all
>of the entries to use the SHA-256 object IDs instead of the SHA-1 object
>IDs and then we hash the result with SHA-256. And for commits and tags,
>the headers that represent objects (tree, parent, and object) are
>converted in a similar manner and then, again, hashed with SHA-256.
>
>You can actually see how the conversion operates in
>`object-file-convert.c`. `repo_oid_to_algop` converts an object from
>one format to another based on the loose object map outlined in
>`Documentation/technical/hash-function-transition.adoc`, or the v3 pack
>index functionality which is not yet upstream but is available in my
>`sha256-interop` branch. In general, the hash function transition
>document explains a lot of the decisions behind why we're doing what
>we're doing and how it works. I have to give credit to Jonathan Nieder
>for writing the document and to many people on the list for helping to
>contribute to it, and I encourage you to read it: it's not too complex.
>

I'll have a look.

>So with this approach, the SHA-256 object ID is computed totally
>independently of the SHA-1 object ID but in the exact same way, just
>with SHA-256 object IDs inside. We already have support for
>SHA-256-only repositories right now: you can do `git init
>--object-format=sha256` and create one, although not all forges and
>tools currently support them.
>
>The process of the conversion when we're in interoperability mode means
>that we can take a repository that's in SHA-1, convert it to SHA-256,
>continue to interoperate with the old SHA-1 version if we like, and
>then, when we no longer want to use SHA-1, simply stick with the SHA-256
>version and avoid using SHA-1 at all. That's part of what I'm working
>on right now, and I'm pleased to report that I'm making a good amount of
>progress. If you're able to attend Git Merge this year, either in
>person or remotely, I'll be giving a talk on this topic.

I'll see if attending remotely is possible. I neither have a US visa to attend in person, nor does it suit my budget.

>
>I'm also planning to open a discussion on the list within the next
>couple of days or weeks about some protocol extensions that will be
>necessary to let us fetch, clone, and push all repositories in
>interoperability mode, so please feel free to follow along for that.

Great!
end of thread

Thread overview: 3+ messages
2025-07-04 11:18 Question: how will sha256sum be implemented in git Aditya Garg
2025-07-04 19:27 ` brian m. carlson
2025-07-05  7:09   ` Aditya Garg