From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [NOTES 01/11] SHA-256 and interoperability work
Date: Mon, 6 Oct 2025 15:18:02 -0400
Message-ID: <aOQV6iM49QDhcC+C@nand.local>
In-Reply-To: <aOQVeVYY6zadPjln@nand.local>

Topic: SHA256 and interoperability work
Leader: brian
10:15am-10:45am PT

* lot of work to do
* brian is working on it
* it's progressing, not sure if we can get everything done by 3.0
* how to deal with submodules
	 * you can produce a split history
	 * accept, document, ?
	 * we need to have mapping on server or client
	 * if someone pushed one commit in sha1 and a different one in 256, we can end
		 up with divergent histories that could produce security issues
	 * some private repos used as open-core-style submodules make this difficult
	 * could have the server query, client derive mapping
	 * server could also be malicious
* if you're converting, how does that work with gpg signatures?
	 * we have a way to map both signatures
	 * if you're in compatibility mode, it will produce signatures for both
	 * what about for older histories, how can it be verified if it's only valid
		 for sha1?
			* it can be verified but can't be re-signed
	 * for converting, can that work?
			* converting will retain the sha1 signature
* what is the simplest user journey?
	 * I have a clone of a repo in sha1, am I expected to run a conversion locally
		 and then I can talk to GH in 256 protocol?
			* you will create a new 256 repo with sha1 compatibility and clone
				into that, which will convert it into both algorithms
			* download the data again?
				 * clone it to another directory locally
				 * it will preserve the sha1 repo and create the compatibility layer
			* let's say the local one has a submodule, clone locally including the
				submodule?
				 * yes, the conversion script will convert the submodule as well and
					 you'll have both ids
			* if I do a fetch, which do I need?
				 * you need a mapping if you're talking to a server with the other algo
			* the mapping is only needed for the server if it wants to be forward
				facing?
			* with the mapping, is it only commits or all objects?
				 * all objects
			* if someone trusts github, they can just consume its mapping?
				 * the server and client will do their own mapping
* what happens if nobody has the submodule anymore? e.g. a commit from 10 years
	ago whose submodule is gone, how do you make a 256 tree out of that?
	 * pick one at random, it doesn't matter
			* but you can't match everyone else
	 * we've chosen to use divergent history in this case
	 * Same issue exists with LFS objects
* if you have the old submodules,
* recursive/cyclic submodules?
	 * it's something we need to handle, don't have a great plan but it could be
		 done
	 * plan is to maybe have some pool
			* you have to convert the submodule up until that point, then convert them
				piecewise
* have you thought about mix/match where one uses sha1 and the other uses 256?
	 * we can't distinguish the size of the object id vs filename
* right now you're doing the work, are you thinking of allowing another hash
	algo without having these issues again?
	 * the way the design works now is that we have two algos, main and
		 compatibility, but it is designed to accept multiple algos. if we switch to
		 SHA3-512 at some point, for example, we could add another compat algo. it's
		 some work but the approach doesn't assume much about the specific algorithm
* steiny thought it could be useful to add a third algo not for security but
	speed
	 * gh has the insecure non-crypto variants
	 * problem is always client support
	 * corporate controlled repo often also has control of the clients - so maybe
		 less of a security issue but depends
* can you put a sha1 link inside a 256 tree?
	 * maybe an extra bit in the mode, some other interesting horrible thoughts
	 * would it make submodule problems go away if you could just carry the other
		 forever until the downstream decides to switch
	 * solves the submodule problem but not LFS problem?
			* LFS might be easier, you don't need to have the object to convert yours
			* assuming you have the object still
	 * brian not 100% against it
			* if I could do a 256 repo with a 256 submodule, you could parse it back,
				but if you do that, it's a different size and not usable by older
				versions of git
	 * if we were clever, sha1 trees hold sha1, 256 trees hold 256, and only when
		 you have a sha1 tree inside a 256 one would we use some new format
			* the problem is you still end up with stuff that doesn't work with older
				versions
			* degrades gracefully like a mode bit, worst case is that it checks out
				weird filenames?
			* write it out, take it to the list
	 * we discussed upgrading the tree object format, but it's so tight
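
To make the tree-format points above concrete, here is a minimal Python sketch
(an illustration only, not Git's implementation) of how a tree entry is laid
out: "<mode> <name>", a NUL byte, then the raw object id with no length prefix.
A parser has to know the id length from the repository's hash algorithm (20
bytes for sha1, 32 for 256), which is why an entry can't simply carry an id of
the other size. The helper names (blob_oid, tree_entry, parse_tree, try_parse)
and the file names are made up for this example.

```python
import hashlib


def blob_oid(content: bytes, algo: str) -> bytes:
    """Hash a blob the way Git does: "blob <size>\\0" followed by the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).digest()


def tree_entry(mode: bytes, name: bytes, oid: bytes) -> bytes:
    """Serialize one tree entry: "<mode> <name>\\0" then the raw object id."""
    return mode + b" " + name + b"\x00" + oid


def parse_tree(data: bytes, oid_len: int):
    """Parse tree entries, assuming every object id is exactly oid_len bytes."""
    entries, pos = [], 0
    while pos < len(data):
        space = data.index(b" ", pos)     # end of the mode field
        nul = data.index(b"\x00", space)  # end of the file name
        mode, name = data[pos:space], data[space + 1:nul]
        oid = data[nul + 1:nul + 1 + oid_len]
        entries.append((mode, name, oid.hex()))
        pos = nul + 1 + oid_len
    return entries


def try_parse(data: bytes, oid_len: int):
    """Return the parsed entries, or None if a separator could not be found."""
    try:
        return parse_tree(data, oid_len)
    except ValueError:
        return None


# Hypothetical two-file tree in a 256 repository.
oid_a = blob_oid(b"first file\n", "sha256")
oid_b = blob_oid(b"second file\n", "sha256")
tree = tree_entry(b"100644", b"a.txt", oid_a) + tree_entry(b"100644", b"b.txt", oid_b)

expected = [
    (b"100644", b"a.txt", oid_a.hex()),
    (b"100644", b"b.txt", oid_b.hex()),
]

right = try_parse(tree, 32)  # id length matches the repository's algorithm
wrong = try_parse(tree, 20)  # same bytes read as if the ids were sha1-sized

print(right == expected)
print(wrong == expected)
```

Read with the right length, the entries come back intact. Read with the wrong
length, the leftover id bytes get treated as the start of the next mode and
name, so the parse either fails or yields truncated ids and garbage names. That
may be the kind of "weird filenames" outcome raised above, though what an older
Git would actually do with such a tree is not something this sketch models.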
