git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] Add SHA-256 by default as a breaking change
@ 2025-06-20  1:19 brian m. carlson
  2025-06-20  1:19 ` [PATCH 01/10] hash: add a constant for the default hash algorithm brian m. carlson
                   ` (10 more replies)
  0 siblings, 11 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Our breaking changes document outlines that Git 3.0 will make SHA-256
the default hash algorithm, which is a sensible and prudent approach,
especially from a security perspective.  However, we haven't tested this
adequately and it would be helpful to allow users to test this behaviour
so their code and environments are ready for it.

Fortunately, c5bc9a7f94 (Makefile: wire up build option for deprecated
features, 2025-01-22) introduces a build option that we can use for
testing breaking changes: WITH_BREAKING_CHANGES.

This series first introduces two new hash constants: one for the default
hash algorithm and one for the original algorithm (SHA-1).  The utility
of the former should be obvious; the latter[0] exists to allow us to
clearly document those places in the code that we are using SHA-1
because older versions of Git did not support SHA-1 and we are
maintaining backward compatibility, as opposed to those where SHA-1 is
an intentional and deliberate choice (such as when a file format allows
the user specify an algorithm).  We then have several patches that use
these constants throughout our code; fortunately, the need for them is
limited due to provident design decisions.

There is then a small amount of code to adjust the setup code to use the
default hash and adjust fallbacks to SHA-1 where necessary.

We then set up the testsuite to support a default algorithm.  It is
called the built-in algorithm since "default" is already used for the
algorithm we're running in the testsuite.  There are then a few tests
that need fixing.  Again, there aren't too many thanks to good decisions
on the part of contributors.

Finally, we wire up SHA-256 by default when WITH_BREAKING_CHANGES is
set with a rather anticlimactic amount of code.  All in all, this series
was far less work and far smaller than I expected.  Many thanks to
everyone who contributed to that being the case.

I have tested this series not only on CI but three different ways on my
local machine: without any GIT_TEST_DEFAULT_HASH, with that option set
to "sha256", and with that option set to "sha1".  It passed in every
case and all three options were useful in finding bugs in this series.

There was some discussion about adding support for SHA-256
interoperability for Git 3.0 as well.  Since my talk on that for Git
Merge was accepted, I've started on some of that work so the talk can
have more concrete and less handwavy content.

Over the past week, I've gotten down from 133 to 96 failing tests in
compatibility mode, which is definitely an improvement.  To be clear: I
am not committing to finishing that work 100%[1], but I am at least
working on it right now and it is at least mostly functional if you
ignore submodules, partial clone, and shallow clone.  My incentive to
work on it has also been helped by a new Framework 13 laptop that can
run the testsuite with -j12 in 65-70 seconds without making the
remainder of the system laggy, so testing is less painful and the
feedback loop is much shorter.

[0] This is, in my view, the most controversial part of the series, but
I think it's nice for documentary purposes.  If people really hate it,
we can drop it.
[1] I had previously said I wasn't going to do it at all, and clearly I
lied, but since this is in my free time, I may decide to do something
else instead at some point.  I'll make the code available for people to
work with if I don't get a chance to send all of the patches.

brian m. carlson (10):
  hash: add a constant for the default hash algorithm
  hash: add a constant for the original hash algorithm
  builtin: use default hash when outside a repository
  Use original hash for legacy formats
  setup: use the default algorithm to initialize repo format
  t: default to compile-time default hash if not set
  t1007: choose the built-in hash outside of a repo
  t4042: choose the built-in hash outside of a repo
  t5300: choose the built-in hash outside of a repo
  Enable SHA-256 by default in breaking changes mode

 builtin/apply.c                  |  2 +-
 builtin/diff.c                   |  2 +-
 builtin/hash-object.c            |  2 +-
 builtin/index-pack.c             |  2 +-
 builtin/ls-remote.c              |  2 +-
 builtin/patch-id.c               |  2 +-
 builtin/receive-pack.c           |  2 +-
 builtin/shortlog.c               |  2 +-
 builtin/show-index.c             |  2 +-
 bundle.c                         |  4 ++--
 connect.c                        |  6 +++---
 fetch-pack.c                     |  2 +-
 hash.h                           | 10 ++++++++++
 pkt-line.c                       |  2 +-
 remote-curl.c                    |  2 +-
 serve.c                          |  2 +-
 setup.c                          |  9 ++++++---
 setup.h                          |  2 +-
 t/t1007-hash-object.sh           |  4 ++--
 t/t4042-diff-textconv-caching.sh | 12 ++++++++++--
 t/t5300-pack-object.sh           |  6 +++---
 t/test-lib-functions.sh          |  5 ++++-
 t/test-lib.sh                    | 12 +++++++++++-
 transport.c                      |  2 +-
 24 files changed, 66 insertions(+), 32 deletions(-)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/10] hash: add a constant for the default hash algorithm
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20  1:19 ` [PATCH 02/10] hash: add a constant for the original " brian m. carlson
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, SHA-1 is the default hash algorithm in Git.  However, this
may change in the future.

We have many places in our code that use the SHA-1 constant to indicate
the default hash if none is specified, but it will end up being more
practical to specify this explicitly and clearly using a constant for
whatever the default hash algorithm is.  Then, if we decide to change it
in the future, we can simply replace the constant representing the
default with a new value.

For these reasons, introduce GIT_HASH_DEFAULT to represent the default
hash algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hash.h b/hash.h
index d6422ddf45..0d3d85e04c 100644
--- a/hash.h
+++ b/hash.h
@@ -174,6 +174,8 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 #define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
+/* Default hash algorithm if unspecified. */
+#define GIT_HASH_DEFAULT GIT_HASH_SHA1
 
 /* "sha1", big-endian */
 #define GIT_SHA1_FORMAT_ID 0x73686131

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 02/10] hash: add a constant for the original hash algorithm
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
  2025-06-20  1:19 ` [PATCH 01/10] hash: add a constant for the default hash algorithm brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20  1:56   ` Junio C Hamano
  2025-06-20  1:19 ` [PATCH 03/10] builtin: use default hash when outside a repository brian m. carlson
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
code.  Some of these really mean to represent specifically SHA-1, but
some actually represent the original hash algorithm used in Git which is
implied by older formats and protocols which do not contain hash
information.  For instance, the bundle v1 and v2 formats do not contain
hash algorithm information, and thus SHA-1 is implied by the use of
these formats.

Add a constant for documentary purposes which indicates this value.  It
will always be the same as SHA-1, since this is an essential part of
these formats, but its use indicates this particular reason and not any
other reason why SHA-1 might be used.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hash.h b/hash.h
index 0d3d85e04c..0e14cade4e 100644
--- a/hash.h
+++ b/hash.h
@@ -176,6 +176,8 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 /* Default hash algorithm if unspecified. */
 #define GIT_HASH_DEFAULT GIT_HASH_SHA1
+/* Original hash algorithm. Implied for older data formats which don't specify. */
+#define GIT_HASH_ORIGINAL GIT_HASH_SHA1
 
 /* "sha1", big-endian */
 #define GIT_SHA1_FORMAT_ID 0x73686131

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 03/10] builtin: use default hash when outside a repository
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
  2025-06-20  1:19 ` [PATCH 01/10] hash: add a constant for the default hash algorithm brian m. carlson
  2025-06-20  1:19 ` [PATCH 02/10] hash: add a constant for the original " brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20 14:19   ` Junio C Hamano
  2025-07-01 11:35   ` Patrick Steinhardt
  2025-06-20  1:19 ` [PATCH 04/10] Use original hash for legacy formats brian m. carlson
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We have some commands that can operate inside or outside a repository.
If we're operating outside a repository, we clearly cannot use the
repository's hash algorithm as a default since it doesn't exist, so
instead, let's pick the default instead of specifically SHA-1.  Right
now this results in no functional change since the default is SHA-1, but
that may change in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/apply.c       | 2 +-
 builtin/diff.c        | 2 +-
 builtin/hash-object.c | 2 +-
 builtin/index-pack.c  | 2 +-
 builtin/ls-remote.c   | 2 +-
 builtin/patch-id.c    | 2 +-
 builtin/shortlog.c    | 2 +-
 builtin/show-index.c  | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index a1e20c593d..d642a40251 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -29,7 +29,7 @@ int cmd_apply(int argc,
 	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
 	 */
 	if (!the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	argc = apply_parse_options(argc, argv,
 				   &state, &force_apply, &options,
diff --git a/builtin/diff.c b/builtin/diff.c
index c6231edce4..eebffe36cc 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -483,7 +483,7 @@ int cmd_diff(int argc,
 	 * configurable via a command line option.
 	 */
 	if (nongit)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 6a99ec250d..213a302e05 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -104,7 +104,7 @@ int cmd_hash_object(int argc,
 		prefix = setup_git_directory_gently(&nongit);
 
 	if (nongit && !the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bb7925bd29..352ce7f88a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -2034,7 +2034,7 @@ int cmd_index_pack(int argc,
 	 * choice but to guess the object hash.
 	 */
 	if (!the_repository->hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);
 	if (rev_index) {
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 01a4d4daa1..df09000b30 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -112,7 +112,7 @@ int cmd_ls_remote(int argc,
 	 * depending on what object hash the remote uses.
 	 */
 	if (!the_repository->hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	packet_trace_identity("ls-remote");
 
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index cdef2ec10a..26f04b0335 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -254,7 +254,7 @@ int cmd_patch_id(int argc,
 	 * the code that computes patch IDs to always use SHA1.
 	 */
 	if (!the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index fe15e11497..60adc5e7a5 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -419,7 +419,7 @@ int cmd_shortlog(int argc,
 	 * git/nongit so that we do not have to do this.
 	 */
 	if (nongit && !the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	git_config(git_default_config, NULL);
 	shortlog_init(&log);
diff --git a/builtin/show-index.c b/builtin/show-index.c
index 9d4ecf5e7b..2c3e2940ce 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -47,7 +47,7 @@ int cmd_show_index(int argc,
 	 *       the index file passed in and use that instead.
 	 */
 	if (!the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	hashsz = the_hash_algo->rawsz;
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 04/10] Use original hash for legacy formats
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (2 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 03/10] builtin: use default hash when outside a repository brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20 14:26   ` Junio C Hamano
  2025-06-20  1:19 ` [PATCH 05/10] setup: use the default algorithm to initialize repo format brian m. carlson
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the original hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 2 +-
 bundle.c               | 4 ++--
 connect.c              | 6 +++---
 fetch-pack.c           | 2 +-
 pkt-line.c             | 2 +-
 remote-curl.c          | 2 +-
 serve.c                | 2 +-
 setup.c                | 4 ++--
 transport.c            | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index a317d6c278..6c9b246895 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2136,7 +2136,7 @@ static struct command *read_head_info(struct packet_reader *reader,
 				use_push_options = 1;
 			hash = parse_feature_value(feature_list, "object-format", &len, NULL);
 			if (!hash) {
-				hash = hash_algos[GIT_HASH_SHA1].name;
+				hash = hash_algos[GIT_HASH_ORIGINAL].name;
 				len = strlen(hash);
 			}
 			if (xstrncmpz(the_hash_algo->name, hash, len))
diff --git a/bundle.c b/bundle.c
index b0a3fee2ef..359cbc44d0 100644
--- a/bundle.c
+++ b/bundle.c
@@ -95,7 +95,7 @@ int read_bundle_header_fd(int fd, struct bundle_header *header,
 	 * by an "object-format=" capability, which is being handled in
 	 * `parse_capability()`.
 	 */
-	header->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	header->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
 
 	/* The bundle header ends with an empty line */
 	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
@@ -507,7 +507,7 @@ int create_bundle(struct repository *r, const char *path,
 	 *    SHA1.
 	 * 2. @filter is required because we parsed an object filter.
 	 */
-	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter.choice)
+	if (the_hash_algo != &hash_algos[GIT_HASH_ORIGINAL] || revs.filter.choice)
 		min_version = 3;
 
 	if (argc > 1) {
diff --git a/connect.c b/connect.c
index 3280435331..04415d8eed 100644
--- a/connect.c
+++ b/connect.c
@@ -251,7 +251,7 @@ static void process_capabilities(struct packet_reader *reader, size_t *linelen)
 			reader->hash_algo = &hash_algos[hash_algo];
 		free(hash_name);
 	} else {
-		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+		reader->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
 	}
 }
 
@@ -500,7 +500,7 @@ static void send_capabilities(int fd_out, struct packet_reader *reader)
 		reader->hash_algo = &hash_algos[hash_algo];
 		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
 	} else {
-		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+		reader->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
 	}
 	if (server_feature_v2("promisor-remote", &promisor_remote_info)) {
 		char *reply = promisor_remote_reply(promisor_remote_info);
@@ -665,7 +665,7 @@ int server_supports_hash(const char *desired, int *feature_supported)
 	if (feature_supported)
 		*feature_supported = !!hash;
 	if (!hash) {
-		hash = hash_algos[GIT_HASH_SHA1].name;
+		hash = hash_algos[GIT_HASH_ORIGINAL].name;
 		len = strlen(hash);
 	}
 	while (hash) {
diff --git a/fetch-pack.c b/fetch-pack.c
index fa4231fee7..970de4f7dc 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1342,7 +1342,7 @@ static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
 			die(_("mismatched algorithms: client %s; server %s"),
 			    the_hash_algo->name, hash_name);
 		packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
-	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
+	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_ORIGINAL) {
 		die(_("the server does not support algorithm '%s'"),
 		    the_hash_algo->name);
 	}
diff --git a/pkt-line.c b/pkt-line.c
index a5bcbc96fb..e7c18da5e0 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -617,7 +617,7 @@ void packet_reader_init(struct packet_reader *reader, int fd,
 	reader->buffer_size = sizeof(packet_buffer);
 	reader->options = options;
 	reader->me = "git";
-	reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	reader->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
 	strbuf_init(&reader->scratch, 0);
 }
 
diff --git a/remote-curl.c b/remote-curl.c
index b8bc3a80cf..961ecd4655 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -285,7 +285,7 @@ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
 	 * back to SHA1, which may or may not be correct.
 	 */
 	if (!p)
-		return &hash_algos[GIT_HASH_SHA1];
+		return &hash_algos[GIT_HASH_ORIGINAL];
 
 	algo = hash_algo_by_length((p - heads->buf) / 2);
 	if (algo == GIT_HASH_UNKNOWN)
diff --git a/serve.c b/serve.c
index e3ccf1505c..6a47170247 100644
--- a/serve.c
+++ b/serve.c
@@ -14,7 +14,7 @@
 
 static int advertise_sid = -1;
 static int advertise_object_info = -1;
-static int client_hash_algo = GIT_HASH_SHA1;
+static int client_hash_algo = GIT_HASH_ORIGINAL;
 
 static int always_advertise(struct repository *r UNUSED,
 			    struct strbuf *value UNUSED)
diff --git a/setup.c b/setup.c
index f93bd6a24a..641c857ed5 100644
--- a/setup.c
+++ b/setup.c
@@ -2222,11 +2222,11 @@ void initialize_repository_version(int hash_algo,
 	 * version will get adjusted by git-clone(1) once it has learned about
 	 * the remote repository's format.
 	 */
-	if (hash_algo != GIT_HASH_SHA1 ||
+	if (hash_algo != GIT_HASH_ORIGINAL ||
 	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
 		target_version = GIT_REPO_VERSION_READ;
 
-	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
+	if (hash_algo != GIT_HASH_ORIGINAL && hash_algo != GIT_HASH_UNKNOWN)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
 	else if (reinit)
diff --git a/transport.c b/transport.c
index 6c2801bcbd..a019435918 100644
--- a/transport.c
+++ b/transport.c
@@ -1243,7 +1243,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
 			ret->smart_options->receivepack = remote->receivepack;
 	}
 
-	ret->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	ret->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/10] setup: use the default algorithm to initialize repo format
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (3 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 04/10] Use original hash for legacy formats brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20 14:55   ` Junio C Hamano
  2025-06-20  1:19 ` [PATCH 06/10] t: default to compile-time default hash if not set brian m. carlson
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

When we define a new repository format with REPOSITORY_FORMAT_INIT, we
always use GIT_HASH_SHA1, and this value ends up getting used as the
default value to initialize a repository if none of the command line,
environment, or config tell us to do otherwise.

Because we might not always want to use SHA-1 as the default, let's
instead specify the default hash algorithm constant so that we will use
whatever the specified default is.  However, to make sure we continue to
read repositories without a specified hash algorithm as SHA-1, default
the repository format to the original hash algorithm (SHA-1) when
reading the repository format.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 setup.c | 5 ++++-
 setup.h | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/setup.c b/setup.c
index 641c857ed5..fb38824a2b 100644
--- a/setup.c
+++ b/setup.c
@@ -835,9 +835,12 @@ static void init_repository_format(struct repository_format *format)
 int read_repository_format(struct repository_format *format, const char *path)
 {
 	clear_repository_format(format);
+	format->hash_algo = GIT_HASH_ORIGINAL;
 	git_config_from_file(check_repo_format, path, format);
-	if (format->version == -1)
+	if (format->version == -1) {
 		clear_repository_format(format);
+		format->hash_algo = GIT_HASH_ORIGINAL;
+	}
 	return format->version;
 }
 
diff --git a/setup.h b/setup.h
index 18dc3b7368..8522fa8575 100644
--- a/setup.h
+++ b/setup.h
@@ -149,7 +149,7 @@ struct repository_format {
 { \
 	.version = -1, \
 	.is_bare = -1, \
-	.hash_algo = GIT_HASH_SHA1, \
+	.hash_algo = GIT_HASH_DEFAULT, \
 	.ref_storage_format = REF_STORAGE_FORMAT_FILES, \
 	.unknown_extensions = STRING_LIST_INIT_DUP, \
 	.v1_only_extensions = STRING_LIST_INIT_DUP, \

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/10] t: default to compile-time default hash if not set
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (4 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 05/10] setup: use the default algorithm to initialize repo format brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20  1:19 ` [PATCH 07/10] t1007: choose the built-in hash outside of a repo brian m. carlson
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the default compile-time hash is SHA-1.  However, in the
future, this might change and it would be helpful to gracefully handle
this case in our testsuite.

To avoid making these assumptions, let's introduce a variable that
contains the built-in default hash and use it in our setup code as the
fallback value if no hash was explicitly set.  For now, this is always
SHA-1, but in a future commit, we'll allow adjusting this and the
variable will be more useful.

To allow us to make our tests more robust, allow test_oid to take the
--hash=builtin option to specify this hash, whatever it is.

Additionally, add a DEFAULT_HASH_ALGORITHM prerequisite to check for the
compile-time hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/test-lib-functions.sh | 5 ++++-
 t/test-lib.sh           | 7 ++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index bee4a2ca34..6ec95ea51f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1695,7 +1695,7 @@ test_set_hash () {
 
 # Detect the hash algorithm in use.
 test_detect_hash () {
-	case "$GIT_TEST_DEFAULT_HASH" in
+	case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
 	"sha256")
 	    test_hash_algo=sha256
 	    test_compat_hash_algo=sha1
@@ -1767,6 +1767,9 @@ test_oid () {
 	--hash=compat)
 		algo="$test_compat_hash_algo" &&
 		shift;;
+	--hash=builtin)
+		algo="$GIT_TEST_BUILTIN_HASH" &&
+		shift;;
 	--hash=*)
 		algo="${1#--hash=}" &&
 		shift;;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 51370a201c..ef3759ec80 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -536,7 +536,8 @@ export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 export EDITOR
 
-GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
+GIT_TEST_BUILTIN_HASH=sha1
+GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}"
 export GIT_DEFAULT_HASH
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
 export GIT_DEFAULT_REF_FORMAT
@@ -1908,6 +1909,10 @@ test_lazy_prereq SHA1 '
 	esac
 '
 
+test_lazy_prereq DEFAULT_HASH_ALGORITHM '
+	test "$GIT_TEST_BUILTIN_HASH" = "$GIT_DEFAULT_HASH"
+'
+
 test_lazy_prereq DEFAULT_REPO_FORMAT '
 	test_have_prereq SHA1,REFFILES
 '

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/10] t1007: choose the built-in hash outside of a repo
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (5 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 06/10] t: default to compile-time default hash if not set brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20  1:19 ` [PATCH 08/10] t4042: " brian m. carlson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the built-in default hash is always SHA-1, but that will
change in a future commit.  Instead of assuming that operating outside
of a repository will always use SHA-1, simply ask test_oid for the
built-in hash instead, which will always be correct.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1007-hash-object.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64658b3ba5..de076293b6 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -252,9 +252,9 @@ test_expect_success '--literally complains about non-standard types' '
 	test_must_fail git hash-object -t bogus --literally --stdin
 '
 
-test_expect_success '--stdin outside of repository (uses SHA-1)' '
+test_expect_success '--stdin outside of repository (uses default hash)' '
 	nongit git hash-object --stdin <hello >actual &&
-	echo "$(test_oid --hash=sha1 hello)" >expect &&
+	echo "$(test_oid --hash=builtin hello)" >expect &&
 	test_cmp expect actual
 '
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/10] t4042: choose the built-in hash outside of a repo
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (6 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 07/10] t1007: choose the built-in hash outside of a repo brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20  1:19 ` [PATCH 09/10] t5300: " brian m. carlson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the built-in default hash is always SHA-1, but that will
change in a future commit.  Instead of assuming that operating outside
of a repository will always use SHA-1, provide constants for both
algorithms and then simply ask test_oid for the built-in hash instead,
which will always be correct.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t4042-diff-textconv-caching.sh | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t4042-diff-textconv-caching.sh b/t/t4042-diff-textconv-caching.sh
index ff0e73531b..31018ceba2 100755
--- a/t/t4042-diff-textconv-caching.sh
+++ b/t/t4042-diff-textconv-caching.sh
@@ -120,6 +120,14 @@ test_expect_success 'log notes cache and still use cache for -p' '
 '
 
 test_expect_success 'caching is silently ignored outside repo' '
+	test_oid_cache <<-\EOM &&
+	oid1 sha1:5626abf
+	oid1 sha256:a4ed1f3
+	oid2 sha1:f719efd
+	oid2 sha256:aa9e7dc
+	EOM
+	oid1=$(test_oid --hash=builtin oid1) &&
+	oid2=$(test_oid --hash=builtin oid2) &&
 	mkdir -p non-repo &&
 	echo one >non-repo/one &&
 	echo two >non-repo/two &&
@@ -129,9 +137,9 @@ test_expect_success 'caching is silently ignored outside repo' '
 		   -c diff.test.textconv="tr a-z A-Z <" \
 		   -c diff.test.cachetextconv=true \
 		   diff --no-index one two >actual &&
-	cat >expect <<-\EOF &&
+	cat >expect <<-EOF &&
 	diff --git a/one b/two
-	index 5626abf..f719efd 100644
+	index $oid1..$oid2 100644
 	--- a/one
 	+++ b/two
 	@@ -1 +1 @@

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/10] t5300: choose the built-in hash outside of a repo
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (7 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 08/10] t4042: " brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20  1:19 ` [PATCH 10/10] Enable SHA-256 by default in breaking changes mode brian m. carlson
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
  10 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the built-in default hash is always SHA-1, but that will
change in a future commit.  Instead of assuming that operating outside
of a repository will always use SHA-1, look up the default hash
algorithm for operating outside of a repository using an appropriate
environment variable, which will always be correct.

Additionally, for operations outside of a repository, use the
DEFAULT_HASH_ALGORITHM prerequisite rather than SHA1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5300-pack-object.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index ae72158b94..73445782e7 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -525,7 +525,7 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	test_path_is_file foo.idx
 '
 
-test_expect_success SHA1 'show-index works OK outside a repository' '
+test_expect_success DEFAULT_HASH_ALGORITHM 'show-index works OK outside a repository' '
 	nongit git show-index <foo.idx
 '
 
@@ -658,7 +658,7 @@ do
 		test_commit -C repo initial &&
 		git -C repo repack -ad &&
 		git -C repo verify-pack "$(pwd)"/repo/.git/objects/pack/*.idx &&
-		if test $hash = sha1
+		if test $hash = $GIT_TEST_BUILTIN_HASH
 		then
 			nongit git verify-pack "$(pwd)"/repo/.git/objects/pack/*.idx
 		else
@@ -676,7 +676,7 @@ do
 		test_commit -C repo initial &&
 		git -C repo repack -ad &&
 		git -C repo index-pack --verify "$(pwd)"/repo/.git/objects/pack/*.pack &&
-		if test $hash = sha1
+		if test $hash = $GIT_TEST_BUILTIN_HASH
 		then
 			nongit git index-pack --verify "$(pwd)"/repo/.git/objects/pack/*.pack
 		else

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (8 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 09/10] t5300: " brian m. carlson
@ 2025-06-20  1:19 ` brian m. carlson
  2025-06-20 14:58   ` Junio C Hamano
                     ` (2 more replies)
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
  10 siblings, 3 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20  1:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Our document on breaking changes indicates that we intend to default to
SHA-256 in Git 3.0.  Since most people choose the default option, this
is an important security upgrade to our defaults.

To allow people to test this case, when WITH_BREAKING_CHANGES is set in
the configuration, build Git with SHA-256 as the default hash.  Update
the testsuite to reflect this configuration so that the tests pass.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h        | 6 ++++++
 t/test-lib.sh | 7 ++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hash.h b/hash.h
index 0e14cade4e..144b53b7d6 100644
--- a/hash.h
+++ b/hash.h
@@ -174,8 +174,14 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 #define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
+
 /* Default hash algorithm if unspecified. */
+#ifdef WITH_BREAKING_CHANGES
+#define GIT_HASH_DEFAULT GIT_HASH_SHA256
+#else
 #define GIT_HASH_DEFAULT GIT_HASH_SHA1
+#endif
+
 /* Original hash algorithm. Implied for older data formats which don't specify. */
 #define GIT_HASH_ORIGINAL GIT_HASH_SHA1
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ef3759ec80..bb18dd0606 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -536,7 +536,12 @@ export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 export EDITOR
 
-GIT_TEST_BUILTIN_HASH=sha1
+if test -n "$WITH_BREAKING_CHANGES"
+then
+	GIT_TEST_BUILTIN_HASH=sha256
+else
+	GIT_TEST_BUILTIN_HASH=sha1
+fi
 GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}"
 export GIT_DEFAULT_HASH
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] hash: add a constant for the original hash algorithm
  2025-06-20  1:19 ` [PATCH 02/10] hash: add a constant for the original " brian m. carlson
@ 2025-06-20  1:56   ` Junio C Hamano
  2025-06-20 20:43     ` brian m. carlson
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20  1:56 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
> code.  Some of these really mean to represent specifically SHA-1, but
> some actually represent the original hash algorithm used in Git which is
> implied by older formats and protocols which do not contain hash
> information.  For instance, the bundle v1 and v2 formats do not contain
> hash algorithm information, and thus SHA-1 is implied by the use of
> these formats.

Does that mean use of _ORIGINAL is a sign that these places should
keep using SHA-1 and should not change?

I am having a hard time guessing/assessing the value of having _ORIGINAL
that is a synonym for _SHA1; with redirection, it pretends as if the
underlying value can be updated from SHA-1 to SHA-256 (and that is
the very intention behind GIT_HASH_DEFAULT symbol that gives us a
level of indirection), but it is hard to imagine we would ever want
to change what _ORIGINAL means, as that word talks about a historical
fact that will never change over time.

> Add a constant for documentary purposes which indicates this value.  It
> will always be the same as SHA-1, since this is an essential part of
> these formats, but its use indicates this particular reason and not any
> other reason why SHA-1 might be used.

I am not sure what this means.  If we use GIT_HASH_SHA1 in such
places explicitly (as opposed to GIT_HASH_DEFAULT), isn't it a sign
enough that with different versions of Git, that particular code
path should keep using SHA-1 no matter what the default is?

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  hash.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hash.h b/hash.h
> index 0d3d85e04c..0e14cade4e 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -176,6 +176,8 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
>  #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
>  /* Default hash algorithm if unspecified. */
>  #define GIT_HASH_DEFAULT GIT_HASH_SHA1
> +/* Original hash algorithm. Implied for older data formats which don't specify. */
> +#define GIT_HASH_ORIGINAL GIT_HASH_SHA1
>  
>  /* "sha1", big-endian */
>  #define GIT_SHA1_FORMAT_ID 0x73686131

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/10] builtin: use default hash when outside a repository
  2025-06-20  1:19 ` [PATCH 03/10] builtin: use default hash when outside a repository brian m. carlson
@ 2025-06-20 14:19   ` Junio C Hamano
  2025-07-01 11:35   ` Patrick Steinhardt
  1 sibling, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 14:19 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We have some commands that can operate inside or outside a repository.
> If we're operating outside a repository, we clearly cannot use the
> repository's hash algorithm as a default since it doesn't exist, so
> instead, let's pick the default instead of specifically SHA-1.  Right
> now this results in no functional change since the default is SHA-1, but
> that may change in the future.

Nicely explained and this step shows why having GIT_HASH_DEFAULT
makes sense very well.

>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/apply.c       | 2 +-
>  builtin/diff.c        | 2 +-
>  builtin/hash-object.c | 2 +-
>  builtin/index-pack.c  | 2 +-
>  builtin/ls-remote.c   | 2 +-
>  builtin/patch-id.c    | 2 +-
>  builtin/shortlog.c    | 2 +-
>  builtin/show-index.c  | 2 +-
>  8 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index a1e20c593d..d642a40251 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -29,7 +29,7 @@ int cmd_apply(int argc,
>  	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
>  	 */
>  	if (!the_hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	argc = apply_parse_options(argc, argv,
>  				   &state, &force_apply, &options,
> diff --git a/builtin/diff.c b/builtin/diff.c
> index c6231edce4..eebffe36cc 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -483,7 +483,7 @@ int cmd_diff(int argc,
>  	 * configurable via a command line option.
>  	 */
>  	if (nongit)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	init_diff_ui_defaults();
>  	git_config(git_diff_ui_config, NULL);
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index 6a99ec250d..213a302e05 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -104,7 +104,7 @@ int cmd_hash_object(int argc,
>  		prefix = setup_git_directory_gently(&nongit);
>  
>  	if (nongit && !the_hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	if (vpath && prefix) {
>  		vpath_free = prefix_filename(prefix, vpath);
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index bb7925bd29..352ce7f88a 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -2034,7 +2034,7 @@ int cmd_index_pack(int argc,
>  	 * choice but to guess the object hash.
>  	 */
>  	if (!the_repository->hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);
>  	if (rev_index) {
> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
> index 01a4d4daa1..df09000b30 100644
> --- a/builtin/ls-remote.c
> +++ b/builtin/ls-remote.c
> @@ -112,7 +112,7 @@ int cmd_ls_remote(int argc,
>  	 * depending on what object hash the remote uses.
>  	 */
>  	if (!the_repository->hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	packet_trace_identity("ls-remote");
>  
> diff --git a/builtin/patch-id.c b/builtin/patch-id.c
> index cdef2ec10a..26f04b0335 100644
> --- a/builtin/patch-id.c
> +++ b/builtin/patch-id.c
> @@ -254,7 +254,7 @@ int cmd_patch_id(int argc,
>  	 * the code that computes patch IDs to always use SHA1.
>  	 */
>  	if (!the_hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	generate_id_list(opts ? opts > 1 : config.stable,
>  			 opts ? opts == 3 : config.verbatim);
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index fe15e11497..60adc5e7a5 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -419,7 +419,7 @@ int cmd_shortlog(int argc,
>  	 * git/nongit so that we do not have to do this.
>  	 */
>  	if (nongit && !the_hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	git_config(git_default_config, NULL);
>  	shortlog_init(&log);
> diff --git a/builtin/show-index.c b/builtin/show-index.c
> index 9d4ecf5e7b..2c3e2940ce 100644
> --- a/builtin/show-index.c
> +++ b/builtin/show-index.c
> @@ -47,7 +47,7 @@ int cmd_show_index(int argc,
>  	 *       the index file passed in and use that instead.
>  	 */
>  	if (!the_hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>  
>  	hashsz = the_hash_algo->rawsz;
>  

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/10] Use original hash for legacy formats
  2025-06-20  1:19 ` [PATCH 04/10] Use original hash for legacy formats brian m. carlson
@ 2025-06-20 14:26   ` Junio C Hamano
  2025-06-20 20:51     ` brian m. carlson
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 14:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We have a large variety of data formats and protocols where no hash
> algorithm was defined and the default was assumed to always be SHA-1.
> Instead of explicitly stating SHA-1, let's use the constant to represent
> the original hash algorithm (which is still SHA-1) so that it's clear
> for documentary purposes that it's a legacy fallback option and not an
> intentional choice to use SHA-1.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/receive-pack.c | 2 +-
>  bundle.c               | 4 ++--
>  connect.c              | 6 +++---
>  fetch-pack.c           | 2 +-
>  pkt-line.c             | 2 +-
>  remote-curl.c          | 2 +-
>  serve.c                | 2 +-
>  setup.c                | 4 ++--
>  transport.c            | 2 +-
>  9 files changed, 13 insertions(+), 13 deletions(-)

I earlier expressed my puzzlement, but this step shows how
GIT_HASH_ORIGINAL may make sense as a transitional measure, letting
us tell between "This place in the code uses GIT_HASH_SHA1 simply
because we haven't examined and inspected it for the purpose of
allowing us to eventually switch the default" and "This place in the
code we determined need to keep using SHA1 even when the default is
different".

If that is what is going on, after the whole-code transition
finishes, we would want to rename _ORIGINAL back to _SHA1 for
readability, as in such a future, developers should not have to
remember that we originally used SHA-1.

If we call use a name with SHA-1 in it (e.g., GIT_HASH_MUST_BE_SHA1)
from the beginning, perhaps we do not have to rename _ORIGINAL later?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/10] setup: use the default algorithm to initialize repo format
  2025-06-20  1:19 ` [PATCH 05/10] setup: use the default algorithm to initialize repo format brian m. carlson
@ 2025-06-20 14:55   ` Junio C Hamano
  2025-06-20 20:28     ` brian m. carlson
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 14:55 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> When we define a new repository format with REPOSITORY_FORMAT_INIT, we
> always use GIT_HASH_SHA1, and this value ends up getting used as the
> default value to initialize a repository if none of the command line,
> environment, or config tell us to do otherwise.
>
> Because we might not always want to use SHA-1 as the default, let's
> instead specify the default hash algorithm constant so that we will use
> whatever the specified default is.  

All of the above hints the use of _DEFAULT instead of _SHA1 but ...

> However, to make sure we continue to
> read repositories without a specified hash algorithm as SHA-1, default
> the repository format to the original hash algorithm (SHA-1) when
> reading the repository format.

... this explains why we may want to

 - expect that we would be able to read the hash from repository
 - read from repository
 - if the repository specifies the hash, happily use it
 - otherwise it is a lagacy repository so use the SHA-1

Is that what is going on here?  Because I find some things that
happens in the code somewhat questionable.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  setup.c | 5 ++++-
>  setup.h | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index 641c857ed5..fb38824a2b 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -835,9 +835,12 @@ static void init_repository_format(struct repository_format *format)
>  int read_repository_format(struct repository_format *format, const char *path)
>  {
>  	clear_repository_format(format);
> +	format->hash_algo = GIT_HASH_ORIGINAL;

If we expect we can read from the config, and otherwise we fall back
to the hardcoded legacy SHA-1, do we need this assignment?  We
cleared and version is set to -1, and then we read from the config ...

>  	git_config_from_file(check_repo_format, path, format);

... so if the file said anything about "extensions.objectformat", we
would know it by now.  If not, wouldn't version be left to -1 as our
previous clal to clear_repository_format() set it via its call to
init_repository_format()?

Ahh, this code is prepared to handle a repository that claims to use
version 1 but does not set extensions.objectformat at all.  And in
such a case, we do want to use SHA-1.  OK, the above code makes
sense for that case.

> -	if (format->version == -1)
> +	if (format->version == -1) {

And if there is no core.repositoryformatversion set, we will come
here.  According to the comment before handle_extension_v0(), some
extensions.* should still be honored even in such a repository, and
the above call to git_config_from_file() should have handled them
just fine.

However, I do not understand why we clear all of what we read with
another call to clear_repository_format() here.

>  		clear_repository_format(format);
> +		format->hash_algo = GIT_HASH_ORIGINAL;
> +	}
>  	return format->version;
>  }

Admittedly, I find that the way how check_repo_format() does its
thing somewhat questionable.  Even though it reads into the .version
member the value of core.repositoryformatversion, it slurps in all
the extensions.* regardless of what .version the repository claims
to be in.  So even though there are two separate functions to handle
"historical compatibility" handle_extension_v0() and other extensions,
we still end up honoring extensions.objectformat in a repository that
does not say what format version it uses.  And clearing them with a
call to clear_repository_format() may make sense, but do we want to
clear things we read with handle_extension_v0() as well?

> diff --git a/setup.h b/setup.h
> index 18dc3b7368..8522fa8575 100644
> --- a/setup.h
> +++ b/setup.h
> @@ -149,7 +149,7 @@ struct repository_format {
>  { \
>  	.version = -1, \
>  	.is_bare = -1, \
> -	.hash_algo = GIT_HASH_SHA1, \
> +	.hash_algo = GIT_HASH_DEFAULT, \
>  	.ref_storage_format = REF_STORAGE_FORMAT_FILES, \
>  	.unknown_extensions = STRING_LIST_INIT_DUP, \
>  	.v1_only_extensions = STRING_LIST_INIT_DUP, \

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20  1:19 ` [PATCH 10/10] Enable SHA-256 by default in breaking changes mode brian m. carlson
@ 2025-06-20 14:58   ` Junio C Hamano
  2025-06-20 19:18     ` brian m. carlson
  2025-06-20 15:03   ` Junio C Hamano
  2025-07-01 11:35   ` Patrick Steinhardt
  2 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 14:58 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Our document on breaking changes indicates that we intend to default to
> SHA-256 in Git 3.0.  Since most people choose the default option, this
> is an important security upgrade to our defaults.
>
> To allow people to test this case, when WITH_BREAKING_CHANGES is set in
> the configuration, build Git with SHA-256 as the default hash.  Update
> the testsuite to reflect this configuration so that the tests pass.

Nice.


> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  hash.h        | 6 ++++++
>  t/test-lib.sh | 7 ++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/hash.h b/hash.h
> index 0e14cade4e..144b53b7d6 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -174,8 +174,14 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
>  #define GIT_HASH_SHA256 2
>  /* Number of algorithms supported (including unknown). */
>  #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
> +
>  /* Default hash algorithm if unspecified. */
> +#ifdef WITH_BREAKING_CHANGES
> +#define GIT_HASH_DEFAULT GIT_HASH_SHA256
> +#else
>  #define GIT_HASH_DEFAULT GIT_HASH_SHA1
> +#endif

I think we decided to format the above this way.

    #ifdef WITH_BREAKING_CHANGES
    # define GIT_HASH_DEFAULT GIT_HASH_SHA256
    #else
    # define GIT_HASH_DEFAULT GIT_HASH_SHA1
    #endif

cf. Documentation/CodingGuidelines

 - Nested C preprocessor directives are indented after the hash by one
   space per nesting level.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20  1:19 ` [PATCH 10/10] Enable SHA-256 by default in breaking changes mode brian m. carlson
  2025-06-20 14:58   ` Junio C Hamano
@ 2025-06-20 15:03   ` Junio C Hamano
  2025-06-20 19:15     ` brian m. carlson
  2025-07-01 11:35   ` Patrick Steinhardt
  2 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 15:03 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Our document on breaking changes indicates that we intend to default to
> SHA-256 in Git 3.0.  Since most people choose the default option, this
> is an important security upgrade to our defaults.
>
> To allow people to test this case, when WITH_BREAKING_CHANGES is set in
> the configuration, build Git with SHA-256 as the default hash.  Update
> the testsuite to reflect this configuration so that the tests pass.

Another thing that I suspect nobody wrote tests for, but we must be
absolutely certain, is that the post-3.0 Git can still interoperate
well with historical SHA-1 repositories (I am not talking about
"fetch from SHA-1 into SHA-256", but "the binary does not lose
ability to work in SHA-1 repositories or fetch/push between SHA-1
repositories, only because the default is set to SHA-256"), even in
old repositories people have been using for ages without the
core.repositoryformatversion defined.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20 15:03   ` Junio C Hamano
@ 2025-06-20 19:15     ` brian m. carlson
  2025-06-20 20:42       ` Junio C Hamano
  0 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20 19:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

On 2025-06-20 at 15:03:23, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > Our document on breaking changes indicates that we intend to default to
> > SHA-256 in Git 3.0.  Since most people choose the default option, this
> > is an important security upgrade to our defaults.
> >
> > To allow people to test this case, when WITH_BREAKING_CHANGES is set in
> > the configuration, build Git with SHA-256 as the default hash.  Update
> > the testsuite to reflect this configuration so that the tests pass.
> 
> Another thing that I suspect nobody wrote tests for, but we must be
> absolutely certain, is that the post-3.0 Git can still interoperate
> well with historical SHA-1 repositories (I am not talking about
> "fetch from SHA-1 into SHA-256", but "the binary does not lose
> ability to work in SHA-1 repositories or fetch/push between SHA-1
> repositories, only because the default is set to SHA-256"), even in
> old repositories people have been using for ages without the
> core.repositoryformatversion defined.

Yes, I have definitely tested that here before sending it out.  When Git
3.0 comes out, we can switch our GIT_TEST_DEFAULT_HASH test from sha256
to sha1 to continue to verify that those work.  As I learned when
writing the SHA-256 functionality and as I'm experiencing today writing
the interop code, if clones, fetches, and pushes do not work properly,
the testsuite is completely broken with at the very least fifty-some-odd
tests failing, so I feel confident that functionality will continue to
work for SHA-1 as long as we do run an appropriate test job.

Also, when we initialize a SHA-1 repository with the files ref backend,
we still use repository format version 0 without any extensions, so the
cases that cover older-style configs will still be adequately tested. We
also have some tests that even test that things work properly without a
config file, which caught a bug in this series (that I fixed before
sending it out).
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20 14:58   ` Junio C Hamano
@ 2025-06-20 19:18     ` brian m. carlson
  0 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20 19:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

On 2025-06-20 at 14:58:07, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  hash.h        | 6 ++++++
> >  t/test-lib.sh | 7 ++++++-
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/hash.h b/hash.h
> > index 0e14cade4e..144b53b7d6 100644
> > --- a/hash.h
> > +++ b/hash.h
> > @@ -174,8 +174,14 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
> >  #define GIT_HASH_SHA256 2
> >  /* Number of algorithms supported (including unknown). */
> >  #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
> > +
> >  /* Default hash algorithm if unspecified. */
> > +#ifdef WITH_BREAKING_CHANGES
> > +#define GIT_HASH_DEFAULT GIT_HASH_SHA256
> > +#else
> >  #define GIT_HASH_DEFAULT GIT_HASH_SHA1
> > +#endif
> 
> I think we decided to format the above this way.
> 
>     #ifdef WITH_BREAKING_CHANGES
>     # define GIT_HASH_DEFAULT GIT_HASH_SHA256
>     #else
>     # define GIT_HASH_DEFAULT GIT_HASH_SHA1
>     #endif

Great, I'll put this in a v2.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/10] setup: use the default algorithm to initialize repo format
  2025-06-20 14:55   ` Junio C Hamano
@ 2025-06-20 20:28     ` brian m. carlson
  2025-06-20 21:05       ` Junio C Hamano
  0 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20 20:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 5790 bytes --]

On 2025-06-20 at 14:55:12, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > When we define a new repository format with REPOSITORY_FORMAT_INIT, we
> > always use GIT_HASH_SHA1, and this value ends up getting used as the
> > default value to initialize a repository if none of the command line,
> > environment, or config tell us to do otherwise.
> >
> > Because we might not always want to use SHA-1 as the default, let's
> > instead specify the default hash algorithm constant so that we will use
> > whatever the specified default is.  
> 
> All of the above hints the use of _DEFAULT instead of _SHA1 but ...
> 
> > However, to make sure we continue to
> > read repositories without a specified hash algorithm as SHA-1, default
> > the repository format to the original hash algorithm (SHA-1) when
> > reading the repository format.
> 
> ... this explains why we may want to
> 
>  - expect that we would be able to read the hash from repository
>  - read from repository
>  - if the repository specifies the hash, happily use it
>  - otherwise it is a lagacy repository so use the SHA-1
> 
> Is that what is going on here?  Because I find some things that
> happens in the code somewhat questionable.

Yes, that's roughly it.  I'll explain this better in v2.

> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  setup.c | 5 ++++-
> >  setup.h | 2 +-
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/setup.c b/setup.c
> > index 641c857ed5..fb38824a2b 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -835,9 +835,12 @@ static void init_repository_format(struct repository_format *format)
> >  int read_repository_format(struct repository_format *format, const char *path)
> >  {
> >  	clear_repository_format(format);
> > +	format->hash_algo = GIT_HASH_ORIGINAL;
> 
> If we expect we can read from the config, and otherwise we fall back
> to the hardcoded legacy SHA-1, do we need this assignment?  We
> cleared and version is set to -1, and then we read from the config ...
> 
> >  	git_config_from_file(check_repo_format, path, format);
> 
> ... so if the file said anything about "extensions.objectformat", we
> would know it by now.  If not, wouldn't version be left to -1 as our
> previous clal to clear_repository_format() set it via its call to
> init_repository_format()?

The subtle behaviour here is that -1 means that either there is no
version specified or that there's no config file.  I was surprised to
learn that we do not require a configuration file, but we have tests for
that case with `git branch`.

> Ahh, this code is prepared to handle a repository that claims to use
> version 1 but does not set extensions.objectformat at all.  And in
> such a case, we do want to use SHA-1.  OK, the above code makes
> sense for that case.

Correct.  We can set v1 because we want reftable, for instance, and we
never set extensions.objectFormat to "sha1".  We always rely on the
default behaviour since that's more compatible.

> > -	if (format->version == -1)
> > +	if (format->version == -1) {
> 
> And if there is no core.repositoryformatversion set, we will come
> here.  According to the comment before handle_extension_v0(), some
> extensions.* should still be honored even in such a repository, and
> the above call to git_config_from_file() should have handled them
> just fine.
> 
> However, I do not understand why we clear all of what we read with
> another call to clear_repository_format() here.

Because this is the case where there's no config file.  If nobody
bothered to write a configuration file, then we want to reset everything
to the default.

I don't know what we do if we have a repository with a config file and
no version, but literally every repository since Git 0.99.3 (I believe)
has core.repositoryformatversion written into the repo.  I'm certain
that the behaviour we'd want if nobody specified one was to do the most
compatible thing, so the defaults seem prudent.

> Admittedly, I find that the way how check_repo_format() does its
> thing somewhat questionable.  Even though it reads into the .version
> member the value of core.repositoryformatversion, it slurps in all
> the extensions.* regardless of what .version the repository claims
> to be in.  So even though there are two separate functions to handle
> "historical compatibility" handle_extension_v0() and other extensions,
> we still end up honoring extensions.objectformat in a repository that
> does not say what format version it uses.  And clearing them with a
> call to clear_repository_format() may make sense, but do we want to
> clear things we read with handle_extension_v0() as well?

No, it looks like that at first, but I don't think that's correct.
`verify_repository_format` complains if we have any v1-only extensions
(such as extensions.objectFormat) and we have version 0.  That's also
where we check whether there are unknown extensions in v1, since we must
refuse to read the repository in that case.  There are tests for this
case: I added some that v0 with extensions.objectFormat is rejected when
I did the SHA-256 work, and I think Peff and Patrick may have done some
additional cleanup here (and maybe others; my apologies if I've
forgotten anyone).

The reason we need to read all the extensions is that different config
options aren't ordered and the config callback processes items we see in
the order they're in in the config file.  We might first have an
extensions block and only then a core block, so when reading the
extensions we don't know yet what the repository version is.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20 19:15     ` brian m. carlson
@ 2025-06-20 20:42       ` Junio C Hamano
  2025-06-20 21:06         ` brian m. carlson
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 20:42 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2025-06-20 at 15:03:23, Junio C Hamano wrote:
>> Another thing that I suspect nobody wrote tests for, but we must be
>> absolutely certain, is that the post-3.0 Git can still interoperate
>> well with historical SHA-1 repositories (I am not talking about
>> "fetch from SHA-1 into SHA-256", but "the binary does not lose
>> ability to work in SHA-1 repositories or fetch/push between SHA-1
>> repositories, only because the default is set to SHA-256"), even in
>> old repositories people have been using for ages without the
>> core.repositoryformatversion defined.
>
> Yes, I have definitely tested that here before sending it out.

Is there a single t/tXXXX-*.sh test that is dedicated to that
interoperability, or is it spread across commands (like,
t????-clone-*.sh has a test that explicitly prepares an SHA-1 and an
SHA-256 repositories and then tries to clone them with the current
binary to make sure the result look reasonable, and t????-push-*.sh
has a test to push between a pair of SHA-1 repositories, and a pair
of SHA-256 repositories, with the current binary)?

> When Git
> 3.0 comes out, we can switch our GIT_TEST_DEFAULT_HASH test from sha256
> to sha1 to continue to verify that those work.  As I learned when
> writing the SHA-256 functionality and as I'm experiencing today writing
> the interop code, if clones, fetches, and pushes do not work properly,
> the testsuite is completely broken with at the very least fifty-some-odd
> tests failing, so I feel confident that functionality will continue to
> work for SHA-1 as long as we do run an appropriate test job.

OK.

> Also, when we initialize a SHA-1 repository with the files ref backend,
> we still use repository format version 0 without any extensions, so the
> cases that cover older-style configs will still be adequately tested. We
> also have some tests that even test that things work properly without a
> config file, which caught a bug in this series (that I fixed before
> sending it out).

Very nice.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] hash: add a constant for the original hash algorithm
  2025-06-20  1:56   ` Junio C Hamano
@ 2025-06-20 20:43     ` brian m. carlson
  2025-07-01 11:35       ` Patrick Steinhardt
  0 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20 20:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 3342 bytes --]

On 2025-06-20 at 01:56:02, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
> > code.  Some of these really mean to represent specifically SHA-1, but
> > some actually represent the original hash algorithm used in Git which is
> > implied by older formats and protocols which do not contain hash
> > information.  For instance, the bundle v1 and v2 formats do not contain
> > hash algorithm information, and thus SHA-1 is implied by the use of
> > these formats.
> 
> Does that mean use of _ORIGINAL is a sign that these places should
> keep using SHA-1 and should not change?

Yes.

> I am having a hard time guessing/assessing the value of having _ORIGINAL
> that is a synonym for _SHA1; with redirection, it pretends as if the
> underlying value can be updated from SHA-1 to SHA-256 (and that is
> the very intention behind GIT_HASH_DEFAULT symbol that gives us a
> level of indirection), but it is hard to imagine we would ever want
> to change what _ORIGINAL means, as that word talks about a historical
> fact that will never change over time.

I agree.  _ORIGINAL indicates that this is a use of SHA-1 which is a
historical fact and is a legacy decision as opposed to one specified
explicitly.

For instance, if we're setting the algorithm for bundle v1 and v2, then
we'd use _ORIGINAL because those formats did not specify a hash value
when they were designed and, for legacy reasons, we cannot change that
fact.  However, if with bundle v3, a user specified @object-format=sha1,
then we'd use _SHA1, since that was an explicit decision documented.
Similarly, _SHA1 represents extensions.objectFormat=sha1, which is an
intentional decision to use the older algorithm.

> > Add a constant for documentary purposes which indicates this value.  It
> > will always be the same as SHA-1, since this is an essential part of
> > these formats, but its use indicates this particular reason and not any
> > other reason why SHA-1 might be used.
> 
> I am not sure what this means.  If we use GIT_HASH_SHA1 in such
> places explicitly (as opposed to GIT_HASH_DEFAULT), isn't it a sign
> enough that with different versions of Git, that particular code
> path should keep using SHA-1 no matter what the default is?

If we have a test helper that computes hashes and someone specified
"sha1" on the command line, that's GIT_HASH_SHA1.  Someone said, "I'd
like to use SHA-1."  Similarly, in the reftable code, we can read the
byte value indicating that the reftable is in SHA-1 and that's an
explicit decision.

If we default to SHA-1 because nobody specified extensions.objectformat,
then that's GIT_HASH_ORIGINAL.  Nobody made a decision or opted into an
algorithm; we just didn't think hard enough about cryptographic agility
in the original Git and we assumed SHA-1.

They're both the same numeric constant here and always will be (even if,
in a future version of Git, we get rid of SHA-1 altogether and we
otherwise die on that code).  But there's a difference in intention: one
explicitly stated SHA-1 as opposed to a different algorithm and one just
got a default because that's the compatible legacy behaviour.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/10] Use original hash for legacy formats
  2025-06-20 14:26   ` Junio C Hamano
@ 2025-06-20 20:51     ` brian m. carlson
  2025-06-20 21:14       ` Junio C Hamano
  0 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-06-20 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

On 2025-06-20 at 14:26:37, Junio C Hamano wrote:
> If we call use a name with SHA-1 in it (e.g., GIT_HASH_MUST_BE_SHA1)
> from the beginning, perhaps we do not have to rename _ORIGINAL later?

We could call it GIT_HASH_LEGACY_SHA1 if you prefer that.  I originally
considered something like GIT_HASH_GOOD_OLD_REV (GOOD_OLD_REV comes from
ext2's much more rigid and less extendable v0 rather than its newer v1
format), but I felt like that would be too esoteric and not document
things well enough.

I'm also open to other ideas for naming if someone has them.  After all,
naming things is one of the hard problems in computer science.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/10] setup: use the default algorithm to initialize repo format
  2025-06-20 20:28     ` brian m. carlson
@ 2025-06-20 21:05       ` Junio C Hamano
  0 siblings, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 21:05 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> > -	if (format->version == -1)
>> > +	if (format->version == -1) {
>> 
>> And if there is no core.repositoryformatversion set, we will come
>> here.  According to the comment before handle_extension_v0(), some
>> extensions.* should still be honored even in such a repository, and
>> the above call to git_config_from_file() should have handled them
>> just fine.
>> 
>> However, I do not understand why we clear all of what we read with
>> another call to clear_repository_format() here.
>
> Because this is the case where there's no config file.  

But my worries come from that .version == -1 does not necessarily
mean a missing config.  Missing config will give .version == -1 but
the opposite may not be true, no?

> If nobody
> bothered to write a configuration file, then we want to reset everything
> to the default.

True.  If config file is missing, yes, .version will be -1 and
clearing may make sense.  But if the file is missing, we wouldn't
have anything to "reset to the default" because we wouldn't have
read anything, so what clear_repository_format() call initialized to
the default before we read config from file would still be there,
no?

> I don't know what we do if we have a repository with a config file and
> no version, but literally every repository since Git 0.99.3 (I believe)
> has core.repositoryformatversion written into the repo.  I'm certain
> that the behaviour we'd want if nobody specified one was to do the most
> compatible thing, so the defaults seem prudent.

If our assumption is that no config file in repositories we care
about should lack core.repositoryformatversion, then what you wrote
above makes perfect sense, but then we should probably update the
comment before the handle_extension_v0() because it is stale.  The
new semantics is that any extension.* found in a config file that
lacks core.repositoryformatversion will be ignored with the new
code, right?  If that is our intention, it should be documented.

The current code will not clear, so there is a change in behaviour.
I do not know if we particularly care about this behaviour change,
though.

> The reason we need to read all the extensions is that different config
> options aren't ordered ...

Yes, that is where my "somewhat questionable" comes from.  We'd have
to read everything in and then refrain from touching a repository
with extensions that should not exist (i.e., the ones we do not
understand, or the ones that should not be used with the stated
format version) in verify_repository_format() as you said.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20 20:42       ` Junio C Hamano
@ 2025-06-20 21:06         ` brian m. carlson
  0 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-06-20 21:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 3137 bytes --]

On 2025-06-20 at 20:42:52, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2025-06-20 at 15:03:23, Junio C Hamano wrote:
> >> Another thing that I suspect nobody wrote tests for, but we must be
> >> absolutely certain, is that the post-3.0 Git can still interoperate
> >> well with historical SHA-1 repositories (I am not talking about
> >> "fetch from SHA-1 into SHA-256", but "the binary does not lose
> >> ability to work in SHA-1 repositories or fetch/push between SHA-1
> >> repositories, only because the default is set to SHA-256"), even in
> >> old repositories people have been using for ages without the
> >> core.repositoryformatversion defined.
> >
> > Yes, I have definitely tested that here before sending it out.
> 
> Is there a single t/tXXXX-*.sh test that is dedicated to that
> interoperability, or is it spread across commands (like,
> t????-clone-*.sh has a test that explicitly prepares an SHA-1 and an
> SHA-256 repositories and then tries to clone them with the current
> binary to make sure the result look reasonable, and t????-push-*.sh
> has a test to push between a pair of SHA-1 repositories, and a pair
> of SHA-256 repositories, with the current binary)?

For the dual-hash case, I will add interoperability tests in the future
when we get the interoperability code working and I'll include
same-hash, different-hash, and dual-hash cases.  I'll also make sure
that interop produces the same results in terms of object IDs that a
native single-hash implementation provides.  Right now on that code, I'm
using GIT_DEFAULT_HASH=sha256:sha1 (which I added) to run the testsuite,
which sets up SHA-256 as the main hash and SHA-1 as the compat hash.
I'll add a CI job for that in the future.  (If you're interested, this
code is living in my sha256-interop branch on GitHub and my local
server.)

As far as the tests we have right now that apply to this series, it's
spread across a lot of tests.  There are lots of places in the code that
we clone a repository to make some changes that we don't want to make in
the current repository, for instance, and if clones don't work then
those tests are all broken.  The submodule tests actually add a wide
variety of nested repositories and push and pull them all over the
place, so those also exercise all the cases very well.  We do have clone
and fetch tests, as well as push tests, for local, HTTP, and SSH, so we
do get very comprehensive tests.  Most of those don't specifically
choose different hash algorithms, though (although a few do); they only
work with whatever the testsuite is doing.

Also, while I think some basic interoperability tests are helpful,
there's no substitute for running the entire testsuite in SHA-1 mode
because there are subtle variations in the protocol (e.g., HTTP is
stateless) and there are a lot of non-protocol cases we need to
adequately cover as well (like initializing repositories).  We're not
going to catch all the weird edge cases with a few interoperability
tests.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/10] Use original hash for legacy formats
  2025-06-20 20:51     ` brian m. carlson
@ 2025-06-20 21:14       ` Junio C Hamano
  2025-07-01 11:35         ` Patrick Steinhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2025-06-20 21:14 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2025-06-20 at 14:26:37, Junio C Hamano wrote:
>> If we call use a name with SHA-1 in it (e.g., GIT_HASH_MUST_BE_SHA1)
>> from the beginning, perhaps we do not have to rename _ORIGINAL later?
>
> We could call it GIT_HASH_LEGACY_SHA1 if you prefer that.  I originally
> considered something like GIT_HASH_GOOD_OLD_REV (GOOD_OLD_REV comes from
> ext2's much more rigid and less extendable v0 rather than its newer v1
> format), but I felt like that would be too esoteric and not document
> things well enough.
>
> I'm also open to other ideas for naming if someone has them.  After all,
> naming things is one of the hard problems in computer science.

Yup, legacy-sha1 is good enough.  I just did not want a name that
does not have sha1 in it.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] hash: add a constant for the original hash algorithm
  2025-06-20 20:43     ` brian m. carlson
@ 2025-07-01 11:35       ` Patrick Steinhardt
  0 siblings, 0 replies; 47+ messages in thread
From: Patrick Steinhardt @ 2025-07-01 11:35 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, git

On Fri, Jun 20, 2025 at 08:43:07PM +0000, brian m. carlson wrote:
> On 2025-06-20 at 01:56:02, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > 
> > > We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
> > > code.  Some of these really mean to represent specifically SHA-1, but
> > > some actually represent the original hash algorithm used in Git which is
> > > implied by older formats and protocols which do not contain hash
> > > information.  For instance, the bundle v1 and v2 formats do not contain
> > > hash algorithm information, and thus SHA-1 is implied by the use of
> > > these formats.
> > 
> > Does that mean use of _ORIGINAL is a sign that these places should
> > keep using SHA-1 and should not change?
> 
> Yes.

I think this makes sense. There have been a bunch of locations in our
code base where I was left wondering whether the use of SHA1 is
intentional or not. Making these explicit should make it a lot more
obvious into which of these categories a callsite falls into.

[snip]
> > > Add a constant for documentary purposes which indicates this value.  It
> > > will always be the same as SHA-1, since this is an essential part of
> > > these formats, but its use indicates this particular reason and not any
> > > other reason why SHA-1 might be used.
> > 
> > I am not sure what this means.  If we use GIT_HASH_SHA1 in such
> > places explicitly (as opposed to GIT_HASH_DEFAULT), isn't it a sign
> > enough that with different versions of Git, that particular code
> > path should keep using SHA-1 no matter what the default is?
> 
> If we have a test helper that computes hashes and someone specified
> "sha1" on the command line, that's GIT_HASH_SHA1.  Someone said, "I'd
> like to use SHA-1."  Similarly, in the reftable code, we can read the
> byte value indicating that the reftable is in SHA-1 and that's an
> explicit decision.

Tiny nit: even for the reftable format it is not always clear whether it
is GIT_HASH_SHA1 or GIT_HASH_ORIGINAL. There are two versions of the
format:

  - The first version implicitly uses SHA1, so this would be
    GIT_HASH_ORIGINAL.

  - The second version specifies the hash format, so it would be either
    GIT_HASH_SHA1 or GIT_HASH_SHA256.

But again, I think that this distinction is actually useful.

> If we default to SHA-1 because nobody specified extensions.objectformat,
> then that's GIT_HASH_ORIGINAL.  Nobody made a decision or opted into an
> algorithm; we just didn't think hard enough about cryptographic agility
> in the original Git and we assumed SHA-1.
> 
> They're both the same numeric constant here and always will be (even if,
> in a future version of Git, we get rid of SHA-1 altogether and we
> otherwise die on that code).  But there's a difference in intention: one
> explicitly stated SHA-1 as opposed to a different algorithm and one just
> got a default because that's the compatible legacy behaviour.

Yup.

Patrick

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/10] builtin: use default hash when outside a repository
  2025-06-20  1:19 ` [PATCH 03/10] builtin: use default hash when outside a repository brian m. carlson
  2025-06-20 14:19   ` Junio C Hamano
@ 2025-07-01 11:35   ` Patrick Steinhardt
  2025-07-01 21:14     ` brian m. carlson
  1 sibling, 1 reply; 47+ messages in thread
From: Patrick Steinhardt @ 2025-07-01 11:35 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano

On Fri, Jun 20, 2025 at 01:19:35AM +0000, brian m. carlson wrote:
> We have some commands that can operate inside or outside a repository.
> If we're operating outside a repository, we clearly cannot use the
> repository's hash algorithm as a default since it doesn't exist, so
> instead, let's pick the default instead of specifically SHA-1.  Right
> now this results in no functional change since the default is SHA-1, but
> that may change in the future.

With the preceding commit in mind that introduced GIT_HASH_ORIGINAL you
could also argue that those callsites should be converted to use that
define instead. We always used to treat them as SHA1 repositories, and
we have no better way of telling otherwise, so we use the historical
value of SHA1 so that scripts aren't dependent on how exactly Git was
built.

Patrick

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/10] Use original hash for legacy formats
  2025-06-20 21:14       ` Junio C Hamano
@ 2025-07-01 11:35         ` Patrick Steinhardt
  0 siblings, 0 replies; 47+ messages in thread
From: Patrick Steinhardt @ 2025-07-01 11:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git

On Fri, Jun 20, 2025 at 02:14:05PM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2025-06-20 at 14:26:37, Junio C Hamano wrote:
> >> If we call use a name with SHA-1 in it (e.g., GIT_HASH_MUST_BE_SHA1)
> >> from the beginning, perhaps we do not have to rename _ORIGINAL later?
> >
> > We could call it GIT_HASH_LEGACY_SHA1 if you prefer that.  I originally
> > considered something like GIT_HASH_GOOD_OLD_REV (GOOD_OLD_REV comes from
> > ext2's much more rigid and less extendable v0 rather than its newer v1
> > format), but I felt like that would be too esoteric and not document
> > things well enough.
> >
> > I'm also open to other ideas for naming if someone has them.  After all,
> > naming things is one of the hard problems in computer science.
> 
> Yup, legacy-sha1 is good enough.  I just did not want a name that
> does not have sha1 in it.

Agreed. I almost started bikeshedding in the patch where you introduced
the define, but refrained from doing so. I myself would have proposed
GIT_HASH_SHA1_HISTORICAL, but calling it "legacy" is even better.

Patrick

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] Enable SHA-256 by default in breaking changes mode
  2025-06-20  1:19 ` [PATCH 10/10] Enable SHA-256 by default in breaking changes mode brian m. carlson
  2025-06-20 14:58   ` Junio C Hamano
  2025-06-20 15:03   ` Junio C Hamano
@ 2025-07-01 11:35   ` Patrick Steinhardt
  2 siblings, 0 replies; 47+ messages in thread
From: Patrick Steinhardt @ 2025-07-01 11:35 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano

On Fri, Jun 20, 2025 at 01:19:42AM +0000, brian m. carlson wrote:
> Our document on breaking changes indicates that we intend to default to
> SHA-256 in Git 3.0.  Since most people choose the default option, this
> is an important security upgrade to our defaults.
> 
> To allow people to test this case, when WITH_BREAKING_CHANGES is set in
> the configuration, build Git with SHA-256 as the default hash.  Update
> the testsuite to reflect this configuration so that the tests pass.

Awesome. Thanks for advancing our migration towards SHA256!

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index ef3759ec80..bb18dd0606 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -536,7 +536,12 @@ export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
>  export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
>  export EDITOR
>  
> -GIT_TEST_BUILTIN_HASH=sha1
> +if test -n "$WITH_BREAKING_CHANGES"
> +then
> +	GIT_TEST_BUILTIN_HASH=sha256
> +else
> +	GIT_TEST_BUILTIN_HASH=sha1
> +fi

There should probably be an option somewhere in Git to ask it what its
current builtin hash is. If so, you wouldn't have to hardcode the hash
over here but could ask for example `git version --builtin-hash`.

Patrick

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/10] builtin: use default hash when outside a repository
  2025-07-01 11:35   ` Patrick Steinhardt
@ 2025-07-01 21:14     ` brian m. carlson
  2025-07-02 15:08       ` Patrick Steinhardt
  0 siblings, 1 reply; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1330 bytes --]

On 2025-07-01 at 11:35:33, Patrick Steinhardt wrote:
> On Fri, Jun 20, 2025 at 01:19:35AM +0000, brian m. carlson wrote:
> > We have some commands that can operate inside or outside a repository.
> > If we're operating outside a repository, we clearly cannot use the
> > repository's hash algorithm as a default since it doesn't exist, so
> > instead, let's pick the default instead of specifically SHA-1.  Right
> > now this results in no functional change since the default is SHA-1, but
> > that may change in the future.
> 
> With the preceding commit in mind that introduced GIT_HASH_ORIGINAL you
> could also argue that those callsites should be converted to use that
> define instead. We always used to treat them as SHA1 repositories, and
> we have no better way of telling otherwise, so we use the historical
> value of SHA1 so that scripts aren't dependent on how exactly Git was
> built.

I don't think I want to do that.  A lot of the functionality people use
outside of repositories, such as index-pack and ls-remote, actually
operates on repository objects and so it makes sense to use the default.

For instance, it will be a major inconvenience to still have to specify
a custom object format three to five years after the switchover.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 00/11] Add SHA-256 by default as a breaking change
  2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
                   ` (9 preceding siblings ...)
  2025-06-20  1:19 ` [PATCH 10/10] Enable SHA-256 by default in breaking changes mode brian m. carlson
@ 2025-07-01 21:22 ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 01/11] hash: add a constant for the default hash algorithm brian m. carlson
                     ` (12 more replies)
  10 siblings, 13 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Our breaking changes document outlines that Git 3.0 will make SHA-256
the default hash algorithm, which is a sensible and prudent approach,
especially from a security perspective.  However, we haven't tested this
adequately and it would be helpful to allow users to test this behaviour
so their code and environments are ready for it.

Fortunately, c5bc9a7f94 (Makefile: wire up build option for deprecated
features, 2025-01-22) introduces a build option that we can use for
testing breaking changes: WITH_BREAKING_CHANGES.  This series introduces
functionality for SHA-256 by default in this mode so we can test it out.

Changes since v1:
* Add a build option for the default hash and use it in the tests.
* Rename GIT_HASH_ORIGINAL to GIT_HASH_SHA1_LEGACY.
* Improve some of the commit messages to better explain questions that
  have come up for review.
* Improve formatting of nested C preprocessor directives.

brian m. carlson (11):
  hash: add a constant for the default hash algorithm
  hash: add a constant for the legacy hash algorithm
  builtin: use default hash when outside a repository
  Use legacy hash for legacy formats
  setup: use the default algorithm to initialize repo format
  t: default to compile-time default hash if not set
  t1007: choose the built-in hash outside of a repo
  t4042: choose the built-in hash outside of a repo
  t5300: choose the built-in hash outside of a repo
  help: add a build option for default hash
  Enable SHA-256 by default in breaking changes mode

 builtin/apply.c                  |  2 +-
 builtin/diff.c                   |  2 +-
 builtin/hash-object.c            |  2 +-
 builtin/index-pack.c             |  2 +-
 builtin/ls-remote.c              |  2 +-
 builtin/patch-id.c               |  2 +-
 builtin/receive-pack.c           |  2 +-
 builtin/shortlog.c               |  2 +-
 builtin/show-index.c             |  2 +-
 bundle.c                         |  4 ++--
 connect.c                        |  6 +++---
 fetch-pack.c                     |  2 +-
 hash.h                           | 10 ++++++++++
 help.c                           |  1 +
 pkt-line.c                       |  2 +-
 remote-curl.c                    |  2 +-
 serve.c                          |  2 +-
 setup.c                          |  9 ++++++---
 setup.h                          |  2 +-
 t/t1007-hash-object.sh           |  4 ++--
 t/t4042-diff-textconv-caching.sh | 12 ++++++++++--
 t/t5300-pack-object.sh           |  6 +++---
 t/test-lib-functions.sh          |  5 ++++-
 t/test-lib.sh                    |  7 ++++++-
 transport.c                      |  2 +-
 25 files changed, 62 insertions(+), 32 deletions(-)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 01/11] hash: add a constant for the default hash algorithm
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 02/11] hash: add a constant for the legacy " brian m. carlson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, SHA-1 is the default hash algorithm in Git.  However, this
may change in the future.

We have many places in our code that use the SHA-1 constant to indicate
the default hash if none is specified, but it will end up being more
practical to specify this explicitly and clearly using a constant for
whatever the default hash algorithm is.  Then, if we decide to change it
in the future, we can simply replace the constant representing the
default with a new value.

For these reasons, introduce GIT_HASH_DEFAULT to represent the default
hash algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hash.h b/hash.h
index d6422ddf45..0d3d85e04c 100644
--- a/hash.h
+++ b/hash.h
@@ -174,6 +174,8 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 #define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
+/* Default hash algorithm if unspecified. */
+#define GIT_HASH_DEFAULT GIT_HASH_SHA1
 
 /* "sha1", big-endian */
 #define GIT_SHA1_FORMAT_ID 0x73686131

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 02/11] hash: add a constant for the legacy hash algorithm
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 01/11] hash: add a constant for the default hash algorithm brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 03/11] builtin: use default hash when outside a repository brian m. carlson
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
code.  Some of these really mean to represent specifically SHA-1, but
some actually represent the original hash algorithm used in Git which is
implied by older, legacy formats and protocols which do not contain hash
information.  For instance, the bundle v1 and v2 formats do not contain
hash algorithm information, and thus SHA-1 is implied by the use of
these formats.

Add a constant for documentary purposes which indicates this value.  It
will always be the same as SHA-1, since this is an essential part of
these formats, but its use indicates this particular reason and not any
other reason why SHA-1 might be used.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hash.h b/hash.h
index 0d3d85e04c..953e840d15 100644
--- a/hash.h
+++ b/hash.h
@@ -176,6 +176,8 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 /* Default hash algorithm if unspecified. */
 #define GIT_HASH_DEFAULT GIT_HASH_SHA1
+/* Legacy hash algorithm. Implied for older data formats which don't specify. */
+#define GIT_HASH_SHA1_LEGACY GIT_HASH_SHA1
 
 /* "sha1", big-endian */
 #define GIT_SHA1_FORMAT_ID 0x73686131

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 03/11] builtin: use default hash when outside a repository
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 01/11] hash: add a constant for the default hash algorithm brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 02/11] hash: add a constant for the legacy " brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 04/11] Use legacy hash for legacy formats brian m. carlson
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We have some commands that can operate inside or outside a repository.
If we're operating outside a repository, we clearly cannot use the
repository's hash algorithm as a default since it doesn't exist, so
instead, let's pick the default instead of specifically SHA-1.  Right
now this results in no functional change since the default is SHA-1, but
that may change in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/apply.c       | 2 +-
 builtin/diff.c        | 2 +-
 builtin/hash-object.c | 2 +-
 builtin/index-pack.c  | 2 +-
 builtin/ls-remote.c   | 2 +-
 builtin/patch-id.c    | 2 +-
 builtin/shortlog.c    | 2 +-
 builtin/show-index.c  | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index a1e20c593d..d642a40251 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -29,7 +29,7 @@ int cmd_apply(int argc,
 	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
 	 */
 	if (!the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	argc = apply_parse_options(argc, argv,
 				   &state, &force_apply, &options,
diff --git a/builtin/diff.c b/builtin/diff.c
index c6231edce4..eebffe36cc 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -483,7 +483,7 @@ int cmd_diff(int argc,
 	 * configurable via a command line option.
 	 */
 	if (nongit)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 6a99ec250d..213a302e05 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -104,7 +104,7 @@ int cmd_hash_object(int argc,
 		prefix = setup_git_directory_gently(&nongit);
 
 	if (nongit && !the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bb7925bd29..352ce7f88a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -2034,7 +2034,7 @@ int cmd_index_pack(int argc,
 	 * choice but to guess the object hash.
 	 */
 	if (!the_repository->hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);
 	if (rev_index) {
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 01a4d4daa1..df09000b30 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -112,7 +112,7 @@ int cmd_ls_remote(int argc,
 	 * depending on what object hash the remote uses.
 	 */
 	if (!the_repository->hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	packet_trace_identity("ls-remote");
 
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index cdef2ec10a..26f04b0335 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -254,7 +254,7 @@ int cmd_patch_id(int argc,
 	 * the code that computes patch IDs to always use SHA1.
 	 */
 	if (!the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index fe15e11497..60adc5e7a5 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -419,7 +419,7 @@ int cmd_shortlog(int argc,
 	 * git/nongit so that we do not have to do this.
 	 */
 	if (nongit && !the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	git_config(git_default_config, NULL);
 	shortlog_init(&log);
diff --git a/builtin/show-index.c b/builtin/show-index.c
index 9d4ecf5e7b..2c3e2940ce 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -47,7 +47,7 @@ int cmd_show_index(int argc,
 	 *       the index file passed in and use that instead.
 	 */
 	if (!the_hash_algo)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+		repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
 
 	hashsz = the_hash_algo->rawsz;
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 04/11] Use legacy hash for legacy formats
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (2 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 03/11] builtin: use default hash when outside a repository brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 05/11] setup: use the default algorithm to initialize repo format brian m. carlson
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the legacy hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 2 +-
 bundle.c               | 4 ++--
 connect.c              | 6 +++---
 fetch-pack.c           | 2 +-
 pkt-line.c             | 2 +-
 remote-curl.c          | 2 +-
 serve.c                | 2 +-
 setup.c                | 4 ++--
 transport.c            | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index a317d6c278..24b33a3a5c 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2136,7 +2136,7 @@ static struct command *read_head_info(struct packet_reader *reader,
 				use_push_options = 1;
 			hash = parse_feature_value(feature_list, "object-format", &len, NULL);
 			if (!hash) {
-				hash = hash_algos[GIT_HASH_SHA1].name;
+				hash = hash_algos[GIT_HASH_SHA1_LEGACY].name;
 				len = strlen(hash);
 			}
 			if (xstrncmpz(the_hash_algo->name, hash, len))
diff --git a/bundle.c b/bundle.c
index b0a3fee2ef..61e81bb0c3 100644
--- a/bundle.c
+++ b/bundle.c
@@ -95,7 +95,7 @@ int read_bundle_header_fd(int fd, struct bundle_header *header,
 	 * by an "object-format=" capability, which is being handled in
 	 * `parse_capability()`.
 	 */
-	header->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	header->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
 
 	/* The bundle header ends with an empty line */
 	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
@@ -507,7 +507,7 @@ int create_bundle(struct repository *r, const char *path,
 	 *    SHA1.
 	 * 2. @filter is required because we parsed an object filter.
 	 */
-	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter.choice)
+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1_LEGACY] || revs.filter.choice)
 		min_version = 3;
 
 	if (argc > 1) {
diff --git a/connect.c b/connect.c
index 3280435331..e77287f426 100644
--- a/connect.c
+++ b/connect.c
@@ -251,7 +251,7 @@ static void process_capabilities(struct packet_reader *reader, size_t *linelen)
 			reader->hash_algo = &hash_algos[hash_algo];
 		free(hash_name);
 	} else {
-		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
 	}
 }
 
@@ -500,7 +500,7 @@ static void send_capabilities(int fd_out, struct packet_reader *reader)
 		reader->hash_algo = &hash_algos[hash_algo];
 		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
 	} else {
-		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
 	}
 	if (server_feature_v2("promisor-remote", &promisor_remote_info)) {
 		char *reply = promisor_remote_reply(promisor_remote_info);
@@ -665,7 +665,7 @@ int server_supports_hash(const char *desired, int *feature_supported)
 	if (feature_supported)
 		*feature_supported = !!hash;
 	if (!hash) {
-		hash = hash_algos[GIT_HASH_SHA1].name;
+		hash = hash_algos[GIT_HASH_SHA1_LEGACY].name;
 		len = strlen(hash);
 	}
 	while (hash) {
diff --git a/fetch-pack.c b/fetch-pack.c
index fa4231fee7..95f66ffc1d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1342,7 +1342,7 @@ static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
 			die(_("mismatched algorithms: client %s; server %s"),
 			    the_hash_algo->name, hash_name);
 		packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
-	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
+	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
 		die(_("the server does not support algorithm '%s'"),
 		    the_hash_algo->name);
 	}
diff --git a/pkt-line.c b/pkt-line.c
index a5bcbc96fb..fc583feb26 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -617,7 +617,7 @@ void packet_reader_init(struct packet_reader *reader, int fd,
 	reader->buffer_size = sizeof(packet_buffer);
 	reader->options = options;
 	reader->me = "git";
-	reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	reader->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
 	strbuf_init(&reader->scratch, 0);
 }
 
diff --git a/remote-curl.c b/remote-curl.c
index b8bc3a80cf..84f4694780 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -285,7 +285,7 @@ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
 	 * back to SHA1, which may or may not be correct.
 	 */
 	if (!p)
-		return &hash_algos[GIT_HASH_SHA1];
+		return &hash_algos[GIT_HASH_SHA1_LEGACY];
 
 	algo = hash_algo_by_length((p - heads->buf) / 2);
 	if (algo == GIT_HASH_UNKNOWN)
diff --git a/serve.c b/serve.c
index e3ccf1505c..53ecab3b42 100644
--- a/serve.c
+++ b/serve.c
@@ -14,7 +14,7 @@
 
 static int advertise_sid = -1;
 static int advertise_object_info = -1;
-static int client_hash_algo = GIT_HASH_SHA1;
+static int client_hash_algo = GIT_HASH_SHA1_LEGACY;
 
 static int always_advertise(struct repository *r UNUSED,
 			    struct strbuf *value UNUSED)
diff --git a/setup.c b/setup.c
index f93bd6a24a..3d2b3e745b 100644
--- a/setup.c
+++ b/setup.c
@@ -2222,11 +2222,11 @@ void initialize_repository_version(int hash_algo,
 	 * version will get adjusted by git-clone(1) once it has learned about
 	 * the remote repository's format.
 	 */
-	if (hash_algo != GIT_HASH_SHA1 ||
+	if (hash_algo != GIT_HASH_SHA1_LEGACY ||
 	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
 		target_version = GIT_REPO_VERSION_READ;
 
-	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
+	if (hash_algo != GIT_HASH_SHA1_LEGACY && hash_algo != GIT_HASH_UNKNOWN)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
 	else if (reinit)
diff --git a/transport.c b/transport.c
index 6c2801bcbd..c123ac1e38 100644
--- a/transport.c
+++ b/transport.c
@@ -1243,7 +1243,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
 			ret->smart_options->receivepack = remote->receivepack;
 	}
 
-	ret->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	ret->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 05/11] setup: use the default algorithm to initialize repo format
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (3 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 04/11] Use legacy hash for legacy formats brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 06/11] t: default to compile-time default hash if not set brian m. carlson
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

When we define a new repository format with REPOSITORY_FORMAT_INIT, we
always use GIT_HASH_SHA1, and this value ends up getting used as the
default value to initialize a repository if none of the command line,
environment, or config tell us to do otherwise.

Because we might not always want to use SHA-1 as the default, let's
instead specify the default hash algorithm constant so that we will use
whatever the specified default is.

However, we also need to continue to read older repositories.  If we're
in a v0 repository or extensions.objectformat is not set, then we must
continue to default to the original hash algorithm: SHA-1.  If an
algorithm is set explicitly, however, it will override the hash_algo
member of the repository_format struct and we'll get the right value.

Similarly, if the repository was initialized before Git 0.99.3, then it
may lack a core.repositoryformatversion key, and some repositories lack
a config file altogether.  In both cases, format->version is -1 and we
need to assume that SHA-1 is in use.

Because clear_repository_format reinitializes the struct
repository_format and therefore sets the hash_algo member to the default
(which could in the future not be SHA-1), we need to reset this member
explicitly.  We know, however, that at the point we call
read_repository_format, we are actually reading an existing repository
and not initializing a new one or operating outside of a repository, so
we are not changing the default behavior back to SHA-1 if the default
algorithm is different.

It is potentially questionable that we ignore all repository
configuration if there is a config file but it doesn't have
core.repositoryformatversion set, in which case we reset all of the
configuration to the default.  However, it is unclear what the right
thing to do instead with such an old repository is and a simple git init
will add the missing entry, so for now, we simply honor what the
existing code does and reset the value to the default, simply adding our
initialization to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 setup.c | 5 ++++-
 setup.h | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/setup.c b/setup.c
index 3d2b3e745b..03a61bd06a 100644
--- a/setup.c
+++ b/setup.c
@@ -835,9 +835,12 @@ static void init_repository_format(struct repository_format *format)
 int read_repository_format(struct repository_format *format, const char *path)
 {
 	clear_repository_format(format);
+	format->hash_algo = GIT_HASH_SHA1_LEGACY;
 	git_config_from_file(check_repo_format, path, format);
-	if (format->version == -1)
+	if (format->version == -1) {
 		clear_repository_format(format);
+		format->hash_algo = GIT_HASH_SHA1_LEGACY;
+	}
 	return format->version;
 }
 
diff --git a/setup.h b/setup.h
index 18dc3b7368..8522fa8575 100644
--- a/setup.h
+++ b/setup.h
@@ -149,7 +149,7 @@ struct repository_format {
 { \
 	.version = -1, \
 	.is_bare = -1, \
-	.hash_algo = GIT_HASH_SHA1, \
+	.hash_algo = GIT_HASH_DEFAULT, \
 	.ref_storage_format = REF_STORAGE_FORMAT_FILES, \
 	.unknown_extensions = STRING_LIST_INIT_DUP, \
 	.v1_only_extensions = STRING_LIST_INIT_DUP, \

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 06/11] t: default to compile-time default hash if not set
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (4 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 05/11] setup: use the default algorithm to initialize repo format brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 07/11] t1007: choose the built-in hash outside of a repo brian m. carlson
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the default compile-time hash is SHA-1.  However, in the
future, this might change and it would be helpful to gracefully handle
this case in our testsuite.

To avoid making these assumptions, let's introduce a variable that
contains the built-in default hash and use it in our setup code as the
fallback value if no hash was explicitly set.  For now, this is always
SHA-1, but in a future commit, we'll allow adjusting this and the
variable will be more useful.

To allow us to make our tests more robust, allow test_oid to take the
--hash=builtin option to specify this hash, whatever it is.

Additionally, add a DEFAULT_HASH_ALGORITHM prerequisite to check for the
compile-time hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/test-lib-functions.sh | 5 ++++-
 t/test-lib.sh           | 7 ++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index bee4a2ca34..6ec95ea51f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1695,7 +1695,7 @@ test_set_hash () {
 
 # Detect the hash algorithm in use.
 test_detect_hash () {
-	case "$GIT_TEST_DEFAULT_HASH" in
+	case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
 	"sha256")
 	    test_hash_algo=sha256
 	    test_compat_hash_algo=sha1
@@ -1767,6 +1767,9 @@ test_oid () {
 	--hash=compat)
 		algo="$test_compat_hash_algo" &&
 		shift;;
+	--hash=builtin)
+		algo="$GIT_TEST_BUILTIN_HASH" &&
+		shift;;
 	--hash=*)
 		algo="${1#--hash=}" &&
 		shift;;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 51370a201c..ef3759ec80 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -536,7 +536,8 @@ export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 export EDITOR
 
-GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
+GIT_TEST_BUILTIN_HASH=sha1
+GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}"
 export GIT_DEFAULT_HASH
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
 export GIT_DEFAULT_REF_FORMAT
@@ -1908,6 +1909,10 @@ test_lazy_prereq SHA1 '
 	esac
 '
 
+test_lazy_prereq DEFAULT_HASH_ALGORITHM '
+	test "$GIT_TEST_BUILTIN_HASH" = "$GIT_DEFAULT_HASH"
+'
+
 test_lazy_prereq DEFAULT_REPO_FORMAT '
 	test_have_prereq SHA1,REFFILES
 '

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 07/11] t1007: choose the built-in hash outside of a repo
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (5 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 06/11] t: default to compile-time default hash if not set brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 08/11] t4042: " brian m. carlson
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the built-in default hash is always SHA-1, but that will
change in a future commit.  Instead of assuming that operating outside
of a repository will always use SHA-1, simply ask test_oid for the
built-in hash instead, which will always be correct.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1007-hash-object.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64658b3ba5..de076293b6 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -252,9 +252,9 @@ test_expect_success '--literally complains about non-standard types' '
 	test_must_fail git hash-object -t bogus --literally --stdin
 '
 
-test_expect_success '--stdin outside of repository (uses SHA-1)' '
+test_expect_success '--stdin outside of repository (uses default hash)' '
 	nongit git hash-object --stdin <hello >actual &&
-	echo "$(test_oid --hash=sha1 hello)" >expect &&
+	echo "$(test_oid --hash=builtin hello)" >expect &&
 	test_cmp expect actual
 '
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 08/11] t4042: choose the built-in hash outside of a repo
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (6 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 07/11] t1007: choose the built-in hash outside of a repo brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 09/11] t5300: " brian m. carlson
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the built-in default hash is always SHA-1, but that will
change in a future commit.  Instead of assuming that operating outside
of a repository will always use SHA-1, provide constants for both
algorithms and then simply ask test_oid for the built-in hash instead,
which will always be correct.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t4042-diff-textconv-caching.sh | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t4042-diff-textconv-caching.sh b/t/t4042-diff-textconv-caching.sh
index ff0e73531b..31018ceba2 100755
--- a/t/t4042-diff-textconv-caching.sh
+++ b/t/t4042-diff-textconv-caching.sh
@@ -120,6 +120,14 @@ test_expect_success 'log notes cache and still use cache for -p' '
 '
 
 test_expect_success 'caching is silently ignored outside repo' '
+	test_oid_cache <<-\EOM &&
+	oid1 sha1:5626abf
+	oid1 sha256:a4ed1f3
+	oid2 sha1:f719efd
+	oid2 sha256:aa9e7dc
+	EOM
+	oid1=$(test_oid --hash=builtin oid1) &&
+	oid2=$(test_oid --hash=builtin oid2) &&
 	mkdir -p non-repo &&
 	echo one >non-repo/one &&
 	echo two >non-repo/two &&
@@ -129,9 +137,9 @@ test_expect_success 'caching is silently ignored outside repo' '
 		   -c diff.test.textconv="tr a-z A-Z <" \
 		   -c diff.test.cachetextconv=true \
 		   diff --no-index one two >actual &&
-	cat >expect <<-\EOF &&
+	cat >expect <<-EOF &&
 	diff --git a/one b/two
-	index 5626abf..f719efd 100644
+	index $oid1..$oid2 100644
 	--- a/one
 	+++ b/two
 	@@ -1 +1 @@

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 09/11] t5300: choose the built-in hash outside of a repo
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (7 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 08/11] t4042: " brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 10/11] help: add a build option for default hash brian m. carlson
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Right now, the built-in default hash is always SHA-1, but that will
change in a future commit.  Instead of assuming that operating outside
of a repository will always use SHA-1, look up the default hash
algorithm for operating outside of a repository using an appropriate
environment variable, which will always be correct.

Additionally, for operations outside of a repository, use the
DEFAULT_HASH_ALGORITHM prerequisite rather than SHA1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5300-pack-object.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index ae72158b94..73445782e7 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -525,7 +525,7 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	test_path_is_file foo.idx
 '
 
-test_expect_success SHA1 'show-index works OK outside a repository' '
+test_expect_success DEFAULT_HASH_ALGORITHM 'show-index works OK outside a repository' '
 	nongit git show-index <foo.idx
 '
 
@@ -658,7 +658,7 @@ do
 		test_commit -C repo initial &&
 		git -C repo repack -ad &&
 		git -C repo verify-pack "$(pwd)"/repo/.git/objects/pack/*.idx &&
-		if test $hash = sha1
+		if test $hash = $GIT_TEST_BUILTIN_HASH
 		then
 			nongit git verify-pack "$(pwd)"/repo/.git/objects/pack/*.idx
 		else
@@ -676,7 +676,7 @@ do
 		test_commit -C repo initial &&
 		git -C repo repack -ad &&
 		git -C repo index-pack --verify "$(pwd)"/repo/.git/objects/pack/*.pack &&
-		if test $hash = sha1
+		if test $hash = $GIT_TEST_BUILTIN_HASH
 		then
 			nongit git index-pack --verify "$(pwd)"/repo/.git/objects/pack/*.pack
 		else

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 10/11] help: add a build option for default hash
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (8 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 09/11] t5300: " brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 21:22   ` [PATCH v2 11/11] Enable SHA-256 by default in breaking changes mode brian m. carlson
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

We'd like users to be able to determine the hash algorithm that is the
builtin default in their version of Git.  This is useful for
troubleshooting, especially when we decide to change the default.  Add
an entry for the default hash in the output of git version
--build-options so that users can easily access that information and
include it in bug reports.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 help.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/help.c b/help.c
index 21b778707a..bd0be2ee57 100644
--- a/help.c
+++ b/help.c
@@ -810,6 +810,7 @@ void get_version_info(struct strbuf *buf, int show_build_options)
 			    SHA1_UNSAFE_BACKEND);
 #endif
 		strbuf_addf(buf, "SHA-256: %s\n", SHA256_BACKEND);
+		strbuf_addf(buf, "default-hash: %s\n", hash_algos[GIT_HASH_DEFAULT].name);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 11/11] Enable SHA-256 by default in breaking changes mode
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (9 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 10/11] help: add a build option for default hash brian m. carlson
@ 2025-07-01 21:22   ` brian m. carlson
  2025-07-01 22:10   ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change Junio C Hamano
  2025-07-02 14:46   ` Patrick Steinhardt
  12 siblings, 0 replies; 47+ messages in thread
From: brian m. carlson @ 2025-07-01 21:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

Our document on breaking changes indicates that we intend to default to
SHA-256 in Git 3.0.  Since most people choose the default option, this
is an important security upgrade to our defaults.

To allow people to test this case, when WITH_BREAKING_CHANGES is set in
the configuration, build Git with SHA-256 as the default hash.  Update
the testsuite to use the build options information to automatically
choose the right value.

Note that if the command substitution for GIT_TEST_BUILTIN_HASH fails,
so does the testsuite—and quite spectacularly at that.  Thus, the case
where the Git binary is somehow subtly broken will not go undetected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h        | 8 +++++++-
 t/test-lib.sh | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hash.h b/hash.h
index 953e840d15..3fcbe9bcba 100644
--- a/hash.h
+++ b/hash.h
@@ -174,8 +174,14 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 #define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
+
 /* Default hash algorithm if unspecified. */
-#define GIT_HASH_DEFAULT GIT_HASH_SHA1
+#ifdef WITH_BREAKING_CHANGES
+# define GIT_HASH_DEFAULT GIT_HASH_SHA256
+#else
+# define GIT_HASH_DEFAULT GIT_HASH_SHA1
+#endif
+
 /* Legacy hash algorithm. Implied for older data formats which don't specify. */
 #define GIT_HASH_SHA1_LEGACY GIT_HASH_SHA1
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ef3759ec80..6dc2022ee1 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -536,7 +536,7 @@ export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 export EDITOR
 
-GIT_TEST_BUILTIN_HASH=sha1
+GIT_TEST_BUILTIN_HASH=$("$GIT_BUILD_DIR/git" version --build-options | sed -ne 's/^default-hash: //p')
 GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}"
 export GIT_DEFAULT_HASH
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 00/11] Add SHA-256 by default as a breaking change
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (10 preceding siblings ...)
  2025-07-01 21:22   ` [PATCH v2 11/11] Enable SHA-256 by default in breaking changes mode brian m. carlson
@ 2025-07-01 22:10   ` Junio C Hamano
  2025-07-02 14:46   ` Patrick Steinhardt
  12 siblings, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2025-07-01 22:10 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Changes since v1:
> * Add a build option for the default hash and use it in the tests.

WITH_BREAKING_CHANGES flips GIT_HASH_DEFAULT between SHA-1 and
SHA-256 as before, but the choice is now exposed via "git help
--build-options", so test-lib.sh does not have to switch based
on WITH_BREAKING_CHANGES and use the build-options embedded in
the binary.

Very nice.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 00/11] Add SHA-256 by default as a breaking change
  2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
                     ` (11 preceding siblings ...)
  2025-07-01 22:10   ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change Junio C Hamano
@ 2025-07-02 14:46   ` Patrick Steinhardt
  2025-07-02 15:01     ` Kristoffer Haugsbakk
  12 siblings, 1 reply; 47+ messages in thread
From: Patrick Steinhardt @ 2025-07-02 14:46 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano

On Tue, Jul 01, 2025 at 09:22:26PM +0000, brian m. carlson wrote:
> Our breaking changes document outlines that Git 3.0 will make SHA-256
> the default hash algorithm, which is a sensible and prudent approach,
> especially from a security perspective.  However, we haven't tested this
> adequately and it would be helpful to allow users to test this behaviour
> so their code and environments are ready for it.
> 
> Fortunately, c5bc9a7f94 (Makefile: wire up build option for deprecated
> features, 2025-01-22) introduces a build option that we can use for
> testing breaking changes: WITH_BREAKING_CHANGES.  This series introduces
> functionality for SHA-256 by default in this mode so we can test it out.
> 
> Changes since v1:
> * Add a build option for the default hash and use it in the tests.
> * Rename GIT_HASH_ORIGINAL to GIT_HASH_SHA1_LEGACY.
> * Improve some of the commit messages to better explain questions that
>   have come up for review.
> * Improve formatting of nested C preprocessor directives.

I looked specifically for the things that I commented on, all of which
seem to have been addressed. Given that there is no range diff I trust
that there aren't any other unexpected changes.

So this iteration looks good to me, and I think that this series is a
step into the right direction overall.

Patrick

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 00/11] Add SHA-256 by default as a breaking change
  2025-07-02 14:46   ` Patrick Steinhardt
@ 2025-07-02 15:01     ` Kristoffer Haugsbakk
  0 siblings, 0 replies; 47+ messages in thread
From: Kristoffer Haugsbakk @ 2025-07-02 15:01 UTC (permalink / raw)
  To: Patrick Steinhardt, brian m. carlson; +Cc: git, Junio C Hamano

On Wed, Jul 2, 2025, at 16:46, Patrick Steinhardt wrote:
> I looked specifically for the things that I commented on, all of which
> seem to have been addressed. Given that there is no range diff I trust
> that there aren't any other unexpected changes.
>
> So this iteration looks good to me, and I think that this series is a
> step into the right direction overall.

Range diff with trimmed/mangled whitespace:

```
 1:  aced16ec164 =  1:  ca6daa1368e hash: add a constant for the default hash algorithm
 2:  291d6f26bdd !  2:  1f68f3da877 hash: add a constant for the original hash algorithm
    @@ Metadata
     Author: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## Commit message ##
    -    hash: add a constant for the original hash algorithm
    +    hash: add a constant for the legacy hash algorithm
     
         We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
         code.  Some of these really mean to represent specifically SHA-1, but
         some actually represent the original hash algorithm used in Git which is
    -    implied by older formats and protocols which do not contain hash
    +    implied by older, legacy formats and protocols which do not contain hash
         information.  For instance, the bundle v1 and v2 formats do not contain
         hash algorithm information, and thus SHA-1 is implied by the use of
         these formats.
    @@ hash.h: static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA25
      #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
      /* Default hash algorithm if unspecified. */
      #define GIT_HASH_DEFAULT GIT_HASH_SHA1
    -+/* Original hash algorithm. Implied for older data formats which don't specify. */
    -+#define GIT_HASH_ORIGINAL GIT_HASH_SHA1
    ++/* Legacy hash algorithm. Implied for older data formats which don't specify. */
    ++#define GIT_HASH_SHA1_LEGACY GIT_HASH_SHA1
      
      /* "sha1", big-endian */
      #define GIT_SHA1_FORMAT_ID 0x73686131
 3:  d041a12031d =  3:  dc9c16c2fc8 builtin: use default hash when outside a repository
 4:  a52bf97d8eb !  4:  667d251a04c Use original hash for legacy formats
    @@ Metadata
     Author: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## Commit message ##
    -    Use original hash for legacy formats
    +    Use legacy hash for legacy formats
     
         We have a large variety of data formats and protocols where no hash
         algorithm was defined and the default was assumed to always be SHA-1.
         Instead of explicitly stating SHA-1, let's use the constant to represent
    -    the original hash algorithm (which is still SHA-1) so that it's clear
    +    the legacy hash algorithm (which is still SHA-1) so that it's clear
         for documentary purposes that it's a legacy fallback option and not an
         intentional choice to use SHA-1.
     
    @@ builtin/receive-pack.c: static struct command *read_head_info(struct packet_read
      			hash = parse_feature_value(feature_list, "object-format", &len, NULL);
      			if (!hash) {
     -				hash = hash_algos[GIT_HASH_SHA1].name;
    -+				hash = hash_algos[GIT_HASH_ORIGINAL].name;
    ++				hash = hash_algos[GIT_HASH_SHA1_LEGACY].name;
      				len = strlen(hash);
      			}
      			if (xstrncmpz(the_hash_algo->name, hash, len))
    @@ bundle.c: int read_bundle_header_fd(int fd, struct bundle_header *header,
      	 * `parse_capability()`.
      	 */
     -	header->hash_algo = &hash_algos[GIT_HASH_SHA1];
    -+	header->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
    ++	header->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
      
      	/* The bundle header ends with an empty line */
      	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
    @@ bundle.c: int create_bundle(struct repository *r, const char *path,
      	 * 2. @filter is required because we parsed an object filter.
      	 */
     -	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter.choice)
    -+	if (the_hash_algo != &hash_algos[GIT_HASH_ORIGINAL] || revs.filter.choice)
    ++	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1_LEGACY] || revs.filter.choice)
      		min_version = 3;
      
      	if (argc > 1) {
    @@ connect.c: static void process_capabilities(struct packet_reader *reader, size_t
      		free(hash_name);
      	} else {
     -		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
    -+		reader->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
    ++		reader->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
      	}
      }
      
    @@ connect.c: static void send_capabilities(int fd_out, struct packet_reader *reade
      		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
      	} else {
     -		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
    -+		reader->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
    ++		reader->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
      	}
      	if (server_feature_v2("promisor-remote", &promisor_remote_info)) {
      		char *reply = promisor_remote_reply(promisor_remote_info);
    @@ connect.c: int server_supports_hash(const char *desired, int *feature_supported)
      		*feature_supported = !!hash;
      	if (!hash) {
     -		hash = hash_algos[GIT_HASH_SHA1].name;
    -+		hash = hash_algos[GIT_HASH_ORIGINAL].name;
    ++		hash = hash_algos[GIT_HASH_SHA1_LEGACY].name;
      		len = strlen(hash);
      	}
      	while (hash) {
    @@ fetch-pack.c: static void write_fetch_command_and_capabilities(struct strbuf *re
      			    the_hash_algo->name, hash_name);
      		packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
     -	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
    -+	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_ORIGINAL) {
    ++	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
      		die(_("the server does not support algorithm '%s'"),
      		    the_hash_algo->name);
      	}
    @@ pkt-line.c: void packet_reader_init(struct packet_reader *reader, int fd,
      	reader->options = options;
      	reader->me = "git";
     -	reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
    -+	reader->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
    ++	reader->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
      	strbuf_init(&reader->scratch, 0);
      }
      
    @@ remote-curl.c: static const struct git_hash_algo *detect_hash_algo(struct discov
      	 */
      	if (!p)
     -		return &hash_algos[GIT_HASH_SHA1];
    -+		return &hash_algos[GIT_HASH_ORIGINAL];
    ++		return &hash_algos[GIT_HASH_SHA1_LEGACY];
      
      	algo = hash_algo_by_length((p - heads->buf) / 2);
      	if (algo == GIT_HASH_UNKNOWN)
    @@ serve.c
      static int advertise_sid = -1;
      static int advertise_object_info = -1;
     -static int client_hash_algo = GIT_HASH_SHA1;
    -+static int client_hash_algo = GIT_HASH_ORIGINAL;
    ++static int client_hash_algo = GIT_HASH_SHA1_LEGACY;
      
      static int always_advertise(struct repository *r UNUSED,
      			    struct strbuf *value UNUSED)
    @@ setup.c: void initialize_repository_version(int hash_algo,
      	 * the remote repository's format.
      	 */
     -	if (hash_algo != GIT_HASH_SHA1 ||
    -+	if (hash_algo != GIT_HASH_ORIGINAL ||
    ++	if (hash_algo != GIT_HASH_SHA1_LEGACY ||
      	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
      		target_version = GIT_REPO_VERSION_READ;
      
     -	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
    -+	if (hash_algo != GIT_HASH_ORIGINAL && hash_algo != GIT_HASH_UNKNOWN)
    ++	if (hash_algo != GIT_HASH_SHA1_LEGACY && hash_algo != GIT_HASH_UNKNOWN)
      		git_config_set("extensions.objectformat",
      			       hash_algos[hash_algo].name);
      	else if (reinit)
    @@ transport.c: struct transport *transport_get(struct remote *remote, const char *
      	}
      
     -	ret->hash_algo = &hash_algos[GIT_HASH_SHA1];
    -+	ret->hash_algo = &hash_algos[GIT_HASH_ORIGINAL];
    ++	ret->hash_algo = &hash_algos[GIT_HASH_SHA1_LEGACY];
      
      	return ret;
      }
 5:  d27e4ee557d !  5:  d6e616cee74 setup: use the default algorithm to initialize repo format
    @@ Commit message
     
         Because we might not always want to use SHA-1 as the default, let's
         instead specify the default hash algorithm constant so that we will use
    -    whatever the specified default is.  However, to make sure we continue to
    -    read repositories without a specified hash algorithm as SHA-1, default
    -    the repository format to the original hash algorithm (SHA-1) when
    -    reading the repository format.
    +    whatever the specified default is.
    +
    +    However, we also need to continue to read older repositories.  If we're
    +    in a v0 repository or extensions.objectformat is not set, then we must
    +    continue to default to the original hash algorithm: SHA-1.  If an
    +    algorithm is set explicitly, however, it will override the hash_algo
    +    member of the repository_format struct and we'll get the right value.
    +
    +    Similarly, if the repository was initialized before Git 0.99.3, then it
    +    may lack a core.repositoryformatversion key, and some repositories lack
    +    a config file altogether.  In both cases, format->version is -1 and we
    +    need to assume that SHA-1 is in use.
    +
    +    Because clear_repository_format reinitializes the struct
    +    repository_format and therefore sets the hash_algo member to the default
    +    (which could in the future not be SHA-1), we need to reset this member
    +    explicitly.  We know, however, that at the point we call
    +    read_repository_format, we are actually reading an existing repository
    +    and not initializing a new one or operating outside of a repository, so
    +    we are not changing the default behavior back to SHA-1 if the default
    +    algorithm is different.
    +
    +    It is potentially questionable that we ignore all repository
    +    configuration if there is a config file but it doesn't have
    +    core.repositoryformatversion set, in which case we reset all of the
    +    configuration to the default.  However, it is unclear what the right
    +    thing to do instead with such an old repository is and a simple git init
    +    will add the missing entry, so for now, we simply honor what the
    +    existing code does and reset the value to the default, simply adding our
    +    initialization to SHA-1.
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    @@ setup.c: static void init_repository_format(struct repository_format *format)
      int read_repository_format(struct repository_format *format, const char *path)
      {
      	clear_repository_format(format);
    -+	format->hash_algo = GIT_HASH_ORIGINAL;
    ++	format->hash_algo = GIT_HASH_SHA1_LEGACY;
      	git_config_from_file(check_repo_format, path, format);
     -	if (format->version == -1)
     +	if (format->version == -1) {
      		clear_repository_format(format);
    -+		format->hash_algo = GIT_HASH_ORIGINAL;
    ++		format->hash_algo = GIT_HASH_SHA1_LEGACY;
     +	}
      	return format->version;
      }
 6:  9bb3c06d5b7 =  6:  c470ac4ac41 t: default to compile-time default hash if not set
 7:  8670b4fc7b0 =  7:  6866b422608 t1007: choose the built-in hash outside of a repo
 8:  8804bfbfa2c =  8:  f957ce078f6 t4042: choose the built-in hash outside of a repo
 9:  231c9ff200f =  9:  9d619f2ef8c t5300: choose the built-in hash outside of a repo
10:  53554e53b35 <  -:  ----------- Enable SHA-256 by default in breaking changes mode
 -:  ----------- > 10:  39153c80971 help: add a build option for default hash
 -:  ----------- > 11:  c79bb70a2e7 Enable SHA-256 by default in breaking changes mode
```

From the repo.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/10] builtin: use default hash when outside a repository
  2025-07-01 21:14     ` brian m. carlson
@ 2025-07-02 15:08       ` Patrick Steinhardt
  0 siblings, 0 replies; 47+ messages in thread
From: Patrick Steinhardt @ 2025-07-02 15:08 UTC (permalink / raw)
  To: brian m. carlson, git, Junio C Hamano

On Tue, Jul 01, 2025 at 09:14:57PM +0000, brian m. carlson wrote:
> On 2025-07-01 at 11:35:33, Patrick Steinhardt wrote:
> > On Fri, Jun 20, 2025 at 01:19:35AM +0000, brian m. carlson wrote:
> > > We have some commands that can operate inside or outside a repository.
> > > If we're operating outside a repository, we clearly cannot use the
> > > repository's hash algorithm as a default since it doesn't exist, so
> > > instead, let's pick the default instead of specifically SHA-1.  Right
> > > now this results in no functional change since the default is SHA-1, but
> > > that may change in the future.
> > 
> > With the preceding commit in mind that introduced GIT_HASH_ORIGINAL you
> > could also argue that those callsites should be converted to use that
> > define instead. We always used to treat them as SHA1 repositories, and
> > we have no better way of telling otherwise, so we use the historical
> > value of SHA1 so that scripts aren't dependent on how exactly Git was
> > built.
> 
> I don't think I want to do that.  A lot of the functionality people use
> outside of repositories, such as index-pack and ls-remote, actually
> operates on repository objects and so it makes sense to use the default.
> 
> For instance, it will be a major inconvenience to still have to specify
> a custom object format three to five years after the switchover.

True. Initially it will be a pain for those edge cases, but the longer
the new default is in place the more useful it will become.

Patrick

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2025-07-02 15:08 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-20  1:19 [PATCH 00/10] Add SHA-256 by default as a breaking change brian m. carlson
2025-06-20  1:19 ` [PATCH 01/10] hash: add a constant for the default hash algorithm brian m. carlson
2025-06-20  1:19 ` [PATCH 02/10] hash: add a constant for the original " brian m. carlson
2025-06-20  1:56   ` Junio C Hamano
2025-06-20 20:43     ` brian m. carlson
2025-07-01 11:35       ` Patrick Steinhardt
2025-06-20  1:19 ` [PATCH 03/10] builtin: use default hash when outside a repository brian m. carlson
2025-06-20 14:19   ` Junio C Hamano
2025-07-01 11:35   ` Patrick Steinhardt
2025-07-01 21:14     ` brian m. carlson
2025-07-02 15:08       ` Patrick Steinhardt
2025-06-20  1:19 ` [PATCH 04/10] Use original hash for legacy formats brian m. carlson
2025-06-20 14:26   ` Junio C Hamano
2025-06-20 20:51     ` brian m. carlson
2025-06-20 21:14       ` Junio C Hamano
2025-07-01 11:35         ` Patrick Steinhardt
2025-06-20  1:19 ` [PATCH 05/10] setup: use the default algorithm to initialize repo format brian m. carlson
2025-06-20 14:55   ` Junio C Hamano
2025-06-20 20:28     ` brian m. carlson
2025-06-20 21:05       ` Junio C Hamano
2025-06-20  1:19 ` [PATCH 06/10] t: default to compile-time default hash if not set brian m. carlson
2025-06-20  1:19 ` [PATCH 07/10] t1007: choose the built-in hash outside of a repo brian m. carlson
2025-06-20  1:19 ` [PATCH 08/10] t4042: " brian m. carlson
2025-06-20  1:19 ` [PATCH 09/10] t5300: " brian m. carlson
2025-06-20  1:19 ` [PATCH 10/10] Enable SHA-256 by default in breaking changes mode brian m. carlson
2025-06-20 14:58   ` Junio C Hamano
2025-06-20 19:18     ` brian m. carlson
2025-06-20 15:03   ` Junio C Hamano
2025-06-20 19:15     ` brian m. carlson
2025-06-20 20:42       ` Junio C Hamano
2025-06-20 21:06         ` brian m. carlson
2025-07-01 11:35   ` Patrick Steinhardt
2025-07-01 21:22 ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change brian m. carlson
2025-07-01 21:22   ` [PATCH v2 01/11] hash: add a constant for the default hash algorithm brian m. carlson
2025-07-01 21:22   ` [PATCH v2 02/11] hash: add a constant for the legacy " brian m. carlson
2025-07-01 21:22   ` [PATCH v2 03/11] builtin: use default hash when outside a repository brian m. carlson
2025-07-01 21:22   ` [PATCH v2 04/11] Use legacy hash for legacy formats brian m. carlson
2025-07-01 21:22   ` [PATCH v2 05/11] setup: use the default algorithm to initialize repo format brian m. carlson
2025-07-01 21:22   ` [PATCH v2 06/11] t: default to compile-time default hash if not set brian m. carlson
2025-07-01 21:22   ` [PATCH v2 07/11] t1007: choose the built-in hash outside of a repo brian m. carlson
2025-07-01 21:22   ` [PATCH v2 08/11] t4042: " brian m. carlson
2025-07-01 21:22   ` [PATCH v2 09/11] t5300: " brian m. carlson
2025-07-01 21:22   ` [PATCH v2 10/11] help: add a build option for default hash brian m. carlson
2025-07-01 21:22   ` [PATCH v2 11/11] Enable SHA-256 by default in breaking changes mode brian m. carlson
2025-07-01 22:10   ` [PATCH v2 00/11] Add SHA-256 by default as a breaking change Junio C Hamano
2025-07-02 14:46   ` Patrick Steinhardt
2025-07-02 15:01     ` Kristoffer Haugsbakk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).